Deep neural networks (DNNs) are widely used in many important applications, such as computer vision, speaker recognition, and natural language processing. Despite their widespread adoption, DNN models face security and efficiency issues when deployed in critical and resource-limited systems. In particular, the vulnerability of DNN models to adversarial attacks, an emerging class of attacks that apply only imperceptible perturbations to the input, has become a significant challenge that hinders the further deployment of deep learning in real-world applications. Regarding model efficiency, the Transformer, despite offering higher capacity than other architectures, suffers from quadratic time and computational complexity, which impedes model scalability and slows the evolution of Transformer models.

To address these two challenges, this thesis first explores the vulnerability and improves the robustness of deep learning against adversarial attacks in both the audio and image domains. Specifically, we analyze and investigate fast, practical, and universal attacks against different types of audio systems (e.g., speaker recognition and speech command recognition models). We also study attacks in the image domain under multiple constraints (memory and timing), as well as the corresponding defense mechanisms. In addition, we tackle the efficiency issue by reducing the compute cost of Transformer models with negligible performance loss, thereby making the models reliable for many real-time applications. The research outcomes and the delivered approaches will facilitate a deeper understanding and further development of DNN models for secure and efficient deployment in practical systems.
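To make the notion of an "imperceptible perturbation" concrete, the following is a minimal sketch of a fast-gradient-sign (FGSM-style) perturbation on a toy linear classifier. The weights, input, and epsilon here are hypothetical illustrations; this is not the thesis's actual attack algorithm, only the generic idea of moving each input coordinate a small step along the sign of the loss gradient.

```python
import numpy as np

def loss(w, x, y):
    # Logistic loss of a linear score z = w . x for label y in {-1, +1}.
    z = w @ x
    return np.log1p(np.exp(-y * z))

def fgsm_perturb(w, x, y, eps):
    # Gradient of the logistic loss with respect to the input x:
    # d/dx log(1 + exp(-y * w.x)) = -y * w / (1 + exp(y * w.x))
    z = w @ x
    grad = -y * w / (1.0 + np.exp(y * z))
    # FGSM step: shift every coordinate by at most eps, in the
    # direction that increases the loss.
    return x + eps * np.sign(grad)

# Hypothetical model and input, for illustration only.
w = np.array([1.0, -2.0, 0.5])
x = np.array([0.3, -0.1, 0.8])
y = 1.0
eps = 0.05

x_adv = fgsm_perturb(w, x, y, eps)
```

The perturbation is bounded by `eps` in every coordinate (an L-infinity constraint), yet it strictly increases the classifier's loss; on high-dimensional inputs such as images or audio, a comparably small per-coordinate change is typically imperceptible to humans.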