There is a growing need for NLP systems that support low-resource settings, in which task-specific training data may be lacking and domain-specific corpora are too scarce to build a reliable system. In the past decade, co-occurrence-based training objectives, such as those of word2vec, were the first able to offer word-level semantic information for specific domains. More recently, pretrained language model architectures such as BERT have been shown capable of learning monolingual or multilingual representations with self-supervised objectives under a shared vocabulary, simply by combining input from one or more languages. Such representations greatly facilitate low-resource language applications. Still, the success of such cross-domain transfer hinges on how close the involved domains are, with substantial drops observed for more distant domain pairs, such as English to Korean, or Wikipedia text to social-media comments. To address this, domain-specific unlabeled corpora, which are often available, can serve as auxiliary signals to enhance low-resource NLP systems. In this dissertation, we present a series of methods for leveraging such auxiliary signals. In particular, cross-lingual sentiment embeddings with transfer learning are proposed to improve sentiment analysis. For cross-lingual text classification, we present a self-learning framework that takes advantage of unlabeled data. Furthermore, a framework based on data augmentation with adversarial training is proposed to address the low-resource problem in the target domain for cross-lingual NLI. Finally, we present two effective methods for injecting extra information from multiple auxiliary sources, applied to temporal event reasoning and to rating estimation in recommender systems. Extensive experimental results demonstrate the effectiveness of the proposed methods in achieving better performance across a variety of NLP tasks.