There is a growing need for NLP systems that support low-resource settings, in which task-specific training data may be lacking and domain-specific corpora are too scarce to build a reliable system. In the past decade, co-occurrence-based training objectives, such as those of word2vec, were the first able to offer word-level semantic information for specific domains. More recently, pretrained language model architectures such as BERT have been shown capable of learning monolingual or multilingual representations with self-supervised objectives under a shared vocabulary, simply by combining input from one or more languages. Such representations greatly facilitate low-resource language applications. Still, the success of such cross-domain transfer hinges on how close the involved domains are, with substantial drops observed for more distant domain pairs, such as English to Korean, or Wikipedia text to social-media comments. To address this, domain-specific unlabeled corpora, which are often available, can serve as auxiliary signals to enhance low-resource NLP systems. In this dissertation, we present a series of methods for leveraging such auxiliary signals. In particular, cross-lingual sentiment embeddings with transfer learning are proposed to improve sentiment analysis. For cross-lingual text classification, we present a self-learning framework that takes advantage of unlabeled data. Furthermore, a framework based on data augmentation with adversarial training is proposed to address the low-resource problem in the target domain for cross-lingual NLI. Finally, we present two effective methods for injecting extra information from multiple auxiliary sources, applied to temporal event reasoning and to rating estimation in recommender systems. Extensive experimental results demonstrate the effectiveness of the proposed methods in achieving better performance across a variety of NLP tasks.