In the context of text understanding, computational methods are used to study how humans combine stylistic elements (visual and rhetorical) with language to express emotions and opinions. In this dissertation, I present several computational experiments and annotated datasets for understanding the communicative meanings of these stylistic elements. This involves studying computational models for predicting emotions, rhetorical relations, and lexical aspects.
First, I explore to what extent distributional semantic models and other proposed resources can capture the intensity of emotions in lexical items. Using human judgments as the gold standard, I employ language models for word-level emotion intensity prediction and present a technique for building an emotion database. This new database provides graded emotion intensity scores for English words with regard to a fine-grained inventory of over 200 emotion categories. In the next step, in a series of unsupervised, supervised, and self-supervised experiments, I estimate word-level emotion intensity scores for specific emotions. The results indicate that: 1) the sentiment scores of words can improve distributional models for emotion classification, and 2) language models perform better in practice than emotion lexicons.
Next, I present a statistical analysis of the role of emojis in text as visual elements that convey emotion. I quantify the strong connection between emojis and emotions by collecting and analyzing a large corpus of tweets. I partially annotate the dataset with the syntactic role of emojis in the text. The empirical results illustrate that emojis are used mainly to intensify the emotion in a sentence; however, in some cases, they replace a word or phrase in the sentence or signal contrast in a sarcastic context.
The last part of the dissertation focuses on the usage of the Persian language in different contexts and for different purposes. Most work on NLU challenges concentrates on resource-rich languages like English. As an effort to deliver high-quality evaluation resources for Persian, I present two corpora for analyzing stylistic and rhetorical elements in standard and poetic Persian text. The first corpus covers Persian literary text, mainly poetry, annotated for century and style, with additional partial annotation of rhetorical figures. The second is a benchmark for textual entailment in Persian. It is part of a larger benchmark that covers a range of language understanding tasks in Persian (e.g., reading comprehension, textual entailment, and sentiment analysis). This benchmark is the first work designed for studying various NLP tasks in Persian.