Description
Title: Neural methods for document understanding
Date Created: 2021
Other Date: 2021-10 (degree)
Extent: 1 online resource (vii, 97 pages) : illustrations
Description: Document recognition involves both structured and unstructured data. This dissertation studies and proposes solutions to problems in both domains. Structured data often comes in tabular form. Many existing applications take structured tables as input, in formats such as spreadsheets, HTML, Excel, and CSV. However, a plethora of circumstances produce images of tables instead (scanning invoices, sharing user manuals, downloading read-only publications, etc.), and in such scenarios we do not have access to the table structure. To make the aforementioned applications usable in these scenarios, we focus on the problem of structure recognition by graph modeling, investigating ways of inferring the hidden structure from table images. By extracting table fields in the form of an undirected graph G and computing the corresponding line graph L(G), also known as the edge-to-vertex dual graph, we are able to model the structure-inference problem as a vertex classification problem. We approach vertex classification using a Graph Convolutional Network (GCN) based model. GCNs process graphs with a graph convolution operation analogous to how Convolutional Neural Networks (CNNs) process images with convolution: the operation is adapted so that a vertex and its neighboring vertices are treated the way a pixel and its neighboring pixels are treated by a convolution filter. For unstructured text data, category recognition has many applications (e.g., taxes, government forms, insurance, bills, and invoices). We focus on the problem of text classification in both English and multilingual settings.
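The two graph ideas above can be sketched concretely. The snippet below is a minimal, illustrative construction of the line graph L(G) (edge-to-vertex dual) from an undirected edge list, followed by one simplified, weight-free graph-convolution step (neighborhood averaging). The toy graph and features are assumptions for illustration, not the dissertation's table data or its actual GCN.

```python
def line_graph(edges):
    """Return L(G): each edge of G becomes a vertex of L(G); two vertices
    of L(G) are adjacent iff the corresponding edges of G share an endpoint."""
    edges = [frozenset(e) for e in edges]
    adj = {e: set() for e in edges}
    for i, e in enumerate(edges):
        for f in edges[i + 1:]:
            if e & f:  # the two edges share an endpoint in G
                adj[e].add(f)
                adj[f].add(e)
    return adj

def gcn_step(adj, feats):
    """One simplified graph-convolution step: each vertex's new feature is
    the mean over itself and its neighbors (no learned weights), mirroring
    how a pixel is averaged with its neighbors by a smoothing filter."""
    new = {}
    for v, nbrs in adj.items():
        vals = [feats[v]] + [feats[u] for u in nbrs]
        new[v] = sum(vals) / len(vals)
    return new

# Toy example: a path a-b-c-d. Its line graph is the path (a,b)-(b,c)-(c,d).
G_edges = [("a", "b"), ("b", "c"), ("c", "d")]
L = line_graph(G_edges)

# Made-up scalar features on the vertices of L(G), then one propagation step.
feats = {e: (1.0 if "b" in e else 0.0) for e in L}
updated = gcn_step(L, feats)
```

In the dissertation's setting, the vertices of L(G) would carry features of candidate table-field relations, and a learned GCN (rather than this plain averaging) would classify each vertex.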
For text data, unsupervised neural transfer learning from language modeling (generative pre-training) enables better text representations: representations that reflect the semantic meanings of the represented words and the semantic relations between them. We investigate generative pre-training based models to approach text classification.
We focus on text classification in three settings: English, multilingual, and Arabic, using generative pre-training models for the classification tasks. First, we evaluate state-of-the-art generative pre-training models on English datasets. Then, we fix the model and apply it as-is to Korean, Arabic, and Spanish. Finally, we fix the language (Arabic) and compare three different language-enabled models (rather than applying one as-is). We use the following generative pre-training models: ULMFiT, BERT, and XLM. ULMFiT depends on an LSTM, while BERT and XLM instead adopt the Transformer architecture, demonstrating that recurrence can be abandoned while still achieving state-of-the-art results. XLM adds a translation-based objective function that incorporates knowledge learned from datasets in different languages, which makes it a good candidate in our multilingual setting.
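When a pre-trained model is adapted to classification, a small classification head is typically placed on top of the encoder's pooled representation. The sketch below is a hedged, self-contained illustration of such a head only: the fixed feature vector, weights, and class count are made-up stand-ins for a real pre-trained model's output, not values from any of the models named above.

```python
import math

def softmax(scores):
    """Numerically stable softmax over a list of class scores."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def classify(features, weights, biases):
    """Linear classification head: one score per class, then softmax.
    `features` stands in for a pooled pre-trained text representation."""
    scores = [sum(w * x for w, x in zip(row, features)) + b
              for row, b in zip(weights, biases)]
    return softmax(scores)

# Illustrative 3-dim "pooled" feature and a 2-class head (made-up values).
probs = classify([0.5, -1.0, 2.0],
                 weights=[[1.0, 0.0, 0.5], [-1.0, 0.5, 0.0]],
                 biases=[0.0, 0.1])
```

In practice, the head's weights are learned jointly with (or on top of) the fine-tuned encoder; the multilingual comparison in the dissertation varies the encoder while the classification objective stays the same.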
Note: Ph.D.
Note: Includes bibliographical references
Genre: theses
Language: English
Collection: School of Graduate Studies Electronic Theses and Dissertations
Organization Name: Rutgers, The State University of New Jersey
Rights: The author owns the copyright to this work.