Generative models, such as Auto-Encoders, Generative Adversarial Networks (GANs), Generative Flows, and Diffusion Models, are fascinating for their ability to synthesize versatile visual and audio information from mere noise. This generative process usually requires a model to perceive and compress high-dimensional data into a compact, low-dimensional latent space, where each dimension encodes some valuable semantic variation in the original data space. How much we know about this latent space is vital, because it determines how well we can take advantage of the corresponding generative model. Imagine a GAN trained on human faces: if we know which dimension in its latent vector controls the concept "hair shape", we can synthesize multiple images of the same face to try different hairstyles without changing other facial attributes. Disentangling generative models makes them more fun to play with, which is the topic of this thesis. This thesis studies the unsupervised disentanglement of the latent space in GANs, focusing on the image domain and extending to multi-modal settings (image captioning and text-to-image synthesis). The proposed methods enable a GAN model to disentangle its latent space automatically, thus sparing the expensive effort of collecting semantic labels for the training data. Derived from disentanglement, this thesis also covers studies on model interpretability and human-controllable data synthesis. It contains three main topics:
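As a concrete illustration of such an edit, the sketch below varies a single latent dimension of a hypothetical pretrained generator `G` while freezing the rest. The dimension index, latent size, and traversal range are all illustrative assumptions; a dimension that cleanly controls hair shape only exists once the model is disentangled.

```python
import torch

def traverse_latent(G, hair_dim=7, steps=5, latent_dim=512):
    # Hypothetical sketch: G maps a latent vector z to an image.
    # `hair_dim` and `latent_dim` are assumptions for illustration.
    z = torch.randn(1, latent_dim)           # one random face
    images = []
    for value in torch.linspace(-3.0, 3.0, steps):
        z_edit = z.clone()
        z_edit[0, hair_dim] = value          # move only the "hair shape" axis
        images.append(G(z_edit))             # same face, different hairstyle
    return images
```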
First, we work on general-purpose disentanglement. We propose OOGAN, a novel GAN-based disentanglement framework with One-hot Sampling and Orthogonal Regularization. While previous works primarily tackle disentanglement learning through VAEs with various approximation-based methods, we show that GANs have a natural advantage in disentangling via an alternating latent-variable (noise) sampling method that is straightforward and robust.
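A minimal sketch of the two ingredients is given below, assuming a schedule that alternates one-hot and continuous codes and an orthogonality penalty on the weight matrix that consumes the code; the function names, the continuous sampling distribution, and the penalty form are illustrative assumptions, not the thesis's exact formulation.

```python
import torch
import torch.nn.functional as F

def sample_code(batch, code_dim, one_hot=True):
    # One-hot sampling: each code activates exactly one latent "concept",
    # encouraging the generator to tie one dimension to one factor.
    if one_hot:
        idx = torch.randint(code_dim, (batch,))
        return F.one_hot(idx, code_dim).float()
    # Alternate with continuous codes so the whole simplex stays covered
    # (softmax of Gaussian noise is an assumed choice here).
    return torch.softmax(torch.randn(batch, code_dim), dim=1)

def orthogonal_regularization(W):
    # Penalize off-diagonal correlation between the rows of the weight
    # matrix that processes the code, pushing concept directions apart.
    WWt = W @ W.t()
    I = torch.eye(W.size(0), device=W.device)
    return ((WWt - I) ** 2).sum()
```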
Second, we work on a more specific task: disentangling coarse-level and fine-level style attributes in a GAN. The proposed PIVQGAN facilitates independent control and manipulation of coarse-level object arrangements (posture) and fine-level styling (identity), both for images synthesized from noise and for images sampled from real datasets. We design a Vector-Quantization module for better pose-identity disentanglement, together with a novel joint-training scheme that merges a GAN and an Auto-Encoder and enables several self-supervision tasks that help the model better separate the two attributes.
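A generic vector-quantization layer in the spirit of VQ-VAE is sketched below: snapping pose features to a small discrete codebook is one plausible way such a module discards fine style detail. The codebook size, feature shape, and loss weighting here are assumptions rather than PIVQGAN's exact design.

```python
import torch
import torch.nn as nn

class VectorQuantizer(nn.Module):
    # Generic VQ layer (VQ-VAE style); sizes are illustrative assumptions.
    def __init__(self, num_codes=64, code_dim=128, beta=0.25):
        super().__init__()
        self.codebook = nn.Embedding(num_codes, code_dim)
        self.beta = beta

    def forward(self, z):                          # z: (batch, code_dim)
        d = torch.cdist(z, self.codebook.weight)   # distance to every code
        q = self.codebook(d.argmin(dim=1))         # nearest codebook entry
        # Commitment + codebook losses, straight-through gradient to z.
        loss = self.beta * ((q.detach() - z) ** 2).mean() \
             + ((q - z.detach()) ** 2).mean()
        q = z + (q - z).detach()
        return q, loss
```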
Lastly, we study two applications that take advantage of a better-disentangled GAN with mutual information learning. Focusing on text-to-image generation, we propose Text-and-Image Mutual-Translation Adversarial Networks (TIME), a lightweight yet effective model that jointly learns a T2I generator and an image-captioning discriminator within a single GAN framework by maximizing the mutual information between the latent representations of image and text. Focusing on sketch-to-image generation, we study the exemplar-based sketch-to-image synthesis task in a self-supervised manner, eliminating the need for paired sketch data through a better disentanglement between content information from the sketch and style information from an exemplar RGB image.
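As one common way to maximize such cross-modal mutual information, the sketch below uses an InfoNCE-style contrastive bound over paired image and text features, where matched pairs sit on the diagonal of the similarity matrix; TIME's actual objective may differ, and the temperature and feature shapes are assumptions.

```python
import torch
import torch.nn.functional as F

def mutual_info_nce(img_feat, txt_feat, temperature=0.07):
    # InfoNCE-style lower bound on the mutual information between paired
    # image and text features; the i-th image matches the i-th caption.
    img = F.normalize(img_feat, dim=1)
    txt = F.normalize(txt_feat, dim=1)
    logits = img @ txt.t() / temperature        # (batch, batch) similarities
    targets = torch.arange(img.size(0), device=img.device)
    # Symmetric loss: image-to-text and text-to-image directions.
    return (F.cross_entropy(logits, targets)
            + F.cross_entropy(logits.t(), targets)) / 2
```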