Description
Title: Improving inference and generation process of generative adversarial networks
Date Created: 2021
Other Date: 2021-05 (degree)
Extent: 1 online resource (xiv, 91 pages)
Description: Generative Adversarial Networks (GANs) have achieved tremendous success in a broad range of Computer Vision applications, such as image synthesis, photo editing, and image-to-image translation. Closely related to GAN-based image synthesis, there are two promising directions: (i) GAN-based inference learning and (ii) GAN-based video synthesis. Both directions face many challenges in generation and inference. In GAN-based inference learning, application-driven methods usually outperform approaches with elegant theories. In GAN-based video synthesis, the generation quality lags far behind that of contemporary image generators. To tackle these challenges, we conduct extensive studies on improving the inference and generation processes.
First, we investigate a specific application: generating multi-view images from a single-view input. We identify the “incomplete” representation issue in the existing single-pathway framework and propose a two-pathway approach to address this problem. In addition to the single reconstruction path, we introduce a generation sideway to maintain the completeness of the learned embedding space. Self-supervised learning is also employed to make use of both labeled and unlabeled data. The experimental results show that the proposed method significantly outperforms state-of-the-art methods, especially when generating from “unseen” inputs in in-the-wild conditions.
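The two-pathway idea above can be illustrated with a minimal sketch. All networks, dimensions, and names here are hypothetical stand-ins (simple linear maps rather than the dissertation's actual encoder/decoder); the point is only to show the two pathways side by side: a reconstruction path over real inputs, and a generation sideway that decodes embeddings sampled from the prior so the whole embedding space is exercised.

```python
import numpy as np

rng = np.random.default_rng(0)
IN_DIM, EMB_DIM = 12, 4

# Hypothetical linear encoder/decoder standing in for the two networks.
We = rng.standard_normal((EMB_DIM, IN_DIM)) * 0.1
Wd = rng.standard_normal((IN_DIM, EMB_DIM)) * 0.1

def encode(x):
    """Map an input view to an embedding."""
    return We @ x

def decode(h):
    """Map an embedding to a generated view."""
    return Wd @ h

x = rng.standard_normal(IN_DIM)    # a training image (stand-in)

# Pathway 1 (reconstruction): encode a real input, decode it back.
recon_loss = np.mean((decode(encode(x)) - x) ** 2)

# Pathway 2 (generation sideway): decode an embedding sampled from the
# prior, so that regions of the embedding space never reached by real
# inputs are still trained -- keeping the representation "complete".
h_prior = rng.standard_normal(EMB_DIM)
x_gen = decode(h_prior)            # would be scored by a discriminator

print(recon_loss, x_gen.shape)
```

In a full training loop both pathways would contribute losses to the same encoder/decoder pair, which is what distinguishes this setup from a plain autoencoder.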
Second, as a theoretical extension of the previous work, we analyze three issues that degrade both generation and inference in GAN-based inference learning approaches: the “holes” issue and the “non-reverse encoder-generator” issue in single-pathway adversarial learning, and the “unstable inference mapping” issue in bidirectional adversarial learning. To address all three issues in a unified framework, we take the single-pathway approach as a baseline to avoid the “unstable inference mapping” issue and propose two strategies to solve the remaining two. Theoretical analysis proves that the learned encoder and decoder are mutually inverse. Results on both synthetic data and real-world applications support our theoretical analysis and demonstrate improved performance over baselines.
Third, we present a framework that leverages contemporary image generators to render high-resolution videos. We frame the video synthesis problem as discovering a trajectory in the latent space of a pre-trained and fixed image generator. We introduce a motion generator that discovers the desired trajectory, in which content and motion are disentangled. Furthermore, we introduce a new task, which we call cross-domain video synthesis, in which the image and motion generators are trained on disjoint datasets belonging to different domains. This allows for generating moving objects for which the desired video data is not available. Extensive experiments on various datasets demonstrate the advantages of our methods over existing video generation techniques.
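The latent-trajectory framing above can be sketched concretely. The generator, dimensions, and motion model below are hypothetical stand-ins (a random linear map instead of a real pre-trained GAN, a random walk instead of a learned motion generator); the sketch only shows the structure of the approach: the image generator stays frozen, and video synthesis reduces to producing a sequence of latent codes.

```python
import numpy as np

rng = np.random.default_rng(0)
LATENT_DIM, IMG_DIM = 8, 16

# Hypothetical frozen "image generator": a random linear map plus tanh
# stands in for a pre-trained GAN generator; it is never updated.
W = rng.standard_normal((IMG_DIM, LATENT_DIM))

def image_generator(z):
    """Frozen image generator G: latent code -> image (stand-in)."""
    return np.tanh(W @ z)

def motion_generator(z0, num_frames, step_scale=0.1):
    """Toy motion generator: a trajectory z_1..z_T in latent space.
    Content is fixed by z0; motion is modeled as residual steps
    (here random; learned in the actual framework)."""
    traj = [z0]
    for _ in range(num_frames - 1):
        delta = step_scale * rng.standard_normal(LATENT_DIM)
        traj.append(traj[-1] + delta)
    return traj

# Render a video by decoding each latent along the trajectory with the
# fixed image generator.
z0 = rng.standard_normal(LATENT_DIM)
video = np.stack([image_generator(z) for z in motion_generator(z0, 16)])
print(video.shape)  # (16, 16): num_frames x image dimension
```

Because only the motion generator is trained, the image generator can come from a different dataset than the motion data, which is what makes the cross-domain video synthesis task described above possible.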
Note: Ph.D.
Note: Includes bibliographical references
Genre: theses, ETD doctoral
Language: English
Collection: School of Graduate Studies Electronic Theses and Dissertations
Organization Name: Rutgers, The State University of New Jersey
Rights: The author owns the copyright to this work.