Applications of the mixed linear model in genome-wide association studies and small RNA motif discovery
Simple citation
Diao, Liyang. Applications of the mixed linear model in genome-wide association studies and small RNA motif discovery. Retrieved from https://doi.org/doi:10.7282/T33X8547
Description
Title: Applications of the mixed linear model in genome-wide association studies and small RNA motif discovery
Date Created: 2014
Other Date: 2014-10 (degree)
Extent: 1 online resource (x, 126 p. : ill.)
Description: If the sheer number of papers published is any indication, the age of genome-wide association studies (GWAS) is here to stay. However, in spite of the influx of data, several issues remain, one of which is confounding caused by relatedness within the study sample, which can produce many false positive results. In recent years, mixed linear models have become very popular for correcting for unknown types of relatedness, i.e., "cryptic relatedness". While this model has been shown to be successful in some cases, here we address the feasibility of performing GWAS in a highly structured population such as Saccharomyces cerevisiae, and find that the inclusion of fixed local ancestry covariates can sometimes lend a study more power. Furthermore, we explore the application of mixed linear models to a different type of biological problem: discovering motifs associated with active microRNAs (miRNAs). While several algorithms exist for miRNA motif discovery, only a few consider, in addition to seed sequence motif enrichment, the background sequence composition of the 3' UTR binding site, which is known to factor into miRNA binding efficacy. The methods that do account for 3' UTR sequence composition do so by rescoring motif counts based on the background UTR sequence in which each motif appears. Though computationally efficient, these methods are unable to simultaneously compare both gene expression values and UTR sequence, which our method, named MixMir, is able to do, with favorable results. When compared to the simple linear model, as well as existing motif discovery algorithms, MixMir is able to rank true motifs more highly in multiple data sets. Such computational methods are biologically significant because, although it is possible to sequence small RNAs in a sample, their expression may not be perfectly correlated with the size of their effect, which is what we observed.
Note: Ph.D.
Note: Includes bibliographical references
Note: by Liyang Diao
Genre: theses, ETD doctoral
Language: eng
Collection: Graduate School - New Brunswick Electronic Theses and Dissertations
Organization Name: Rutgers, The State University of New Jersey
Rights: The author owns the copyright to this work.