TY - JOUR
TI - Tools for genetic data management and strategies for optimized imputation of missing genotypes
DO - https://doi.org/doi:10.7282/T3FB54M6
PY - 2014
AB - This dissertation covers two main areas of research. The first focuses on the design and development of a genetic study data management and analysis system that aims to ease the burden of handling the very large amounts of genetic linkage and association study data produced by high-throughput genotyping platforms and to facilitate the integration of data from multiple sources. The Genetic Study Database (GSD) system is designed to provide security in data transmission and user management, flexibility in study data management, and simplicity in user interface operations. The second area of research focuses on the imputation of inherited genetic polymorphisms or rare variants. Since 2001, with the advent of high-throughput sequencing technologies, the cost of sequencing an entire human genome has dropped from 100 million dollars to less than five thousand dollars per genome. Nevertheless, it is still too costly to obtain whole genome sequencing data for every individual in a research study involving thousands of subjects. Genotype imputation, also called in-silico genotyping, is an efficient way to maximize genome coverage in an association study at little or no additional cost. Depending on the type of genetic study, there are two approaches to genotype imputation: population-based and family-based. Both are covered in the research reported here. The population-based approach takes advantage of publicly available genotype reference panels to predict genotypes of unobserved variants among unrelated individuals. Here, the focus is on optimizing the post-imputation filtering strategy to balance the accuracy of the imputation process against its yield (i.e., the number of genotypes imputed).
The family-based approach leverages the rich information available in a pedigree to increase the power to impute genotypes of unobserved variants among biological relatives. When performing family-based imputation, it is important to decide how many and which family members to select for high-density variant genotyping, because their data will be used to predict the genotypes of the other family members. One aim of this part of the research is therefore to evaluate different family-based imputation designs to identify cost-effective strategies. This dissertation includes three chapters: 1) designing and building a sophisticated web-based genetic study data management system, 2) identifying an optimized set of genotype/SNP filters for population-based imputation, and 3) discovering the most efficient family-based imputation strategies for various pedigree structures.
KW - Biomedical Informatics
KW - Genetics--Research
KW - Genetics--Data processing
LA - eng
ER -