TY  - JOUR
TI  - Understanding the functional impact of genomic variants
DO  - https://doi.org/doi:10.7282/t3-ka1c-jv19
PY  - 2020
AB  - Structural variations (SVs) can lead to DNA rearrangements and frequently cause diseases such as neurological disorders. SVs account for more total nucleotide changes and occur more frequently than single nucleotide polymorphisms (SNPs) (Stankiewicz and Lupski, 2010). As knowledge accumulates, SVs have come to be seen as surpassing SNPs in their effects on human evolution, population diversity, and genetic diseases (Stankiewicz and Lupski, 2010). Compared to SNPs, SVs are more challenging to study due to their complex configuration, large size, and repetitive arrangement. Meanwhile, sequencing technologies, including the Illumina and Oxford Nanopore platforms, are being actively developed to generate human whole-genome sequencing data, which can then be analyzed to study genetic variations. This series of studies aims to employ contemporary sequencing technologies and computational workflows to unravel the functional impact of SVs. Good tools are a prerequisite to the successful execution of a job. My study starts with the development of a pipeline construction tool called PipelineDog that can be used throughout the work. PipelineDog is a web-based integrated development environment (IDE) that represents a novel way to arrange and define workflows while promoting code scalability and reusability. I then apply established tools and workflows to analyze a 192-individual cohort, surveying the large structural genetic etiology of co-occurring autism spectrum disorder (ASD) and attention-deficit/hyperactivity disorder (ADHD). Lastly, the newly commercialized Nanopore sequencing technique was tested and evaluated on both existing and simulated data. Nanopore sequencing is anticipated to improve SV identification, as it generates longer reads that enrich the evidence for determining SVs. I improved the overall SV identification accuracy by employing a random forest machine learning model to classify the combined dataset from different workflows. This analysis sheds light on how to choose an SV identification workflow for specific use cases in future projects.
KW  - Pipeline
KW  - Quantitative Biomedicine
LA  - English
ER  - 