Lohia, Ruchi. Predicting the effect of genetic variance on the sequence-ensemble relationship of intrinsically disordered proteins. Retrieved from https://doi.org/doi:10.7282/t3-00rt-0w08
A hierarchical sequence-based framework for the analysis and conceptualization of intrinsically disordered proteins (IDPs) is presented. This framework was further used to develop a novel test for enrichment of higher-order (tertiary) structure in a disordered protein using Molecular Dynamics (MD) and Monte Carlo simulations. Finally, we show that the framework can also serve as a useful tool for predicting the consequences of an amino acid substitution on IDP function using a bioinformatics approach.
In structured proteins, contacts between residues distant along the sequence are reflected in the tertiary structure, but developing a framework for describing the analogous property in IDPs has not been straightforward. Here, the distribution of hydrophobic residues within the sequence is used to identify 4-15-residue 'blobs' representing local globular regions or linkers. We use this framework within a novel test for enrichment of higher-order (tertiary) structure in disordered proteins: the size and shape of each blob are extracted from MD simulations of the real protein (RP) and used to parameterize a self-avoiding heterogeneous polymer (SAHP). In our study of the 91-residue disordered prodomain of brain-derived neurotrophic factor (BDNF), we find that the long 15-residue linker itself segments the contact-pair map for both the SAHP and the RP, and that in the RP only the contacts between the segmented regions are enriched relative to the SAHP. We further quantified this enrichment for several other hydrophobic substitutions within the disordered prodomain, including the disease-causing Val66Met substitution. In the RPs, the enrichment of contacts between the segmented regions is also sensitive to amino acid substitution: only the disease-associated Met66 substitution enriches these contacts significantly, owing to a preferred Met-Met interaction. Furthermore, we find several properties of the blobs identified with the sequence-based framework that are enriched in disease-associated SNPs relative to non-disease-associated SNPs. This allowed us to present the first systematic, bottom-up attempt to both identify and annotate subdomains within disordered proteins that are enriched for functional effects.
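As a rough illustration of what "enriched relative to SAHP" can mean numerically, the sketch below computes residue-pair contact probabilities from an ensemble of distance maps, averages them over each pair of blobs, and reports a log2 ratio of the RP value to the SAHP reference. This is a minimal sketch under stated assumptions, not the thesis's method: the function names, the 0.8 nm distance cutoff, the minimum sequence separation, and the pseudocount are all hypothetical choices, and blob boundaries are passed in as input rather than derived from the hydrophobicity-based framework.

```python
import numpy as np


def contact_probability(distances, cutoff=0.8, min_sep=3):
    """Fraction of frames in which each residue pair is closer than `cutoff`.

    distances: array of shape (n_frames, n_res, n_res) of inter-residue
    distances (assumed in nm). The cutoff and min_sep values are
    illustrative assumptions. Pairs closer than `min_sep` along the
    sequence are zeroed so trivially local contacts are not counted.
    """
    d = np.asarray(distances, dtype=float)
    p = (d < cutoff).mean(axis=0)  # boolean mean gives a float probability
    idx = np.arange(p.shape[0])
    p[np.abs(idx[:, None] - idx[None, :]) < min_sep] = 0.0
    return p


def blob_pair_enrichment(p_rp, p_sahp, blobs, pseudocount=1e-3):
    """log2 ratio of mean contact probability for every pair of blobs.

    p_rp, p_sahp: (n_res, n_res) contact-probability maps for the real
    protein and the SAHP reference ensemble.
    blobs: list of half-open (start, end) residue ranges (hypothetical
    input). Positive values mean contacts are more frequent in the RP
    than in the SAHP reference.
    """
    n = len(blobs)
    enrichment = np.zeros((n, n))
    for i, (a0, a1) in enumerate(blobs):
        for j, (b0, b1) in enumerate(blobs):
            rp = p_rp[a0:a1, b0:b1].mean()
            ref = p_sahp[a0:a1, b0:b1].mean()
            enrichment[i, j] = np.log2((rp + pseudocount) / (ref + pseudocount))
    return enrichment
```

Averaging over whole blob blocks, rather than testing each residue pair, mirrors the abstract's blob-level view of contacts; in practice one would also want a significance test across independent simulations before calling a difference enriched.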