Haq, Omar. Modeling correlated mutations in computational biology using log-linear analysis and graph-theoretic probabilistic inference methods. Retrieved from https://doi.org/doi:10.7282/T38G8JHS
Description: Point mutations are random events, but selection for protein stability and function fixes specific combinations of amino acid mutations in the protein population. Many mutations are not independent but strongly correlated, and the signal of this correlation is present in multiple sequence alignment data. Using HIV protease as a model system, this work bridges the gap between the analysis of protein sequences using statistical techniques developed by the physics and computer science communities and the biophysical modeling of protein energetics. Using information-theoretic methods together with a coarse-grained (Generalized Born) energy model, we have analyzed the contribution of electrostatic interactions to protein stability among mutated residues of HIV-1 protease, based on models derived from a large database of sequences that have acquired drug resistance. In the course of this work we have constructed a mean-field model at the level of pair correlations (Bethe approximation) that predicts the probabilities of observing mutated sequences, using the HIV sequence database to parameterize the model.
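To make the pair-level (Bethe) idea concrete, the following is a minimal sketch and not the thesis's implementation. It assumes the textbook form of the Bethe approximation, P(x) ≈ ∏_i f_i(x_i) · ∏_(i,j) f_ij(x_i, x_j) / (f_i(x_i) f_j(x_j)), with single-site and pair frequencies counted directly from an alignment. The function name `bethe_probability`, the `edges` argument (which position pairs are treated as coupled), and the toy alignment are all hypothetical illustrations.

```python
from collections import Counter


def bethe_probability(seq, alignment, edges):
    """Pair-level (Bethe-style) estimate of P(seq) from alignment frequencies.

    seq       -- query sequence, same length as each aligned sequence
    alignment -- list of equal-length strings (hypothetical toy data here)
    edges     -- list of (i, j) position pairs assumed to be coupled
    """
    n = len(alignment)
    # Single-site counts: one Counter per alignment column.
    singles = [Counter(s[i] for s in alignment) for i in range(len(seq))]

    # Independent-site term: product of single-site frequencies.
    prob = 1.0
    for i, residue in enumerate(seq):
        prob *= singles[i][residue] / n
    if prob == 0.0:
        return 0.0  # some residue was never observed at its position

    # Pair corrections: f_ij / (f_i * f_j) for every coupled pair.
    for i, j in edges:
        pair_count = sum(
            1 for s in alignment if s[i] == seq[i] and s[j] == seq[j]
        )
        f_ij = pair_count / n
        f_i = singles[i][seq[i]] / n
        f_j = singles[j][seq[j]] / n
        prob *= f_ij / (f_i * f_j)
    return prob


# Toy two-position "alignment" (made-up data, for illustration only).
toy_alignment = ["LV", "LV", "IA", "LA"]
p_pair = bethe_probability("LV", toy_alignment, edges=[(0, 1)])
p_indep = bethe_probability("LV", toy_alignment, edges=[])
```

With a single coupled pair, the estimate reduces to the empirical pair frequency of the query, while an empty `edges` list gives the independent-site product. On graphs with loops this form is only an approximation and is not guaranteed to be normalized.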