Flynn, William F. Maximum entropy Potts Hamiltonian models of protein fitness and applications to HIV-1 proteins. Retrieved from https://doi.org/doi:10.7282/T3154M5B
Description: Protein evolution is governed by a balance of mutation and selection acting on an ensemble of protein sequence variants. Whether a mutation is acquired, and how it changes a variant's prevalence in this ensemble, depends on networks of interactions with the rest of the protein sequence, a phenomenon known as epistasis. Evidence of this epistatic interaction network can be extracted from protein sequence alignments in the form of pairwise correlations. In this dissertation, we build models of correlated mutation patterns derived from statistical physics and use them to extract information describing the protein fitness landscape and the interdependence of specific mutation patterns and drug resistance. Using HIV-1 protease as our model system, we extract the pair correlations present in conventional multiple sequence alignments to build maximum entropy Potts Hamiltonian models of the drug-experienced HIV-1 protease mutational landscape. We demonstrate that these models are able to predict higher-order sequence statistics and the fitness effects of multiple simultaneous mutations. Using the Hamiltonian to score individual protease sequences, we identify the mutation patterns that are responsible for promoting protease inhibitor resistance. Because Hamiltonian modeling of sequence covariation is a growing field, we provide a first-order analytic framework to quantify the error in a model's predictions, along with a retrospective error analysis of models published in the literature. Lastly, we provide an overview of related research efforts to study drug resistance within HIV-1 protease and its substrate, and to computationally screen putative drug candidates targeting HIV-1 integrase.
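As a rough illustration of the two ingredients named in the description (pair statistics from an alignment, and a Hamiltonian used to score a sequence), here is a minimal Python sketch. It is not the dissertation's code. The function names, the integer encoding of residues, and the sign convention E(S) = -sum_i h_i(S_i) - sum_{i<j} J_ij(S_i, S_j) are assumptions chosen for illustration, and the step that fits h and J to the observed statistics is not shown.

```python
import numpy as np

def pair_frequencies(msa, q):
    """Empirical pair frequencies f_ij(a, b) from an integer-encoded alignment.

    msa: (N, L) array of residue codes in 0..q-1 (hypothetical encoding).
    Returns an (L, L, q, q) array; no sequence reweighting or pseudocounts.
    """
    n_seqs = msa.shape[0]
    onehot = np.eye(q)[msa]  # shape (N, L, q)
    return np.einsum("nia,njb->ijab", onehot, onehot) / n_seqs

def potts_energy(seq, h, J):
    """Potts energy under the assumed convention
    E(S) = -sum_i h[i, S_i] - sum_{i<j} J[i, j, S_i, S_j].

    h: (L, q) fields, J: (L, L, q, q) couplings (only i < j entries are read).
    """
    length = len(seq)
    energy = -sum(h[i, seq[i]] for i in range(length))
    for i in range(length):
        for j in range(i + 1, length):
            energy -= J[i, j, seq[i], seq[j]]
    return float(energy)
```

Under this assumed convention, a lower energy corresponds to a higher model probability, P(S) proportional to exp(-E(S)), so energy differences between sequences can be compared as relative scores.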