TY  - JOUR
TI  - Large-scale protease multispecificity
DO  - https://doi.org/doi:10.7282/T3Z322VM
PY  - 2018
AB  - Proteases are ubiquitous and significant to both normal cellular functioning and disease states. They are generally multispecific, cleaving a set of substrates without recognizing other peptides. Computational methods to predict and design protease multispecificity would advance our understanding of the biophysical basis of protease specificity, enable the characterization of novel proteases, allow the identification of novel biological roles for proteases, elucidate protease specificity landscapes and ultimately further the design of custom proteases to serve as therapeutics or protein-level knockout reagents in cell culture. Current methods of computational protease specificity prediction are limited in a variety of ways. Techniques to classify substrates as cleaved or uncleaved are constrained by the quality of the input data, cannot be easily generalized to other proteases, and require large training data sets to learn correlations between substrate positions. Methods that predict specificity profiles are computationally expensive and thus unable to be used directly within design. While fitness landscapes have been explored experimentally and via low-resolution computational models, no methods have yet explored the full fitness landscape using chemically realistic atomic-resolution computations. In this dissertation, we further the understanding of protease multispecificity via a variety of experimental and computational techniques that can be generalized to other proteases. First, we develop a structure-based classifier that distinguishes robustly between cleaved and uncleaved substrates, benchmark the classifier performance for five model proteases, and apply the classifier in a blind test to identify novel substrates. Second, we implement a mean-field structure-based algorithm (MFPred) to rapidly and accurately predict protease specificity profiles, benchmark MFPred performance on a range of protease and protein-recognition domains, and demonstrate that MFPred accurately predicts the impact of receptor-side mutations, thus showing putative utility in protease design. Third, we construct a specificity landscape of hepatitis C virus NS3 protease using both experimental and computational methods and find evidence for a structural basis of mutational robustness. Finally, we compare the Rosetta and Amber energy functions used in the computational prediction of protease multispecificity in a systematic benchmark.
KW  - Quantitative Biomedicine
KW  - Protein engineering
LA  - eng
ER  -