Post-translational modification (PTM) is an important regulatory mechanism that plays a key role in both normal and pathological cells. A wide range of PTM types, including phosphorylation, prenylation, ubiquitination, methylation, crotonylation, and acetylation, have been identified so far. Among these PTMs, acetylation is one of the most important, as it is associated with specific human diseases such as hypertension, vascular diseases, arrhythmia, heart failure, and angiogenesis. Experimental methods for detecting acetylation in proteins include the radioactive chemical method, chromatin immunoprecipitation, and mass spectrometry. However, all of these methods are time-consuming and costly. Therefore, fast and effective computational approaches for detecting acetylation sites are attracting tremendous attention.
In this study, we propose a new machine learning approach to improve lysine acetylation prediction performance, developed in the following steps. First, we used two traditional machine learning classifiers, Random Forest (RF) and Support Vector Machine (SVM), with both structural and evolutionary features to investigate how effective traditional classifiers are at predicting lysine acetylation sites. Our results for this step demonstrate that SVM outperforms RF in almost all aspects on both human and mouse samples. We then investigated related factors that are important to the prediction task, such as the training-to-testing ratio and the impact of the features we employed. We found that a 5:1 training-to-testing ratio gives the best performance compared to the other ratios. We also show that combining evolutionary and structural features yields better results than using either feature type alone.
Second, we improved our model by applying a deep Convolutional Neural Network (CNN) to our extracted features for lysine acetylation site prediction. Compared with RF and SVM, the CNN achieves an average improvement in Matthews correlation coefficient (MCC) of 0.05 over RF and 0.04 over SVM. We then compared our CNN model to two state-of-the-art models proposed in recent years. The results demonstrate that our model improves MCC by 0.04 and 0.09 over these two models, respectively.
In conclusion, we developed an effective and accurate computational predictor that identifies lysine acetylation sites better than previously proposed methods in the literature.
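The comparisons above are reported in terms of MCC (Matthews correlation coefficient). As a reading aid, the following minimal sketch shows how MCC can be computed from binary predictions using the standard definition. It is not the evaluation code used in this study; the function names `confusion_counts` and `mcc` and the toy labels are hypothetical, and the choice to return 0.0 when the denominator is zero is an assumed convention.

```python
import math


def confusion_counts(y_true, y_pred):
    """Count TP, TN, FP, FN for binary labels (1 = acetylated, 0 = not)."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    return tp, tn, fp, fn


def mcc(y_true, y_pred):
    """Matthews correlation coefficient, in the range [-1, 1].

    Assumed convention: return 0.0 when any marginal count is zero,
    which would otherwise make the denominator zero.
    """
    tp, tn, fp, fn = confusion_counts(y_true, y_pred)
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    if denom == 0:
        return 0.0
    return (tp * tn - fp * fn) / denom


# Toy labels for illustration only; not data from this study.
y_true = [1, 1, 0, 0]
y_pred = [1, 0, 0, 0]
score = mcc(y_true, y_pred)
```

Unlike accuracy, MCC uses all four confusion-matrix counts, so it remains informative when acetylated sites are much rarer than non-acetylated ones.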