TY  - JOUR
TI  - Comparative analysis of classification models for prostate cancer
DO  - https://doi.org/10.7282/T3PR7XPP
PY  - 2014
AB  - Among the cancers that occur in men, prostate cancer is the most common. However, its epidemiology is not fully understood, and current knowledge does not fully explain either its causes or its pathogenesis. As a result, prostate cancer screening tests cannot always detect the disease. The prostate-specific antigen (PSA) blood test is the most widely used test to screen men for prostate cancer. However, critics of the PSA test argue that it yields misleading results that lead to overdiagnosis and overtreatment. According to the National Cancer Institute, only about 25% of men who undergo a prostate biopsy because of an elevated PSA result actually have prostate cancer. The remaining 75% may still experience the side effects of biopsy, which include serious infection, pain, and bleeding. This study used data from the Nationwide Inpatient Sample (NIS) and the Surveillance, Epidemiology, and End Results (SEER) Program to identify key risk factors among patients who are more likely to be diagnosed with prostate cancer. In addition, an Artificial Neural Network was implemented alongside the most widely used classification methods (Logistic Regression, k-Nearest Neighbors, Naïve Bayes, Decision Tree, and Support Vector Machine) to recognize prostate cancer at an early stage. The results of all classifiers were evaluated using confusion matrix and Receiver Operating Characteristic (ROC) analyses. This study found that age, ethnicity, family history of cancer, fat intake, vitamin D deficiency, inflammation of the prostate, vasectomy, and hypertension are positively associated with prostate cancer. Obesity, alcohol abuse, and smoking were also significantly associated with prostate cancer, but the association was negative. In the classification tests, the Artificial Neural Network achieved success rates of 87.53% on the NIS data and 99.31% on the SEER data, compared with Logistic Regression (81.71% and 84.95%, respectively), k-Nearest Neighbors (73.46%, 91.62%), Naïve Bayes (70.86%, 86.56%), Decision Tree (78.02%, 90.34%), and Support Vector Machine (72.33%, 88.52%). In conclusion, this study sought to reduce false PSA results by identifying additional key risk factors and by providing a prediction tool, based on an Artificial Neural Network, to support clinical decision-making in prostate cancer screening.
KW  - Biomedical Informatics
KW  - Prostate--Cancer--Diagnosis
KW  - Prostate-specific antigen
KW  - Neural networks (Computer science)
LA  - eng
ER  - 