Comparisons of statistical methods for determining gene expression signatures to predict binary cancer response

Dong, Qian

doi:10.7282/T3V69M6G

RUcore: Rutgers University Community Repository

Simple citation

Dong, Qian. Comparisons of statistical methods for determining gene expression signatures to predict binary cancer response. Retrieved from https://doi.org/doi:10.7282/T3V69M6G

Description

Title: Comparisons of statistical methods for determining gene expression signatures to predict binary cancer response

Name: Dong, Qian (author); Moore, Dirk F (chair); Kim, Sinae (internal member); Martkert, Elke Katrin (outside member); Wu, Chengqing (outside member); Rutgers University; School of Public Health

Date Created: 2014

Other Date: 2014-10 (degree)

Subject: Public Health, Statistical methods, Gene expression--Statistical methods, Cancer--Genetic aspects

Extent: 1 online resource (xxvi, 234 p. : ill.)

Description: Cancer is a major public health problem with high mortality and morbidity. In the past few decades, advances in high-throughput molecular technologies have been applied to diagnosing cancers and managing their treatment. Cancer classification using gene expression data poses many challenges to classical supervised learning methods, in particular because the number of genes measured typically far exceeds the number of samples. The main objective of this dissertation is to evaluate and compare the performance of six classification methods, denoted Logit (logistic regression), Lasso (least absolute shrinkage and selection operator), CART (classification and regression tree), RF (random forest), GBM (gradient boosted models), and SVM (support vector machine), for predicting binary cancer outcomes from gene expression data. We compare their performance on real-life datasets (prostate cancer and breast cancer data) and in extensive simulation experiments. Consistent with previous comparisons of classifiers, the best classifier for predicting a binary outcome varies with the dataset and the evaluation measure; no single classifier performed best on all empirical datasets and under all simulation scenarios. When comparing classification methods, especially classifiers for predicting cancer outcomes, accuracy should not be the only consideration; other factors, such as ease of implementation, interpretability for clinicians and biologists, and the biological insight that can be gained from a classifier's results, should also be taken into account. In addition, we provide clear, easy-to-follow procedures for predictive model building and performance assessment that clinical researchers can use when they need to compare results from different classifiers.
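
The point that the best classifier can depend on the evaluation measure is easy to see on a toy example. The sketch below is not taken from the dissertation: the labels and predicted probabilities are made up for two hypothetical classifiers, A and B, and the two measures (accuracy at a 0.5 cut-off, and the area under the ROC curve, AUC) are assumed here for illustration only. Classifier B ranks every positive sample above every negative one but gives all samples probabilities above 0.5, so it wins on AUC and loses on accuracy.

```python
from typing import Sequence


def accuracy(y_true: Sequence[int], prob: Sequence[float], threshold: float = 0.5) -> float:
    """Share of samples whose thresholded prediction (prob >= threshold) equals the label."""
    correct = sum(1 for y, p in zip(y_true, prob) if (p >= threshold) == (y == 1))
    return correct / len(y_true)


def auc(y_true: Sequence[int], prob: Sequence[float]) -> float:
    """Area under the ROC curve via the Mann-Whitney pairwise comparison.

    Counts, over all (positive, negative) pairs, how often the positive sample
    gets the higher predicted probability; ties count as one half.
    """
    pos = [p for y, p in zip(y_true, prob) if y == 1]
    neg = [p for y, p in zip(y_true, prob) if y == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative sample")
    wins = 0.0
    for pp in pos:
        for pn in neg:
            if pp > pn:
                wins += 1.0
            elif pp == pn:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Hypothetical held-out labels and predicted probabilities for two classifiers.
y = [1, 1, 1, 1, 0, 0, 0, 0]
prob_a = [0.9, 0.8, 0.7, 0.3, 0.1, 0.2, 0.4, 0.35]    # one positive ranked below two negatives
prob_b = [0.9, 0.85, 0.8, 0.75, 0.7, 0.65, 0.6, 0.55]  # perfect ranking, poorly calibrated

for name, prob in [("A", prob_a), ("B", prob_b)]:
    print(name, "accuracy =", accuracy(y, prob), "AUC =", auc(y, prob))
# A accuracy = 0.875 AUC = 0.875
# B accuracy = 0.5 AUC = 1.0
```

Which classifier "wins" therefore depends on whether the study values correct labels at a fixed cut-off or correct ranking of patients by risk.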
This thesis addresses the binary classification problem, but the approach should extend readily to multi-category classification and to survival analysis. Based on the results from the real-life datasets and the simulation experiments, we found that for classification problems with high-dimensional data, simple and widely used methods such as logistic regression have limitations and may not achieve the desired performance. Classifiers designed to handle large numbers of predictors, such as Lasso, GBM, SVM, and RF, are better choices in such situations.
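
One mechanism behind the limitation of plain logistic regression is perfect separation: with at least as many predictors as samples, data in general position can always be split perfectly by some linear combination of the predictors (without an intercept this amounts to solving n linear equations in p >= n unknowns), and then the maximum-likelihood estimate does not exist. The following is a minimal sketch under stated assumptions, not the dissertation's code: the data are hypothetical (20 samples, 30 "genes", gene 0 separating the classes perfectly, the rest small noise), there is no intercept, and the fits use plain gradient descent and proximal gradient descent (ISTA) rather than any particular R or Python package. Unpenalized, the coefficient of gene 0 keeps growing as long as the loop runs; with the L1 penalty lam = 0.15 it settles at a finite value (log((1 - lam) / lam) for this particular construction) and the noise coefficients are set exactly to zero.

```python
import math
import random


def sigmoid(z: float) -> float:
    """Numerically stable logistic function 1 / (1 + exp(-z))."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def soft_threshold(z: float, t: float) -> float:
    """L1 proximal step: shrink z toward 0 by t; |z| <= t becomes exactly 0."""
    return math.copysign(max(abs(z) - t, 0.0), z)


def fit_logistic(X, y, lam=0.0, step=0.5, iters=1000):
    """Minimize mean logistic loss + lam * sum(|w_j|) by proximal gradient (ISTA).

    lam = 0 reduces to plain gradient descent on the unpenalized logistic loss.
    No intercept, to keep the sketch short. Returns the final weights and the
    path of the first coefficient w[0].
    """
    n, p = len(X), len(X[0])
    w = [0.0] * p
    path = []
    for _ in range(iters):
        # Residuals sigmoid(x_i . w) - y_i for every sample.
        r = [sigmoid(sum(a * b for a, b in zip(xi, w))) - yi for xi, yi in zip(X, y)]
        # Gradient of the mean logistic loss, one entry per predictor.
        grad = [sum(r[i] * X[i][j] for i in range(n)) / n for j in range(p)]
        # Gradient step, then soft-thresholding with threshold step * lam.
        w = [soft_threshold(wj - step * gj, step * lam) for wj, gj in zip(w, grad)]
        path.append(w[0])
    return w, path


# Hypothetical data: 20 samples, 30 "genes" (more predictors than samples).
# Gene 0 separates the classes perfectly (+1 vs -1); genes 1..29 are small
# uniform noise in [-0.1, 0.1].
rng = random.Random(0)
n, p = 20, 30
y = [1] * (n // 2) + [0] * (n // 2)
X = [[1.0 if yi == 1 else -1.0] + [rng.uniform(-0.1, 0.1) for _ in range(p - 1)] for yi in y]

w_plain, path_plain = fit_logistic(X, y, lam=0.0)
w_lasso, path_lasso = fit_logistic(X, y, lam=0.15)

print("unpenalized w[0] after 100 / 1000 steps:", path_plain[99], path_plain[-1])
print("lasso w[0]:", w_lasso[0], "vs log((1 - lam) / lam):", math.log(0.85 / 0.15))
print("nonzero noise coefficients, unpenalized:", sum(wj != 0.0 for wj in w_plain[1:]))
print("nonzero noise coefficients, lasso:", sum(wj != 0.0 for wj in w_lasso[1:]))
```

ISTA is used here only because it fits in a few lines of standard-library Python; established Lasso software typically relies on faster solvers such as coordinate descent.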

Note: Dr.P.H.

Note: Includes bibliographical references

Note: Qian Dong

Genre: theses, ETD doctoral

Persistent URL: https://doi.org/doi:10.7282/T3V69M6G

Language: English

Collection: School of Public Health ETD Collection

Organization Name: Rutgers, The State University of New Jersey

Rights: The author owns the copyright to this work.
