Khan, Faisal M. Semi-supervised transductive regression for survival analysis in medical prognostics. Retrieved from https://doi.org/doi:10.7282/T3474D5G
Description: The central challenge in predictive modeling for survival analysis in medical prognostics is the management of censored observations in the data. While time-to-event predictions can be modeled as regression problems, traditional regression techniques are challenged by the censored nature of the data. In such problems the true target times of a majority of instances are unknown; what is known is a censored target representing some indeterminate time before the true target time. The information for most patients is incomplete and only known "up-to-a-point." Patients who have experienced the endpoint of interest (cancer recurrence, death, etc.) during an often multi-year study are considered non-censored, or events. They may represent as little as 9% of the available sample. Most patients do not experience the endpoint or are lost to follow-up for various reasons (patient moved, died of other causes, etc.). These censored samples often represent most of the available sample, so modeling techniques that correctly account for censored observations are crucial. Such censored samples can be considered semi-supervised targets. However, most efforts in semi-supervised regression do not take into account the partial nature of this information: samples are treated as either fully labeled or unlabeled. This dissertation presents a novel transduction approach for semi-supervised survival analysis. The true target times are approximated from the censored times through transduction to improve predictive performance. The framework can be employed to adapt traditional regression methods to survival analysis, or to enhance existing survival analysis algorithms. The proposed approach represents one of the first applications of semi-supervised regression to survival analysis and yields significant improvements in predictive performance for multiple applications in prostate and breast cancer prognostics.
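To make the idea of a censored target concrete, the following is a minimal sketch of one generic way to treat censored times as partial labels. It is not the algorithm developed in the dissertation. The function name `transduce_targets`, the ordinary least squares model, the fixed iteration count, and the synthetic data are all assumptions made for illustration. Each censored target is only ever moved upward from its censoring time, reflecting the fact that the true time lies at or beyond it, while event targets are left unchanged.

```python
import numpy as np

def transduce_targets(X, observed_time, is_event, n_iter=10):
    """Illustrative transduction loop (hypothetical, not the dissertation's method).

    Repeatedly fits least squares to the current targets, then replaces each
    censored target with max(censoring time, current prediction).
    """
    X1 = np.column_stack([np.ones(len(X)), X])  # add an intercept column
    target = observed_time.astype(float).copy()
    censored = ~is_event
    for _ in range(n_iter):
        coef, *_ = np.linalg.lstsq(X1, target, rcond=None)
        pred = X1 @ coef
        # The censoring time is a lower bound on the true time, so a
        # censored target is never allowed to fall below it.
        target[censored] = np.maximum(observed_time[censored], pred[censored])
    return target, coef

# Synthetic right-censored data (all values are made up for illustration).
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 1))
true_time = 10.0 + 2.0 * X[:, 0] + rng.normal(scale=0.5, size=200)
censor_time = rng.uniform(6.0, 14.0, size=200)
is_event = true_time <= censor_time             # True when the endpoint was observed
observed_time = np.minimum(true_time, censor_time)

target, coef = transduce_targets(X, observed_time, is_event)
```

In this sketch `is_event` plays the role of the event indicator described above, and `observed_time` holds either the true event time or the censoring time, whichever came first.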