Wang, Wei. The divide-and-combine approaches for multivariate survival analysis and multistate survival analysis in big data. Retrieved from https://doi.org/doi:10.7282/t3-s5sq-vq68
Description: Multivariate failure time data can be unordered or ordered, and can be analyzed using multivariate survival analysis or multistate survival analysis, respectively. When sample sizes are extraordinarily large, both analyses can face computational challenges. In this dissertation, we propose divide-and-combine approaches for analyzing large-scale multivariate failure time data in both multivariate survival analysis and multistate survival analysis. Our approaches are motivated by the Myocardial Infarction Data Acquisition System (MIDAS), a New Jersey statewide database that includes 73,725,160 admissions to non-federal hospitals and emergency rooms (ERs) from 1995 to 2017. We propose to randomly divide the full data into multiple subsets and to combine the estimators obtained from the individual subsets using a weighted method. Within each subset, we estimate regression parameters for multivariate survival analysis and cumulative hazards for multistate survival analysis. Under mild conditions, we show that the combined estimators are asymptotically equivalent to the estimators that would be obtained by analyzing the full data all at once. In addition, to screen out risk factors with weak signals in multivariate survival analysis, we propose to perform regularized estimation on the combined estimators using their combined confidence distributions. We study the theoretical properties of the proposed approaches, including asymptotic equivalence between divide-and-combine analysis and full-data analysis, estimation consistency, selection consistency, and oracle properties. The performance of the proposed estimators is investigated in simulation studies, and the MIDAS data are used to illustrate the proposed methodologies.
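The divide-and-combine idea described above can be illustrated with a toy sketch. This is not the dissertation's estimator: it assumes a deliberately simple model (a constant hazard with right censoring at a fixed time) in place of the regression and multistate models, and it assumes weights proportional to each subset's total exposure time, which under a common hazard is proportional to the inverse asymptotic variance of the subset estimate. The function names (`simulate`, `hazard_mle`, `divide_and_combine`) and all parameter values are hypothetical and chosen only for illustration.

```python
import random


def simulate(n, hazard, censor_time, seed):
    """Simulate n exponential failure times, right-censored at censor_time.

    Returns a list of (observed_time, event_observed) pairs.
    """
    rng = random.Random(seed)
    data = []
    for _ in range(n):
        t = rng.expovariate(hazard)
        data.append((min(t, censor_time), t <= censor_time))
    return data


def hazard_mle(data):
    """Constant-hazard MLE: number of events divided by total exposure time.

    Also returns the exposure, which is used as the combination weight.
    """
    events = sum(1 for _, observed in data if observed)
    exposure = sum(time for time, _ in data)
    return events / exposure, exposure


def divide_and_combine(data, n_subsets, seed):
    """Randomly split the data, estimate per subset, and combine with weights."""
    rng = random.Random(seed)
    shuffled = data[:]
    rng.shuffle(shuffled)
    # Random division of the full data into n_subsets disjoint subsets.
    subsets = [shuffled[k::n_subsets] for k in range(n_subsets)]
    fits = [hazard_mle(subset) for subset in subsets]
    total_weight = sum(weight for _, weight in fits)
    # Weighted average of the subset estimates (assumed exposure weights).
    return sum(estimate * weight for estimate, weight in fits) / total_weight


data = simulate(n=10000, hazard=0.5, censor_time=3.0, seed=1)
full_estimate, _ = hazard_mle(data)
combined_estimate = divide_and_combine(data, n_subsets=10, seed=2)
print(full_estimate, combined_estimate)
```

In this toy model the exposure-weighted combination reproduces the full-data estimate up to floating-point rounding, because the weighted sum of subset estimates reduces to total events divided by total exposure. The abstract claims only asymptotic equivalence for the proposed estimators, so exact agreement should not be expected in the general setting.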