TY - JOUR
TI - Signal and variance component estimation in high dimensional linear models
DO - https://doi.org/doi:10.7282/T3TF01R9
PY - 2018
AB - Over the past several decades, the dimensionality of many datasets has grown exponentially as technology has advanced. Many approaches have been proposed to tackle high-dimensional problems, where the dimensionality is much larger than the sample size. This dissertation focuses on developing methodologies for signal and variance component estimation in three different areas: compressive sensing, genome-wide association studies, and demand forecasting in the e-commerce industry. In the literature, signal and variance component estimation are usually treated as independent tasks, and this work draws the connection between these estimation goals. For the first problem, in compressive sensing, we propose an algorithm that combines a nonparametric empirical Bayes method with generalized approximate message passing (AMP). Generalized AMP is an effective algorithm for recovering signals from noisy linear measurements, assuming the signal distribution is known a priori. However, in practice, both the signal distribution and noise level are often unknown. We propose nonparametric maximum likelihood-AMP (NPML-AMP) for estimating an arbitrary signal distribution in this setting. In addition, we propose a simple noise variance estimator for use in conjunction with NPML-AMP. For the second problem, in genome-wide association studies, we focus on heritability estimation methods related to variance component estimation problems for linear mixed models (LMMs). Heritability is the proportion of phenotype variance explained by genetic variance, and standard approaches to LMM-based heritability estimation have some unresolved inconsistencies. We suggest that by adopting a slightly different statistical perspective, many of these inconsistencies can be seamlessly resolved.
Moreover, with the Mahalanobis kernel, we define a natural version of heritability as a conditional variance under the fixed-effects model. The third problem is associated with predictions for online retailing demand forecasting and genetic risk prediction. In these big-data applications, a regression-based linear dimension reduction technique performs well in minimizing out-of-sample error. We identify the asymptotic risk of such a sharp estimate under a model known to be misspecified. More importantly, we propose to estimate its asymptotic risk using the variance component estimation discussed in the second problem. The risk evaluation technique can also be extended to model comparison with other methods that have an explicit asymptotic risk.
KW - Statistics and Biostatistics
KW - Linear models (Statistics)
LA - eng
ER -