Gazzola, Gianluca. Supervised learning methods for variable importance and regression with uncertainty on dependent data. Retrieved from https://doi.org/doi:10.7282/t3-hbh2-6x30
Description: This dissertation covers a collection of supervised learning methods targeted at data with complex dependence patterns. Part of our work revolves around the concept of variable importance, that is, the relative contribution of an input variable to the prediction or the explanation of an output variable. Our interest in variable importance and its estimation is twofold. On the one hand, we use it as a tool for characterizing data sets produced by multi-stage systems, where variables are related to each other via a network of correlations and causal dependencies. On the other hand, we use it as a tool for selecting minimal input-variable subsets with optimal predictive performance, in a more general framework involving data sets with an interesting structure of inter-variable dependence and redundancy. The rest of our work focuses on the problem of function approximation in the presence of uncertainty and, specifically, on the calculation of optimal interpolating hyperplanes from data represented by convex polyhedra rather than points. In this context, we propose algorithms to determine the spatial orientation of such polyhedra based on the multivariate relationships observed in the data, with a particular focus on missing-value scenarios. For all of our methods, we present successful validation on an extensive and diverse array of real-world and simulated problems.