Modern information processing relies on the axiom that high-dimensional data lie near low-dimensional geometric structures. The work presented in this thesis aims to develop new models and algorithms for learning the geometric structures underlying data and to apply geometry learning to image and video analytics. The first part of the thesis revisits the problem of data-driven learning of these geometric structures and puts forth two new nonlinear geometric models for data describing "related" objects/phenomena. The first of these models straddles the two extremes of the subspace model and the union-of-subspaces model, and is termed the metric-constrained union-of-subspaces (MC-UoS) model. The second of these models, suited for data drawn from a mixture of nonlinear manifolds, generalizes the kernel subspace model and is termed the metric-constrained kernel union-of-subspaces (MC-KUoS) model. The main contributions in this regard are threefold. First, we motivate and formalize the problems of MC-UoS and MC-KUoS learning. Second, we present algorithms that efficiently learn an MC-UoS or an MC-KUoS underlying data of interest. Third, we extend these algorithms to the case when parts of the data are missing. The second part of the thesis considers the problem of learning meaningful human action attributes from video data. Representation of human actions as a sequence of human body movements or action attributes enables the development of models for human activity recognition and summarization. We first propose a hierarchical union-of-subspaces model and develop an approach called hierarchical sparse subspace clustering (HSSC) to learn this model from the data in an unsupervised manner by capturing the variations or movements of each action in different subspaces.
We then present an extension of the low-rank representation (LRR) model, termed the clustering-aware structure-constrained low-rank representation (CS-LRR) model, for unsupervised learning of human action attributes from video data. The CS-LRR model is based on the union-of-subspaces framework, and integrates spectral clustering into the LRR optimization problem for better subspace clustering results. We also introduce a hierarchical subspace clustering approach, termed hierarchical CS-LRR, to learn the attributes without the need for a priori specification of their number. By visualizing and labeling these action attributes, the hierarchical model can be used to semantically summarize long video sequences of human actions at multiple resolutions. A human action or activity can also be uniquely represented as a sequence of transitions from one action attribute to another, which can then be used for human action recognition.