Description
Title: Three essays on large panel data econometrics
Date Created: 2021
Other Date: 2021-05 (degree)
Extent: 1 online resource (xii, 153) : illustrations
Description: My dissertation consists of three chapters that develop new tools for big data, machine learning, and forecasting. In particular, I employ regularization methods from the machine learning literature to improve the estimation of standard errors, the efficiency of coefficient estimates, and big data forecasts. Chapter 1 considers large factor models and proposes a new principal component analysis method that increases estimation accuracy and efficiency for big data forecasts. Chapters 2 and 3 develop, respectively, a standard-error estimator for ordinary least squares (OLS) and a feasible generalized least squares (GLS) estimator, both of which account for general dependence in the idiosyncratic error terms of linear panel data models.
All three chapters highlight the importance of cross-sectional heteroskedasticity and correlation in the error terms when the cluster structure is unknown. For example, idiosyncratic error variances can vary substantially across companies, and the errors can be correlated across companies. In addition, knowledge of the cluster structure may be unavailable in practice. Taking these aspects into account is essential for obtaining more precise results in prediction and causal inference.
In Chapter 1, I propose a feasible weighted projected principal component analysis (FPPC) for factor models in which observable characteristics partially explain the latent factors. This novel method provides more efficient and accurate estimators than existing methods. To increase efficiency, I take into account both cross-sectional dependence and heteroskedasticity by using a consistent estimator of the inverse error covariance matrix as the weight matrix. To improve accuracy, I employ a projection approach using the additionally observed characteristics, because the projection removes noise components in high-dimensional factor analysis. With the FPPC method, the estimators of the factors and loadings converge faster than those of conventional factor analysis. Moreover, I propose an FPPC-based diffusion index forecasting model and derive the limiting distribution of the parameter estimates and the rate of convergence of the forecast errors. Using U.S. bond market and macroeconomic data, I demonstrate that the proposed model outperforms models based on conventional principal component estimators. I also show that the proposed model performs well among a large group of machine learning techniques in forecasting excess bond returns.
Chapter 2, joint work with Jushan Bai and Yuan Liao, develops a new standard-error estimator for linear panel data models. The proposed estimator is robust to heteroskedasticity, serial correlation, and cross-sectional correlation of unknown forms. Serial correlation is controlled by the Newey-West method. To control for cross-sectional correlation, we propose a thresholding method that does not require the clusters to be known. We establish the consistency of the proposed estimator. Monte Carlo simulations show that the method works well. We illustrate the method in an application to the effects of U.S. divorce law reforms and find that cross-sectional correlations are non-negligible.
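As a rough illustration of the two ingredients named above, and not the chapter's actual estimator, the following Python sketch shows the standard Bartlett-kernel weights used in Newey-West estimation and a generic hard-thresholding rule for a covariance matrix. The function names, the `bandwidth` and `tau` parameters, and the choice of hard (rather than soft or adaptive) thresholding are assumptions made for this example; the abstract does not specify how the threshold is chosen.

```python
from typing import List


def bartlett_weight(lag: int, bandwidth: int) -> float:
    """Bartlett-kernel weight used in Newey-West estimation.

    The weight is 1 - |lag| / (bandwidth + 1) for |lag| <= bandwidth and 0
    otherwise, so more distant autocovariances are down-weighted.
    """
    if abs(lag) > bandwidth:
        return 0.0
    return 1.0 - abs(lag) / (bandwidth + 1)


def hard_threshold(cov: List[List[float]], tau: float) -> List[List[float]]:
    """Set off-diagonal entries smaller than tau in absolute value to zero.

    Diagonal entries (variances) are always kept. `tau` is a hypothetical
    tuning parameter; how it is chosen is not covered by this sketch.
    """
    n = len(cov)
    return [
        [cov[i][j] if i == j or abs(cov[i][j]) >= tau else 0.0 for j in range(n)]
        for i in range(n)
    ]


# Toy cross-sectional covariance for three units: units 0 and 1 are
# strongly correlated, unit 2 is only weakly related to the others.
toy_cov = [
    [1.0, 0.6, 0.05],
    [0.6, 2.0, -0.1],
    [0.05, -0.1, 1.5],
]
thresholded = hard_threshold(toy_cov, tau=0.2)
weights = [bartlett_weight(h, bandwidth=3) for h in range(5)]
```

In this toy example, thresholding keeps the covariance between units 0 and 1 and zeroes the small entries involving unit 2, which is how a sparse correlation pattern can be recovered without specifying clusters in advance.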
In Chapter 3, a co-authored paper with Jushan Bai and Yuan Liao, we consider GLS estimation for linear panel data models. By estimating the large error covariance matrix consistently, the proposed feasible GLS estimator is more efficient than the OLS estimator in the presence of heteroskedasticity and both serial and cross-sectional correlations. The covariance matrix used for the feasible GLS estimator is estimated via banding and thresholding. We establish the limiting distribution of the proposed estimator and conduct a Monte Carlo study. We apply the proposed method to U.S. divorce rate data and find that our more efficient estimator identifies significant effects of divorce law reforms on the divorce rate and provides tighter confidence intervals than existing methods.
Note: Ph.D.
Note: Includes bibliographical references
Genre: theses, ETD doctoral
Language: English
Collection: School of Graduate Studies Electronic Theses and Dissertations
Organization Name: Rutgers, The State University of New Jersey
Rights: The author owns the copyright to this work.