Abstract
This dissertation studies forecasting methodologies in high-frequency financial econometrics. The dissertation consists of three chapters. In the first chapter, I develop novel latent uncertainty measures (i.e., latent factors) using both high-dimensional and high-frequency financial data as well as multi-frequency macroeconomic data. In particular, I introduce three factors which capture macroeconomic fundamentals, market uncertainty, and financial market stress. These factors are analyzed in a series of forecasting experiments. In the second chapter of my dissertation, I investigate the importance of co-jumps for predicting equity return volatility. In particular, using high-frequency financial data, I disentangle individual sector jumps and multiple sector co-jumps. Using this information, I construct new jump and co-jump variation measures, which are included in a series of real-time prediction experiments in order to evaluate the importance of co-jumps when predicting stock market return volatility. Finally, in my third chapter, I review recent theoretical and methodological advances in the area of volatility/risk estimation, and in testing for jumps and co-jumps, using big data.
In chapter 2, we investigate the importance of co-jumps for predicting sector-level equity return volatility. For our analysis we use the co-jump tests based on Barndorff-Nielsen and Shephard (2004) and Jacod and Todorov (2009), together with the jump test introduced in Huang and Tauchen (2005), in order to classify jumps in sector-level S&P500 exchange-traded funds (ETFs) as either idiosyncratic jumps or co-jumps. We find that co-jumps are more densely concentrated during the 2008 financial crisis and 2011 debt crisis periods. Also, co-jumps occur frequently, and have large magnitudes compared with idiosyncratic jumps. These different types of jumps are analyzed in the context of volatility prediction, using extensions of Heterogeneous Autoregressive models (i.e., HAR-RV-CJ models). Empirical results are promising. There are clear marginal predictive gains associated with including certain types of jumps in HAR regressions, and it is found that the predictive content of co-jumps is higher than that of idiosyncratic jumps. This is not surprising, if one assumes that idiosyncratic jumps may be “more” exogenously driven, and hence less useful than co-jumps. In order to shed further light on the estimation of the co-jumps examined in our prediction experiments, we carry out Monte Carlo experiments that are designed to examine the relative performance of three types of widely used co-jump tests (i.e., the BLT co-jump test of Bollerslev et al. (2008), the JT co-jump test of Jacod and Todorov (2009), and σ thresholding type tests based on bipower variation). Findings indicate that the JT co-jump test and the σ threshold test are more powerful than the BLT co-jump test. However, there is also a distinct size trade-off when using the alternative tests.
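The ingredients of the jump decomposition and of HAR-type regressions can be illustrated with a minimal sketch. It assumes a list of intraday log returns for one day and a list of daily realized variances. The function names are hypothetical, the 5-day and 22-day windows are the conventional HAR choices rather than values stated in this abstract, and the formal jump and co-jump tests cited above are not reproduced here.

```python
import math


def realized_variance(returns):
    """Realized variance: sum of squared intraday returns."""
    return sum(r * r for r in returns)


def bipower_variation(returns):
    """Bipower variation: (pi/2) * sum of |r_i| * |r_{i-1}|.

    Products of adjacent absolute returns dampen the effect of a
    single large return, so this targets the continuous component.
    """
    scale = math.pi / 2.0
    return scale * sum(abs(a) * abs(b) for a, b in zip(returns[1:], returns[:-1]))


def jump_variation(returns):
    """Truncated difference max(RV - BV, 0) as a simple jump measure."""
    return max(realized_variance(returns) - bipower_variation(returns), 0.0)


def har_rows(rv, week=5, month=22):
    """Build (daily, weekly, monthly, next_day_target) rows from daily RV.

    The weekly and monthly terms are trailing averages ending at day t;
    the target is the realized variance on day t + 1.
    """
    rows = []
    for t in range(month - 1, len(rv) - 1):
        daily = rv[t]
        weekly = sum(rv[t - week + 1 : t + 1]) / week
        monthly = sum(rv[t - month + 1 : t + 1]) / month
        rows.append((daily, weekly, monthly, rv[t + 1]))
    return rows


if __name__ == "__main__":
    # Illustrative (made-up) intraday returns with one large move.
    day = [0.001, 0.001, 0.05, 0.001, 0.001]
    print(realized_variance(day), bipower_variation(day), jump_variation(day))
```

In a HAR-RV-CJ style extension, jump and co-jump variation measures would enter as additional regressors alongside the daily, weekly, and monthly terms; the regression step itself is omitted from this sketch.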
In chapter 3, we examine the usefulness of a large variety of machine learning methods for forecasting daily and monthly sector-level equity returns. We also examine the usefulness of three new latent risk factors that are designed to capture key forecasting information associated with financial market stress, market uncertainty, and macroeconomic fundamentals. The factors are variously based on the decomposition (using high frequency financial data) of the quadratic covariation between two assets into continuous and jump components, and the extraction of latent factors from mixed frequency state space models populated with nonparametrically estimated components of quadratic variation and/or low frequency macroeconomic data. In addition to constructing predictions using standard machine learning methods such as random forest, gradient boosting, support vector machine learning, penalized regression, and neural networks, among others, we also investigate the predictive performance of a group of hybrid machine learning methods that combine least absolute shrinkage and selection operator (LASSO) and neural network specification methods. Overall, at the monthly frequency, we find that machine learning methods significantly improve forecasting performance, as measured using mean square forecast error (MSFE) and directional predictive accuracy rate (DPAR), relative to the random walk and linear benchmark alternatives. The “best” method is clearly the random forest method, which “wins” in almost all permutations at the monthly frequency, across all of the “target” variables that we predict. It is also worth noting that our hybrid machine learning methods often outperform individual methods when forecasting daily data, although predictive gains associated with the use of any machine learning method are substantially reduced when forecasting at a daily versus monthly frequency.
Finally, the novel uncertainty factors that we build are present in almost all of our “MSFE-best” and directional “accuracy-best” models, suggesting that the risk factors constructed using both high frequency financial data (e.g., 5-minute frequency S&P500 and sector ETF data) and aggregate low frequency macroeconomic data are useful for predicting returns.
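The two evaluation criteria named above can be sketched as follows. This is a minimal illustration under stated assumptions: the function names are hypothetical, a forecast counts as directionally correct when it is positive exactly when the realized value is positive (zeros are grouped with negatives), and the sample numbers are made up for illustration rather than taken from the dissertation.

```python
def msfe(actual, forecast):
    """Mean square forecast error: average squared forecast error."""
    if len(actual) != len(forecast) or not actual:
        raise ValueError("inputs must be non-empty and of equal length")
    return sum((a - f) ** 2 for a, f in zip(actual, forecast)) / len(actual)


def dpar(actual, forecast):
    """Directional predictive accuracy rate: share of correct signs.

    Assumption: a value counts as "up" only if strictly positive.
    """
    if len(actual) != len(forecast) or not actual:
        raise ValueError("inputs must be non-empty and of equal length")
    hits = sum(1 for a, f in zip(actual, forecast) if (a > 0) == (f > 0))
    return hits / len(actual)


def relative_msfe(actual, forecast, benchmark):
    """MSFE of a model divided by MSFE of a benchmark forecast.

    Values below 1 indicate that the model beats the benchmark.
    """
    return msfe(actual, forecast) / msfe(actual, benchmark)


if __name__ == "__main__":
    # Illustrative (made-up) returns and forecasts.
    actual = [0.01, -0.02, 0.03, -0.01]
    model = [0.02, -0.01, -0.01, -0.02]
    benchmark = [0.0, 0.0, 0.0, 0.0]  # hypothetical no-change benchmark
    print(msfe(actual, model), dpar(actual, model))
    print(relative_msfe(actual, model, benchmark))
```

Reporting MSFE relative to a benchmark makes results comparable across target variables with different scales, while DPAR captures whether the sign of the return is predicted correctly regardless of magnitude.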
In recent years, the field of financial econometrics has seen tremendous gains in the amount of data available for use in modeling and prediction. Much of this data is very high frequency, and even “tick-based”, and hence falls into the category of what might be termed “big data”. The availability of such data, particularly that available at high frequency on an intra-day basis, has spurred numerous theoretical advances in the areas of volatility/risk estimation and modeling. In chapter 4, we discuss key advances of this kind, beginning with a survey of numerous nonparametric estimators of integrated volatility. Thereafter, we discuss testing for jumps using said estimators. Finally, we discuss recent advances in testing for co-jumps. Such co-jumps are important for a number of reasons. For example, the presence of co-jumps, in contexts where data has been partitioned into continuous and discontinuous (jump) components, is indicative of (near) instantaneous transmission of financial shocks across different sectors and companies in the markets, and hence represents a type of systemic risk. Additionally, the presence of co-jumps across sectors suggests that if jumps can be predicted in one sector, then such predictions may carry useful information for modeling variables such as returns and volatility in another sector. As an illustration of the methods discussed in this chapter, we carry out an empirical analysis of DOW and NASDAQ stock price returns.