Deep learning versus multifractal volatility forecasting using high frequency data with suppressed microstructure noise

John Collins, PhD
Keywords: multifractal volatility, Markov-switching model, MSM, high-frequency data, stochastic volatility, SV

Abstract:

Multifractal volatility predictions with a high-dimensional state space using high frequency data with suppressed microstructure noise: The Markov-switching multifractal stochastic volatility model (MSM) of Calvet & Fisher (2004, 2008a) permits the parsimonious specification of a high-dimensional state space. I show that out-of-sample performance improved when the state space was expanded, high-frequency data were used, and microstructure noise was taken into account. By implementing (coding) the model in new ways, I enabled maximum likelihood estimation and analytical forecasting with up to 13 volatility frequencies and over 8,000 states, some eight times more than in the previous literature. I introduced a stochastic algorithm that combined heuristic procedures with local searches to perform an enhanced exploration of the state space in conjunction with local optimization. Rigorous preparation and cleansing of data, sparse sampling, and return innovations weighted by the respective depth of the best bid and ask mitigated microstructure noise. These developments resulted in a well-specified model, better able to use the increased information provided by large high-frequency (HF) datasets. In-sample model selection tests showed statistically significant, monotonic improvement in the model as more volatility components were introduced. The out-of-sample forecasts of MSM(13) were compared with those of a realized volatility measure modeled with the heterogeneous autoregressive (HAR) model of Corsi (2009). Most of the time, MSM(13) provided better forecasts than HAR, with the difference statistically significant, at 1-hour, 1-day, and 2-day horizons for equity HF (Apple and J.P. Morgan) and foreign exchange HF (EURUSD) return series. Mincer-Zarnowitz (MZ) regressions showed little sign of bias in the MSM(13) forecasts. These results suggest that MSM may provide a viable alternative to established realized volatility estimators in high-frequency settings.
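To make the size of the state space concrete, the following is a minimal sketch of a binomial MSM(k), assuming the standard Calvet & Fisher parameterization in which each of the k components takes the value m0 or 2 - m0 and component j switches with probability 1 - (1 - gamma_1)^(b^(j-1)). It is an illustration only, not the implementation used in the thesis: the function names, the parameter values, and the Gaussian return density are assumptions, and the dense Kronecker product shown here would be costly at k = 13, where the matrix has 8,192 x 8,192 entries.

```python
import itertools
from functools import reduce

import numpy as np


def msm_transition_matrix(k, gamma_1, b):
    """Transition matrix over the 2**k states of a binomial MSM(k)."""
    # Switching probability of component j (j = 0..k-1): 1 - (1 - gamma_1)**(b**j)
    gammas = 1.0 - (1.0 - gamma_1) ** (b ** np.arange(k))
    # On a switch the component is redrawn from {m0, 2 - m0} with equal odds,
    # so it actually changes value with probability gamma / 2.
    blocks = [np.array([[1 - g / 2, g / 2], [g / 2, 1 - g / 2]]) for g in gammas]
    # Components are independent, so the joint matrix is a Kronecker product.
    return reduce(np.kron, blocks)


def msm_state_volatilities(k, m0, sigma_bar):
    """Volatility in each state: sigma_bar * sqrt(product of the k multipliers)."""
    # itertools.product varies the last component fastest, matching np.kron.
    states = np.array(list(itertools.product([m0, 2.0 - m0], repeat=k)))
    return sigma_bar * np.sqrt(states.prod(axis=1))


def filter_step(prior, transition, vols, r):
    """One Bayesian update of the state probabilities after observing return r."""
    predicted = prior @ transition  # propagate beliefs one period ahead
    density = np.exp(-0.5 * (r / vols) ** 2) / (np.sqrt(2 * np.pi) * vols)
    joint = predicted * density  # Bayes' rule, unnormalized
    likelihood = joint.sum()
    return joint / likelihood, np.log(likelihood)


# Hypothetical parameter values, chosen only for illustration.
k = 3
A = msm_transition_matrix(k, gamma_1=0.05, b=2.0)
vols = msm_state_volatilities(k, m0=1.4, sigma_bar=0.01)
prior = np.full(2 ** k, 1 / 2 ** k)  # ergodic distribution of the binomial MSM
posterior, loglik = filter_step(prior, A, vols, r=0.02)
print(A.shape, 2 ** 13)
```

Summing the log-likelihood contributions returned by `filter_step` over a return series gives the quantity that maximum likelihood estimation would maximize. Because the number of states doubles with each added component while the number of parameters stays fixed, the specification remains parsimonious as k grows.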

Deep learning versus multifractal volatility forecasting: I compare two complex non-linear state space models: a deep learning neural network, the long short-term memory (LSTM) network, and a Markov-switching multifractal (MSM) stochastic volatility model. I analyze their comparative out-of-sample forecasting performance using high-frequency equity and foreign exchange data in which I suppress the microstructure noise. The data cover the period January 2018 to March 2020 and include extreme volatility regimes (the most extreme since the late 1920s) induced by markets' initial response to COVID-19. I implement a deep LSTM and MSM(13) with 8,192 volatility states; as a result, both models invoke extremely large state spaces. Results show that, due to its inherent non-linearity, LSTM discovers, in an unsupervised manner, the different volatility regimes and provides reasonable out-of-sample performance, broadly consistent with the literature. MSM(13), however, provides comparatively better forecasts than LSTM in almost all tests at 1-hour, 1-day, and 2-day horizons for equity HF (Apple and J.P. Morgan) and foreign exchange HF (EURUSD) return series. I attribute these findings primarily to MSM(k)'s stronger underlying theoretical structure (i.e. the multifractal volatility approach better accounts for the stylized facts: almost unpredictable returns, fat tails, and volatility clustering) and MSM(13)'s better fit to a large univariate time series of HF data in which microstructure noise is suppressed.

My research extends the multifractal literature in three ways. Firstly, I compare a deep learning model to a multifractal model, theoretically and empirically, and show that the deep learning and multifractal approaches have many similarities. The forward pass and backward propagation algorithms of the LSTM are analogous to the transition matrix and Bayesian recursion of the MSM(k), whilst the LSTM gradient descent training procedure is analogous to the basin-hopping optimization framework in my implementation of MSM(k). Secondly, I provide a thorough derivation of the LSTM network, based upon the framework set out by Sherstinsky (2018, 2020). This step is generally skipped in the finance deep learning (DL) literature featuring the LSTM. I thus provide a bridge between the finance DL literature and the denser theoretical treatment of recurrent neural networks (RNNs) and the LSTM in the broader DL literature. By leveraging Sherstinsky (2020), I am able to describe the LSTM at a level of detail that enables (re-)construction of all its algorithms without recourse to an open-source deep learning framework. Thirdly, I compare both models within the setting of big data, using a very large HF time series that has been thoroughly cleansed and prepared and that features extreme volatility regimes. These insights and contributions provide a structured framework for further comparisons of data-centric methods from the machine learning canon with multifractal methods. An obvious candidate for further research is the multivariate setting, in which the relations to be predicted are expected to be nonlinear, such as those between realized volatility and its determinants.
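As an illustration of what building the LSTM without a deep learning framework can look like, the sketch below implements a single forward step of an LSTM cell in NumPy. It follows the common textbook formulation (input, forget, and output gates plus a candidate update), which is an assumption here and is not necessarily the notation of Sherstinsky (2020) or the architecture used in the thesis. The weights are random and untrained, and the input series is simulated.

```python
import numpy as np


def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))


def lstm_step(x, h_prev, c_prev, W, U, b):
    """One forward step of a standard LSTM cell.

    W: (4 * n_hidden, n_input), U: (4 * n_hidden, n_hidden), b: (4 * n_hidden,)
    Rows are stacked in the order: input gate, forget gate, output gate, candidate.
    """
    n = h_prev.shape[0]
    z = W @ x + U @ h_prev + b
    i = sigmoid(z[0:n])            # input gate: how much new information to admit
    f = sigmoid(z[n:2 * n])        # forget gate: how much of the old cell state to keep
    o = sigmoid(z[2 * n:3 * n])    # output gate: how much of the cell state to expose
    g = np.tanh(z[3 * n:4 * n])    # candidate update to the cell state
    c = f * c_prev + i * g         # new cell state
    h = o * np.tanh(c)             # new hidden state
    return h, c


# Hypothetical sizes and random, untrained weights, for illustration only.
rng = np.random.default_rng(0)
n_input, n_hidden = 1, 4
W = rng.normal(scale=0.1, size=(4 * n_hidden, n_input))
U = rng.normal(scale=0.1, size=(4 * n_hidden, n_hidden))
b = np.zeros(4 * n_hidden)

h = np.zeros(n_hidden)
c = np.zeros(n_hidden)
returns = rng.normal(scale=0.01, size=20)  # simulated stand-in for an HF return series
for r in returns:
    h, c = lstm_step(np.array([r]), h, c, W, U, b)
print(h.shape)
```

In this reading, the pair (h, c) carried from one step to the next plays a role comparable to the filtered state probabilities that the MSM(k) Bayesian recursion updates after each observed return.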

Publication date of the thesis

Thesis committee

Supervisor: Laurent Calvet, EDHEC Business School 

External reviewer: Adlai Fisher, University of British Columbia Sauder School of Business

Other committee member: Nikolaos Tessaromatis, EDHEC Business School