Abstract: In this work we develop a distributed least squares approximation (DLSA) method that can solve a large family of regression problems (e.g., linear regression, logistic regression, Cox’s model) on a distributed system. By approximating the local objective function with a local quadratic form, we obtain a combined estimator as a weighted average of local estimators. The resulting estimator is proved to be statistically as efficient as the global estimator, while requiring only one round of communication. We further conduct shrinkage estimation based on the DLSA estimator using an adaptive Lasso approach. The solution can be easily obtained by running the LARS algorithm on the master node. We show theoretically that the resulting estimator enjoys the oracle property and is selection consistent when a newly designed distributed Bayesian information criterion (DBIC) is used. The finite sample performance and the computational efficiency are further illustrated by an extensive numerical study and an airline dataset that is 52GB in memory size. The entire methodology has been implemented in Python for a de facto standard Spark system. Using the proposed DLSA algorithm on the Spark system, it takes 26 minutes to obtain a logistic regression estimator, whereas a full likelihood algorithm takes 15 hours to reach an inferior result.
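To make the "weighted average of local estimators" idea concrete, here is a minimal sketch for the linear regression case only. It assumes the weights are the local Gram matrices X_k'X_k (the Hessians of the local least squares objectives). In that case the local quadratic form is exact, so the one-shot combination reproduces the global OLS fit. The function names `local_fit` and `combine` are hypothetical, plain NumPy stands in for Spark, and this is not the authors' implementation.

```python
import numpy as np

def local_fit(X, y):
    """Run on each worker: local OLS estimate plus its weight matrix X'X."""
    gram = X.T @ X
    theta = np.linalg.solve(gram, X.T @ y)
    return theta, gram

def combine(local_results):
    """Run on the master: matrix-weighted average of the local estimates."""
    total_weight = sum(w for _, w in local_results)
    weighted_sum = sum(w @ theta for theta, w in local_results)
    return np.linalg.solve(total_weight, weighted_sum)

# Simulated data (illustrative only), split across three "workers"
rng = np.random.default_rng(0)
X = rng.normal(size=(600, 3))
y = X @ np.array([1.0, -2.0, 0.5]) + rng.normal(size=600)

shards = zip(np.array_split(X, 3), np.array_split(y, 3))
combined = combine([local_fit(Xk, yk) for Xk, yk in shards])

# Reference fit using all data on one machine
global_ols = np.linalg.lstsq(X, y, rcond=None)[0]
print(np.allclose(combined, global_ols))
```

Each worker sends only a p-vector and a p-by-p matrix to the master, which is what makes a single round of communication sufficient. For models such as logistic regression the local quadratic form is only an approximation, so the combined estimator would match the global one asymptotically rather than exactly.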
Abstract: Interval forecasts complement point forecasts by quantifying their uncertainty, which makes it important to provide prediction intervals (PIs) alongside point forecasts. In this paper, we propose a general feature-based time series forecasting framework, which is divided into “offline” and “online” parts. In the “offline” part, we use generalized additive models (GAMs) to explore how time series features relate to the interval forecasting accuracy of different forecasting methods, which makes the effects of features on interval forecasting accuracy interpretable. Our proposed framework is in essence a model averaging process, and we introduce a threshold ratio for selecting the individual forecasting methods in this process. In the “online” part, we calculate the point forecasts and PIs of new series using the pre-trained GAMs and the corresponding optimal threshold ratio. We show that our feature-based forecasting framework outperforms all individual benchmark forecasting methods on the M3 competition data, with improved computational efficiency.
Our Chinese translation of the forecasting textbook Forecasting: Principles and Practice by Rob J Hyndman and George Athanasopoulos is now available online.
The Chinese translation was produced by a team led by Professor Yanfei Kang (Beihang University) and Professor Feng Li (Central University of Finance and Economics). The following students were also involved: Cheng Fan, Liu Yu, Long Xiaoyu, Wang Xiaoqian, Zeng Jiayue, Zhang Bohan, and Zhu Shuaidong.