Our DLSA paper has been accepted by the Journal of Computational and Graphical Statistics


Authors:
  Xuening Zhu, Feng Li and Hansheng Wang

Abstract: In this work, we develop a distributed least squares approximation (DLSA) method, which is able to solve a large family of regression problems (e.g., linear regression, logistic regression, Cox’s model) on a distributed system. By approximating the local objective function using a local quadratic form, we are able to obtain a combined estimator by taking a weighted average of local estimators. The resulting estimator is proved to be statistically as efficient as the global estimator. Meanwhile, it requires only one round of communication. We further conduct the shrinkage estimation based on the DLSA estimation by using an adaptive Lasso approach. The solution can be easily obtained by using the LARS algorithm on the master node. It is theoretically shown that the resulting estimator enjoys the oracle property and is selection consistent by using a newly designed distributed Bayesian Information Criterion (DBIC). The finite sample performance as well as the computational efficiency are further illustrated by an extensive numerical study and an airline dataset. The airline dataset is 52GB in memory size. The entire methodology has been implemented in Python for a de-facto standard Spark system. By using the proposed DLSA algorithm on the Spark system, it takes 26 minutes to obtain a logistic regression estimator whereas a full likelihood algorithm takes 15 hours to reach an inferior result.
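To make the "weighted average of local estimators" idea concrete, here is a minimal single-machine sketch in Python with NumPy. It is not the paper's Spark implementation. It assumes the simplest case, linear regression, and weights each local OLS estimate by its local Gram matrix `X_k' X_k`; the helper names `local_ols` and `combine` and the simulated data are hypothetical, chosen only for illustration. In this special case the one-round combination reproduces the global OLS estimate exactly, whereas for other models the abstract claims only that the combined estimator is statistically as efficient as the global one.

```python
import numpy as np


def local_ols(X, y):
    """Hypothetical worker step: local OLS estimate and its weight matrix X'X."""
    gram = X.T @ X
    theta = np.linalg.solve(gram, X.T @ y)
    return theta, gram


def combine(local_results):
    """Hypothetical master step: (sum_k W_k)^{-1} sum_k W_k theta_k."""
    total_weight = sum(w for _, w in local_results)
    weighted_sum = sum(w @ theta for theta, w in local_results)
    return np.linalg.solve(total_weight, weighted_sum)


# Simulated data (an assumption for this sketch, not the airline dataset).
rng = np.random.default_rng(0)
n, p, n_workers = 3000, 4, 5
X = rng.normal(size=(n, p))
true_theta = np.array([1.0, -2.0, 0.5, 0.0])
y = X @ true_theta + rng.normal(size=n)

# Split the rows into shards to mimic workers; each worker sends back
# only its estimate and weight matrix, i.e. one round of communication.
shards = zip(np.array_split(X, n_workers), np.array_split(y, n_workers))
local_results = [local_ols(X_k, y_k) for X_k, y_k in shards]

theta_combined = combine(local_results)
theta_global, _ = local_ols(X, y)

# Largest coordinate-wise gap between the combined and global estimates.
max_gap = float(np.max(np.abs(theta_combined - theta_global)))
print("combined:", theta_combined)
print("global:  ", theta_global)
print("max gap: ", max_gap)
```

Each worker only ships a length-`p` vector and a `p x p` matrix to the master, so the communication cost does not depend on the number of rows held locally.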

By Feng Li

Dr. Feng Li is an Associate Professor of Statistics in the School of Statistics and Mathematics at Central University of Finance and Economics in Beijing, China. Feng obtained his Ph.D. degree in Statistics from Stockholm University, Sweden in 2013. His research interests include Bayesian computation, econometrics and forecasting, and distributed learning. His recent research has appeared in statistics and forecasting journals such as the International Journal of Forecasting and Statistical Analysis and Data Mining, AI journals such as Expert Systems with Applications, and medical journals such as BMJ Open.
