Background
The Distributed Statistical Computing course was developed and taught by Dr. Feng Li in 2014 for a joint master’s program in statistics run by five universities: Peking University, Renmin University of China, Central University of Finance and Economics, University of Chinese Academy of Sciences, and Capital University of Economics and Business.
Dr. Feng Li also offered this course for the Business Analytics program at Peking University from 2019 to 2023. The content of this course is kept as is. An updated course, Big Data Computation and Forecasting, is available in Spring 2025 at the Guanghua School of Management, Peking University.
Prerequisites
- Basic knowledge of statistics
- Basic knowledge of computing
Literature
- Distributed statistical computing (in Chinese) [New online book | Print version]
- Lecture notes
- Demo Hadoop/Spark configurations
- Spark standalone server on a SLURM cluster
Teaching videos
- The Chinese version of the teaching videos is also available at https://space.bilibili.com/509963672
Slides and lecture notes
- Read with the online Jupyter Notebook viewer
- Download all Jupyter Notebooks in a zip file
- Download all data in a zip file
Part I: Distributed Systems and Distributed Computing
Part II: Advanced Distributed Statistical Computing
Topic | Material |
---|---|
L10.1: Big Data Visualization: Challenges and Viabilities | HTML |
L10.2: Statistical Elements of Big Data Visualization | HTML |
L10.3: Computational Aspects of Big Data Visualization | |
L11: Distributed Statistical Computing: State of the Art | |
L11: Least-Square Approximation for a Distributed System | Paper Code |
L12: Distributed ARIMA models for ultra-long time series | Paper Code |
L13: Distributed Quantile Regression by Pilot Sampling and One-Step Updating | Paper Code |
L14: Bayesian Forecasting with Distributed VAR models | |