{ "cells": [ { "cell_type": "markdown", "id": "c7930d23-0fb5-49e4-b25e-cc2e11195d03", "metadata": { "editable": true, "slideshow": { "slide_type": "slide" }, "tags": [] }, "source": [ "# Distributed ARIMA Forecasting with Spark\n", "\n", "## Feng Li\n", "\n", "### Guanghua School of Management\n", "### Peking University\n", "\n", "\n", "### [feng.li@gsm.pku.edu.cn](mailto:feng.li@gsm.pku.edu.cn)\n", "### Course home page: [https://feng.li/bdcf](https://feng.li/bdcf)" ] }, { "cell_type": "markdown", "id": "97384b2a-089f-470f-8885-58a8c63ab745", "metadata": { "editable": true, "slideshow": { "slide_type": "slide" }, "tags": [] }, "source": [ "## A split-and-merge example using pandas and statsmodels\n", "\n", "- Split the full time series into n blocks (equal-length subseries).\n", "\n", "- Fit an ARIMA model to each block.\n", "\n", "- Collect the ARIMA parameters from each block.\n", "\n", "- Forecast manually with the global ARIMA estimator." ] }, { "cell_type": "code", "execution_count": 2, "id": "2fa2c56a-571d-425b-95dd-aa7329c73d0d", "metadata": { "editable": true, "slideshow": { "slide_type": "slide" }, "tags": [] }, "outputs": [ { "data": { "text/html": [ "
| | demand | time |
|---|---|---|
| 0 | 12864.000 | 2003-03-01 00:00:00 |
| 1 | 12389.000 | 2003-03-01 01:00:00 |
| 2 | 12155.000 | 2003-03-01 02:00:00 |
| 3 | 12072.000 | 2003-03-01 03:00:00 |
| 4 | 12162.000 | 2003-03-01 04:00:00 |
| ... | ... | ... |
| 121287 | 15199.857 | 2016-12-31 19:00:00 |
| 121288 | 14503.994 | 2016-12-31 20:00:00 |
| 121289 | 13829.016 | 2016-12-31 21:00:00 |
| 121290 | 13093.205 | 2016-12-31 22:00:00 |
| 121291 | 12370.639 | 2016-12-31 23:00:00 |

121292 rows × 2 columns

| | ar.L1 | ma.L1 | const | block_id |
|---|---|---|---|---|
| 0 | 0.696868 | 0.678450 | 139543.349121 | 0 |
| 1 | 0.696260 | 0.665971 | 137549.845347 | 1 |
| 2 | 0.691261 | 0.661040 | 127060.016145 | 2 |
| 3 | 0.711046 | 0.660685 | 115836.900034 | 3 |
| 4 | 0.728699 | 0.671071 | 94070.948686 | 4 |
SparkSession - in-memory

SparkContext

- Version: v3.5.4
- Master: local[*]
- AppName: Spark Forecasting