{ "cells": [ { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "slide" }, "tags": [] }, "source": [ "# Forecast Reconciliation\n", "\n", "## Feng Li\n", "\n", "### Guanghua School of Management\n", "### Peking University\n", "\n", "\n", "### [feng.li@gsm.pku.edu.cn](mailto:feng.li@gsm.pku.edu.cn)\n", "### Course home page: [https://feng.li/bdcf](https://feng.li/bdcf)" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "slide" } }, "source": [ "## The hierarchical forecasting problem\n", "\n", "* We want forecasts at all levels of aggregation.\n", "* If we model and forecast each series independently, the forecasts will almost certainly not add up.\n", "* We need to impose constraints on the forecasts to ensure they are \"coherent\".\n", "* We need to do this in a way that is computationally efficient.\n" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "slide" } }, "source": [ "![Traditional](figures/topdown.png)" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "slide" } }, "source": [ "## Top-down forecasting\n", "\n", "\n", "* Works well in the presence of low counts.\n", "* A single forecasting model is easy to build.\n", "* Provides reliable forecasts for aggregate levels.\n", "* Loss of information, especially individual series dynamics.\n", "* Distributing forecasts to lower levels can be difficult.\n", "* No prediction intervals.\n", "\n", "## Bottom-up forecasting\n", "\n", "\n", "* No loss of information.\n", "* Better captures dynamics of individual series.\n", "* Large number of series to be forecast.\n", "* Constructing forecasting models is harder because of noisy data at the bottom level.\n", "* No prediction intervals.\n" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "slide" } }, "source": [ "## Forecasting notation\n", "\n", "- Let $\\mathbf{y}_t$ be the vector of all series at time $t$ and $\\mathbf{b}_t$ the vector of bottom-level series, so that $\\mathbf{y}_t = \\mathbf{S}\\mathbf{b}_t$, where $\\mathbf{S}$ is the summing matrix.\n", "\n", "- Let $\\hat{\\mathbf{y}}_{T+h|T}$ be the vector of initial (base) $h$-step forecasts, made at time $T$, stacked in the same order as 
$\\mathbf{y}_t$. (In general, they will not \"add up\".)\n", "\n", "- **Coherent** linear forecasts are of the form:\n", "\n", "$$\n", "\\tilde{\\mathbf{y}}_{T+h|T}=\\mathbf{S}\\mathbf{G}\\hat{\\mathbf{y}}_{T+h|T}\n", "$$\n", "\n", "for some matrix $\\mathbf{G}$.\n", "\n", "* $\\mathbf{G}$ extracts and combines base forecasts $\\hat{\\mathbf{y}}_{T+h|T}$ to get bottom-level forecasts.\n", "* $\\mathbf{S}$ adds them up.\n" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "slide" } }, "source": [ "## Single-level methods\n", "\n", "\n", "- **Bottom-up** forecasts are obtained using\n", "$$\n", "\\mathbf{G} = \\left[\\mathbf{0}\\mid \\mathbf{I}\\right],\n", "$$\n", "where $\\mathbf{0}$ is a null matrix and $\\mathbf{I}$ is the $n_b \\times n_b$ identity matrix.\n", "\n", " - $\\mathbf{G}$ extracts only the bottom-level forecasts from $\\hat{\\mathbf{y}}_{T+h|T}$.\n", " - $\\mathbf{S}$ adds them up to give the bottom-up forecasts.\n", "\n", "\n", "- **Top-down** forecasts are obtained using\n", "$$\n", "\\mathbf{G}= \\left[\\mathbf{p}\\mid\\mathbf{0}\\right],\n", "$$\n", "where $\\mathbf{p}=[p_{1},p_{2},\\dots,p_{n_b}]'$ and $\\sum_{k=1}^{n_b} p_k = 1$.\n", "\n", " - $\\mathbf{G}$ distributes the forecast of the aggregate to the lowest-level series.\n", " - Different methods of top-down forecasting lead to different proportionality vectors $\\mathbf{p}$.\n" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "slide" } }, "source": [ "## Mean Property of single-level methods\n", "\n", "\n", "$$\n", "\\mathbb{E}[\\tilde{\\mathbf{y}}_{T+h \\mid T} \\mid \\mathbf{y}_1, \\ldots, \\mathbf{y}_T] \n", "= \\mathbf{S}\\mathbf{G}\\,\\mathbb{E}[\\hat{\\mathbf{y}}_{T+h \\mid T} \\mid \\mathbf{y}_1, \\ldots, \\mathbf{y}_T] \n", "= \\mathbf{S}\\,\\mathbb{E}[\\mathbf{b}_{T+h \\mid T} \\mid \\mathbf{y}_1, \\ldots, \\mathbf{y}_T]\n", "$$\n", "\n", "provided $\\mathbf{S}\\mathbf{G}\\mathbf{S} = \\mathbf{S}$ and\n", "\n", "$$\n", "\\mathbb{E}[\\hat{\\mathbf{y}}_{T+h \\mid T} \\mid \\mathbf{y}_1, \\ldots, \\mathbf{y}_T] \n", "= 
\\mathbf{S}\\,\\mathbb{E}[\\mathbf{b}_{T+h \\mid T} \\mid \\mathbf{y}_1, \\ldots, \\mathbf{y}_T].\n", "$$\n", "\n", "- Forecasts $\\tilde{\\mathbf{y}}_{T+h \\mid T}$ are unbiased iff base forecasts $\\hat{\\mathbf{y}}_{T+h \\mid T}$ are unbiased and $\\mathbf{S}\\mathbf{G}\\mathbf{S} = \\mathbf{S}$.\n", "\n", "- $\\mathbf{S}\\mathbf{G}\\mathbf{S} = \\mathbf{S}$ for the bottom-up method.\n", "\n", "- $\\mathbf{S}\\mathbf{G}\\mathbf{S} \\ne \\mathbf{S}$ for the top-down method." ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "slide" } }, "source": [ "## Variance Property of single-level methods\n", "\n", "\n", "$$\n", "V_h = \\mathrm{Var}[\\mathbf{y}_{T+h} - \\tilde{\\mathbf{y}}_{T+h \\mid T} \\mid \\mathbf{y}_1, \\ldots, \\mathbf{y}_T] \n", "= \\mathbf{S}\\mathbf{G}\\mathbf{W}_h \\mathbf{G}^\\prime \\mathbf{S}^\\prime\n", "$$\n", "\n", "where $\\mathbf{W}_h = \\mathrm{Var}[\\mathbf{y}_{T+h} - \\hat{\\mathbf{y}}_{T+h \\mid T} \\mid \\mathbf{y}_1, \\ldots, \\mathbf{y}_T]$ is the covariance matrix of the base forecast errors.\n", "\n", "- $\\mathbf{W}_h$ is hard to estimate for $h > 1$.\n", "- This suggests choosing $\\mathbf{G}$ to make $V_h$ as small as possible, for example by minimising its trace.\n" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "slide" } }, "source": [ "## Minimum trace reconciliation (MinT)\n", "\n", "\n", "If $\\mathbf{S}\\mathbf{G}$ is a projection, then the trace of $V_h$ is minimised when\n", "\n", "$$\n", "\\mathbf{G} = (\\mathbf{S}'\\mathbf{W}_h^{-1}\\mathbf{S})^{-1} \\mathbf{S}' \\mathbf{W}_h^{-1}\n", "$$\n", "\n", "$$\n", "\\tilde{\\mathbf{y}}_{T+h \\mid T} = \\mathbf{S}(\\mathbf{S}' \\mathbf{W}_h^{-1} \\mathbf{S})^{-1} \\mathbf{S}' \\mathbf{W}_h^{-1} \\hat{\\mathbf{y}}_{T+h \\mid T}\n", "$$\n", "\n", "\n", "- The trace of $V_h$ is the sum of the forecast error variances across all series.\n", "- The MinT solution is $L_2$ optimal amongst linear unbiased forecasts.\n", "- How to estimate $\\mathbf{W}_h = \\mathrm{Var}[\\mathbf{y}_{T+h} - \\hat{\\mathbf{y}}_{T+h \\mid T} \\mid \\mathbf{y}_1, \\ldots, \\mathbf{y}_T]$?\n", "\n" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "slide" } }, 
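"source": [ "## Worked example: OLS reconciliation\n", "\n", "As a small illustration of the MinT formula (the numbers are hypothetical, and we assume $\\mathbf{W}_h = k_h\\mathbf{I}$, which reduces MinT to OLS reconciliation), take a hierarchy Total = A + B with series ordered as (Total, A, B) and incoherent base forecasts, since $100 \\ne 40 + 50$:\n", "\n", "$$\n", "\\mathbf{S} = \\begin{bmatrix} 1 & 1 \\\\\\\\ 1 & 0 \\\\\\\\ 0 & 1 \\end{bmatrix}, \\qquad\n", "\\hat{\\mathbf{y}}_{T+h|T} = \\begin{bmatrix} 100 \\\\\\\\ 40 \\\\\\\\ 50 \\end{bmatrix}\n", "$$\n", "\n", "$$\n", "\\mathbf{G} = (\\mathbf{S}'\\mathbf{S})^{-1}\\mathbf{S}'\n", "= \\frac{1}{3}\\begin{bmatrix} 2 & -1 \\\\\\\\ -1 & 2 \\end{bmatrix}\\begin{bmatrix} 1 & 1 & 0 \\\\\\\\ 1 & 0 & 1 \\end{bmatrix}\n", "= \\frac{1}{3}\\begin{bmatrix} 1 & 2 & -1 \\\\\\\\ 1 & -1 & 2 \\end{bmatrix}\n", "$$\n", "\n", "$$\n", "\\tilde{\\mathbf{y}}_{T+h|T} = \\mathbf{S}\\mathbf{G}\\hat{\\mathbf{y}}_{T+h|T}\n", "= \\mathbf{S}\\,\\frac{1}{3}\\begin{bmatrix} 130 \\\\\\\\ 160 \\end{bmatrix}\n", "= \\frac{1}{3}\\begin{bmatrix} 290 \\\\\\\\ 130 \\\\\\\\ 160 \\end{bmatrix}\n", "\\approx \\begin{bmatrix} 96.67 \\\\\\\\ 43.33 \\\\\\\\ 53.33 \\end{bmatrix}\n", "$$\n", "\n", "- The reconciled forecasts are coherent: $130/3 + 160/3 = 290/3$.\n", "- The 10-unit discrepancy is shared equally: the total moves down by $10/3$ and each bottom-level forecast moves up by $10/3$.\n", "- Here $\\mathbf{G}\\mathbf{S} = \\mathbf{I}$, so $\\mathbf{S}\\mathbf{G}\\mathbf{S} = \\mathbf{S}$ and unbiased base forecasts stay unbiased.\n" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "slide" } }, 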
"source": [ "### Reconciliation method $G$\n", "\n", "| Reconciliation method | $G$ |\n", "|------------------------|-----|\n", "| OLS | $(\\mathbf{S}'\\mathbf{S})^{-1}\\mathbf{S}'$ |\n", "| WLS(var) | $(\\mathbf{S}'\\boldsymbol{\\Lambda}_s \\mathbf{S})^{-1}\\mathbf{S}' \\boldsymbol{\\Lambda}_v$ |\n", "| WLS(struct) | $(\\mathbf{S}'\\boldsymbol{\\Lambda}_s \\mathbf{S})^{-1}\\mathbf{S}' \\boldsymbol{\\Lambda}_s$ |\n", "| MinT(sample) | $(\\mathbf{S}'\\hat{\\mathbf{W}}_{\\text{sam}}^{-1} \\mathbf{S})^{-1} \\mathbf{S}' \\hat{\\mathbf{W}}_{\\text{sam}}^{-1}$ |\n", "| MinT(shrink) | $(\\mathbf{S}'\\hat{\\mathbf{W}}_{\\text{shr}}^{-1} \\mathbf{S})^{-1} \\mathbf{S}' \\hat{\\mathbf{W}}_{\\text{shr}}^{-1}$ |\n", "\n", "These approximate MinT by assuming $\\mathbf{W}_h = k_h \\mathbf{W}_1$\n", "\n" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "slide" } }, "source": [ "\n", "- $\\boldsymbol{\\Lambda}_s = \\mathrm{diag}(\\mathbf{W}_1)^{-1}$\n", "- $\\boldsymbol{\\Lambda}_s = \\mathrm{diag}(\\mathbf{S} \\mathbf{1})^{-1}$\n", "- $\\hat{\\mathbf{W}}_{\\text{sam}}$ is sample estimate of the residual covariance matrix\n", "- $\\hat{\\mathbf{W}}_{\\text{shr}}$ is shrinkage estimator \n", " $$\n", " \\tau \\cdot \\mathrm{diag}(\\hat{\\mathbf{W}}_{\\text{sam}}) + (1 - \\tau) \\hat{\\mathbf{W}}_{\\text{sam}}\n", " $$\n", " where $\\tau$ is selected optimally.\n", "- Still need a good estimate of $\\mathbf{W}_h$ for forecast variance.\n" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "slide" } }, "source": [ "## Further Reading\n", "\n", "- Gross, C. W., & Sohl, J. E. (1990). Disaggregation methods to expedite product line forecasting. Journal of Forecasting, 9, 233–254. DOI: https://doi.org/10.1002/for.3980090304 provide a good introduction to the top-down approaches.\n", " \n", "- Athanasopoulos, G., Gamakumara, P., Panagiotelis, A., Hyndman, R.J., Affan, M. (2020). Hierarchical Forecasting. In: Fuleky, P. 
(eds) Macroeconomic Forecasting in the Era of Big Data. Advanced Studies in Theoretical and Applied Econometrics, vol 52. Springer, Cham. DOI: https://doi.org/10.1007/978-3-030-31150-6_21\n", "\n", "- Wickramasuriya, S. L., Athanasopoulos, G., & Hyndman, R. J. (2019). Optimal forecast reconciliation for hierarchical and grouped time series through trace minimization. Journal of the American Statistical Association, 114(526), 804–819. DOI: https://doi.org/10.1080/01621459.2018.1448825\n" ] } ], "metadata": { "celltoolbar": "Slideshow", "kernelspec": { "display_name": "python3.12", "language": "python", "name": "python3.12" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 3 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.12.9" } }, "nbformat": 4, "nbformat_minor": 4 }