{
 "cells": [
  {
   "cell_type": "markdown",
   "metadata": {
    "slideshow": {
     "slide_type": "slide"
    },
    "tags": []
   },
   "source": [
    "# Forecast Reconciliation\n",
    "\n",
    "## Feng Li\n",
    "\n",
    "### Guanghua School of Management\n",
    "### Peking University\n",
    "\n",
    "\n",
    "### [feng.li@gsm.pku.edu.cn](feng.li@gsm.pku.edu.cn)\n",
    "### Course home page: [https://feng.li/bdcf](https://feng.li/bdcf)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "slideshow": {
     "slide_type": "slide"
    }
   },
   "source": [
    "## The hierarchical forecasting problem\n",
    "\n",
    "* We want forecasts at all levels of aggregation.\n",
    "* If we model and forecast each series independently, the forecasts will almost certainly not add up.\n",
    "* We need to impose constraints on the forecasts to ensure they are \"coherent\".\n",
    "* We need to do this in a way that is computationally efficient.\n"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "slideshow": {
     "slide_type": "slide"
    }
   },
   "source": [
    "![Traditional](figures/topdown.png)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "slideshow": {
     "slide_type": "slide"
    }
   },
   "source": [
    "## Top-down forecasting\n",
    "\n",
    "\n",
    "* Works well in presence of low counts.\n",
    "* Single forecasting model easy to build\n",
    "* Provides reliable forecasts for aggregate levels.\n",
    "* Loss of information, especially individual series dynamics.\n",
    "* Distribution of forecasts to lower levels can be difficult\n",
    "* No prediction intervals\n",
    "\n",
    "## Bottom-up forecasting\n",
    "\n",
    "\n",
    "* No loss of information.\n",
    "* Better captures dynamics of individual series.\n",
    "* Large number of series to be forecast.\n",
    "* Constructing forecasting models is harder because of noisy data at bottom level.\n",
    "* No prediction intervals\n"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "slideshow": {
     "slide_type": "slide"
    }
   },
   "source": [
    "## Forecasting notation\n",
    "\n",
    "- Let $\\hat{\\mathbf{y}}_{T+h|T}$ be vector of initial $h$-step forecasts, made at time $T$, stacked in same order as $\\mathbf{y}_t$. (In general, they will not \"add up\".)\n",
    "\n",
    "- **Coherent** linear forecasts are of the form:\n",
    "\n",
    "$$\n",
    "\\tilde{\\mathbf{y}}_{T+h|T}=\\mathbf{S}\\mathbf{G}\\hat{\\mathbf{y}}_{T+h|T}\n",
    "$$\n",
    "\n",
    "for some matrix $\\mathbf{G}$.\n",
    "\n",
    "* $\\mathbf{G}$ extracts and combines base forecasts $\\hat{\\mathbf{y}}_{T+h|T}$ to get bottom-level forecasts.\n",
    "* $\\mathbf{S}$ adds them up\n"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "slideshow": {
     "slide_type": "slide"
    }
   },
   "source": [
    "## Single-level methods\n",
    "\n",
    "\n",
    "- **Bottom-up** forecasts are obtained using\n",
    "$$\n",
    "\\mathbf{G} = \\left[\\mathbf{0}\\mid \\mathbf{I}\\right],\n",
    "$$\n",
    "where $\\mathbf{0}$ is null matrix and $\\mathbf{I}$ is identity matrix.\n",
    "\n",
    "    - $\\mathbf{G}$ matrix extracts only bottom-level forecasts from $\\hat{\\mathbf{y}}_{T+h|T}$\n",
    "    - $\\mathbf{S}$ adds them up to give the bottom-up forecasts.\n",
    "\n",
    "\n",
    "- **Top-down** forecasts are obtained using\n",
    "$$\n",
    "\\mathbf{G}= \\left[\\mathbf{p}\\mid\\mathbf{0}\\right]\n",
    "$$\n",
    "where $\\mathbf{p}=[p_{1},p_{2},\\dots,p_{n_b}]'$ and $\\sum_{k=1}^{n_b} p_k = 1$.\n",
    "\n",
    "    - $\\mathbf{G}$ distributes forecasts of aggregate to lowest level series.\n",
    "    - Different methods of top-down forecasting lead to different proportionality vectors $\\mathbf{p}$.\n"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "slideshow": {
     "slide_type": "slide"
    }
   },
   "source": [
    "## Mean Property of single-level methods\n",
    "\n",
    "\n",
    "$$\n",
    "\\mathbb{E}[\\tilde{\\mathbf{y}}_{T+h \\mid T} \\mid \\mathbf{y}_1, \\ldots, \\mathbf{y}_T] \n",
    "= \\mathbf{SGE}[\\hat{\\mathbf{y}}_{T+h \\mid T} \\mid \\mathbf{y}_1, \\ldots, \\mathbf{y}_T] \n",
    "= \\mathbf{SE}[\\mathbf{b}_{T+h \\mid T} \\mid \\mathbf{y}_1, \\ldots, \\mathbf{y}_T]\n",
    "$$\n",
    "\n",
    "provided $ \\mathbf{SGS} = \\mathbf{S} $ and\n",
    "\n",
    "$$\n",
    "\\mathbb{E}[\\hat{\\mathbf{y}}_{T+h \\mid T} \\mid \\mathbf{y}_1, \\ldots, \\mathbf{y}_T] \n",
    "= \\mathbf{SE}[\\mathbf{b}_{T+h \\mid T} \\mid \\mathbf{y}_1, \\ldots, \\mathbf{y}_T].\n",
    "$$\n",
    "\n",
    "- Forecasts $ \\tilde{\\mathbf{y}}_{T+h \\mid T} $ are unbiased iff base forecasts $ \\hat{\\mathbf{y}}_{T+h \\mid T} $ are unbiased and $ \\mathbf{SGS} = \\mathbf{S} $.\n",
    "\n",
    "- $ \\mathbf{SGS} = \\mathbf{S} $ for bottom-up method\n",
    "\n",
    "- $ \\mathbf{SGS} \\ne \\mathbf{S} $ for top-down method"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "slideshow": {
     "slide_type": "slide"
    }
   },
   "source": [
    "## Variance Property of single-level methods\n",
    "\n",
    "\n",
    "$$\n",
    "V_h = \\mathrm{Var}[\\mathbf{y}_{T+h} - \\tilde{\\mathbf{y}}_{T+h \\mid T} \\mid \\mathbf{y}_1, \\ldots, \\mathbf{y}_T] \n",
    "= \\mathbf{SGW}_h \\mathbf{G}^\\prime \\mathbf{S}^\\prime\n",
    "$$\n",
    "\n",
    "where $\\mathbf{W}_h = \\mathrm{Var}[\\mathbf{y}_{T+h} - \\hat{\\mathbf{y}}_{T+h \\mid T} \\mid \\mathbf{y}_1, \\ldots, \\mathbf{y}_T]$\n",
    "\n",
    "- $\\mathbf{W}_h$ is hard to estimate for $h > 1$.  \n",
    "- This suggests we should choose $\\mathbf{G}$ to minimise $V_h$.\n"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "slideshow": {
     "slide_type": "slide"
    }
   },
   "source": [
    "## Minimum trace reconciliation (MinT)\n",
    "\n",
    "\n",
    "If $\\mathbf{SG}$ is a projection, then the trace of $V_h$ is minimized when\n",
    "\n",
    "$$\n",
    "\\mathbf{G} = (\\mathbf{S}'\\mathbf{W}_h^{-1}\\mathbf{S})^{-1} \\mathbf{S}' \\mathbf{W}_h^{-1}\n",
    "$$\n",
    "\n",
    "$$\n",
    "\\tilde{\\mathbf{y}}_{T+h \\mid T} = \\mathbf{S}(\\mathbf{S}' \\mathbf{W}_h^{-1} \\mathbf{S})^{-1} \\mathbf{S}' \\mathbf{W}_h^{-1} \\hat{\\mathbf{y}}_{T+h \\mid T}\n",
    "$$\n",
    "\n",
    "\n",
    "- Trace of $V_h$ is sum of forecast variances.\n",
    "- MinT solution is $L_2$ optimal amongst linear unbiased forecasts.\n",
    "- How to estimate $\\mathbf{W}_h = \\mathrm{Var}[\\mathbf{y}_{T+h} - \\hat{\\mathbf{y}}_{T+h \\mid T} \\mid \\mathbf{y}_1, \\ldots, \\mathbf{y}_T]$?\n",
    "\n"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "slideshow": {
     "slide_type": "slide"
    }
   },
   "source": [
    "### Reconciliation method $G$\n",
    "\n",
    "| Reconciliation method | $G$ |\n",
    "|------------------------|-----|\n",
    "| OLS                   | $(\\mathbf{S}'\\mathbf{S})^{-1}\\mathbf{S}'$ |\n",
    "| WLS(var)              | $(\\mathbf{S}'\\boldsymbol{\\Lambda}_s \\mathbf{S})^{-1}\\mathbf{S}' \\boldsymbol{\\Lambda}_v$ |\n",
    "| WLS(struct)           | $(\\mathbf{S}'\\boldsymbol{\\Lambda}_s \\mathbf{S})^{-1}\\mathbf{S}' \\boldsymbol{\\Lambda}_s$ |\n",
    "| MinT(sample)          | $(\\mathbf{S}'\\hat{\\mathbf{W}}_{\\text{sam}}^{-1} \\mathbf{S})^{-1} \\mathbf{S}' \\hat{\\mathbf{W}}_{\\text{sam}}^{-1}$ |\n",
    "| MinT(shrink)          | $(\\mathbf{S}'\\hat{\\mathbf{W}}_{\\text{shr}}^{-1} \\mathbf{S})^{-1} \\mathbf{S}' \\hat{\\mathbf{W}}_{\\text{shr}}^{-1}$ |\n",
    "\n",
    "These approximate MinT by assuming  $\\mathbf{W}_h = k_h \\mathbf{W}_1$\n",
    "\n"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "slideshow": {
     "slide_type": "slide"
    }
   },
   "source": [
    "\n",
    "- $\\boldsymbol{\\Lambda}_s = \\mathrm{diag}(\\mathbf{W}_1)^{-1}$\n",
    "- $\\boldsymbol{\\Lambda}_s = \\mathrm{diag}(\\mathbf{S} \\mathbf{1})^{-1}$\n",
    "- $\\hat{\\mathbf{W}}_{\\text{sam}}$ is sample estimate of the residual covariance matrix\n",
    "- $\\hat{\\mathbf{W}}_{\\text{shr}}$ is shrinkage estimator  \n",
    "  $$\n",
    "  \\tau \\cdot \\mathrm{diag}(\\hat{\\mathbf{W}}_{\\text{sam}}) + (1 - \\tau) \\hat{\\mathbf{W}}_{\\text{sam}}\n",
    "  $$\n",
    "  where $\\tau$ is selected optimally.\n",
    "- Still need a good estimate of $\\mathbf{W}_h$ for forecast variance.\n"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "slideshow": {
     "slide_type": "slide"
    }
   },
   "source": [
    "## Further Reading\n",
    "\n",
    "- Gross, C. W., & Sohl, J. E. (1990). Disaggregation methods to expedite product line forecasting. Journal of Forecasting, 9, 233–254. DOI: https://doi.org/10.1002/for.3980090304 provide a good introduction to the top-down approaches.\n",
    "        \n",
    "- Athanasopoulos, G., Gamakumara, P., Panagiotelis, A., Hyndman, R.J., Affan, M. (2020). Hierarchical Forecasting. In: Fuleky, P. (eds) Macroeconomic Forecasting in the Era of Big Data. Advanced Studies in Theoretical and Applied Econometrics, vol 52. Springer, Cham. DOI: https://doi.org/10.1007/978-3-030-31150-6_21\n",
    "\n",
    "- Wickramasuriya, S. L., Athanasopoulos, G., & Hyndman, R. J. (2019). Optimal forecast reconciliation for hierarchical and grouped time series through trace minimization. Journal of the American Statistical Association, 114(526), 804–819. DOI: https://doi.org/10.1080/01621459.2018.1448825\n"
   ]
  }
 ],
 "metadata": {
  "celltoolbar": "Slideshow",
  "kernelspec": {
   "display_name": "python3.12",
   "language": "python",
   "name": "python3.12"
  },
  "language_info": {
   "codemirror_mode": {
    "name": "ipython",
    "version": 3
   },
   "file_extension": ".py",
   "mimetype": "text/x-python",
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython3",
   "version": "3.12.9"
  }
 },
 "nbformat": 4,
 "nbformat_minor": 4
}