# Forecasting hierarchical or grouped time series


## Feng Li

### Guanghua School of Management
### Peking University


### [feng.li@gsm.pku.edu.cn](mailto:feng.li@gsm.pku.edu.cn)
### Course home page: [https://feng.li/bdcf](https://feng.li/bdcf)



- We discuss forecasting large collections of time series that must add up in some way. 

- The challenge is that we require forecasts that are **coherent** across the aggregation structure. That is, the forecasts must add up in a manner that is consistent with the aggregation structure of the collection of time series.
 
- We discuss several methods for producing coherent forecasts for both hierarchical and grouped time series.

## Hierarchical time series: Australian tourism

- Australia is divided into six states and two territories, with each one having its own government and some economic and administrative autonomy. 

- For simplicity, we refer to both states and territories as “states”. Each of these states can be further subdivided into regions. 

- In total there are 76 such regions. Business planners and tourism authorities are interested in forecasts for the whole of Australia, for each of the states and territories, and also for the regions.

![Australia](figures/ausmap.png)

![HTS](figures/hts.png)

- The figure above shows a simple hierarchical structure. At the top of the hierarchy is the “Total”, the most aggregate level of the data.

- The total number of series in the hierarchy is $n=1+2+5=8$, while the number of series at the bottom level is $m=5$. Note that $n>m$ in all hierarchies.

# Matrix notation 

\begin{equation*}
\begin{bmatrix}
 y_{t} \\
 y_{A,t} \\
 y_{B,t} \\
 y_{AA,t} \\
 y_{AB,t} \\
 y_{AC,t} \\
 y_{BA,t} \\
 y_{BB,t}
\end{bmatrix}
 =
 \begin{bmatrix}
 1 & 1 & 1 & 1 & 1 \\
 1 & 1 & 1 & 0 & 0 \\
 0 & 0 & 0 & 1 & 1 \\
 1 & 0 & 0 & 0 & 0 \\
 0 & 1 & 0 & 0 & 0 \\
 0 & 0 & 1 & 0 & 0 \\
 0 & 0 & 0 & 1 & 0 \\
 0 & 0 & 0 & 0 & 1
 \end{bmatrix}
 \begin{bmatrix}
 y_{AA,t} \\
 y_{AB,t} \\
 y_{AC,t} \\
 y_{BA,t} \\
 y_{BB,t}
 \end{bmatrix}
\end{equation*}

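or in more compact notation

\begin{equation}
 \mathbf{y}_t=\mathbf{S}\mathbf{b}_{t},
\end{equation}

where $\mathbf{y}_t$ is the $n$-dimensional vector of all series in the hierarchy at time $t$, $\mathbf{S}$ is the $n\times m$ summing matrix, and $\mathbf{b}_t$ is the $m$-dimensional vector of bottom-level series at time $t$.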
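
As a rough numerical check of the identity $\mathbf{y}_t=\mathbf{S}\mathbf{b}_t$, the following minimal sketch builds the $8\times 5$ summing matrix of the example hierarchy with NumPy. The bottom-level values in `b_t` are made up purely for illustration and are not taken from the tourism data.

```python
import numpy as np

# Summing matrix S for the example hierarchy (n = 8 series, m = 5 bottom series).
# Rows: Total, A, B, AA, AB, AC, BA, BB; columns: AA, AB, AC, BA, BB.
S = np.vstack([
    np.ones((1, 5), dtype=int),   # Total = sum of all bottom-level series
    [[1, 1, 1, 0, 0],             # A = AA + AB + AC
     [0, 0, 0, 1, 1]],            # B = BA + BB
    np.eye(5, dtype=int),         # each bottom-level series maps to itself
])

# Hypothetical bottom-level observations at a single time point t.
b_t = np.array([10, 20, 30, 15, 25])

# Stack all series of the hierarchy: y_t = S b_t.
y_t = S @ b_t
print(y_t.tolist())  # [100, 60, 40, 10, 20, 30, 15, 25]
```

The first three entries of `y_t` are the aggregates (Total, A, B) and the remaining five simply repeat `b_t`, which is why the lower block of $\mathbf{S}$ is an identity matrix.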

# Mapping matrices

- Suppose we forecast all series ignoring any aggregation constraints. We call these the **base forecasts** and denote them by $\hat{\mathbf{y}}_h$, where $h$ is the forecast horizon. They are stacked in the same order as the data $\mathbf{y}_t$.

- Then all **coherent** forecasting approaches for either hierarchical or grouped structures can be represented as

\begin{equation}
 \tilde{\mathbf{y}}_h=\mathbf{S}\mathbf{G}\hat{\mathbf{y}}_h.
\end{equation}

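- The $m\times n$ matrix $\mathbf{G}$ maps the base forecasts to the bottom level, and the summing matrix $\mathbf{S}$ then aggregates them into a set of coherent forecasts $\tilde{\mathbf{y}}_h$. $\mathbf{G}$ is defined according to the approach implemented. For example, if the **bottom-up** approach is used to forecast the hierarchy, then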

\begin{equation*}
\mathbf{G}=
 \begin{bmatrix}
 0 & 0 & 0 & 1 & 0 & 0 & 0 & 0\\
 0 & 0 & 0 & 0 & 1 & 0 & 0 & 0\\
 0 & 0 & 0 & 0 & 0 & 1 & 0 & 0\\
 0 & 0 & 0 & 0 & 0 & 0 & 1 & 0\\
 0 & 0 & 0 & 0 & 0 & 0 & 0 & 1\\
 \end{bmatrix}
\end{equation*}
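
To make the bottom-up mapping concrete, here is a minimal NumPy sketch that applies $\tilde{\mathbf{y}}_h=\mathbf{S}\mathbf{G}\hat{\mathbf{y}}_h$ to the example hierarchy. The base forecasts in `y_hat` are hypothetical numbers chosen to be incoherent on purpose; they do not come from any fitted model.

```python
import numpy as np

# Summing matrix of the example hierarchy (rows: Total, A, B, AA, AB, AC, BA, BB).
S = np.array([
    [1, 1, 1, 1, 1],
    [1, 1, 1, 0, 0],
    [0, 0, 0, 1, 1],
    [1, 0, 0, 0, 0],
    [0, 1, 0, 0, 0],
    [0, 0, 1, 0, 0],
    [0, 0, 0, 1, 0],
    [0, 0, 0, 0, 1],
])

# Bottom-up G: ignore the first n - m = 3 base forecasts, keep the bottom five.
G = np.hstack([np.zeros((5, 3), dtype=int), np.eye(5, dtype=int)])

# Hypothetical base forecasts (Total, A, B, AA, AB, AC, BA, BB).
# They are incoherent: 58 + 43 != 105 and 11 + 19 + 31 != 58.
y_hat = np.array([105.0, 58.0, 43.0, 11.0, 19.0, 31.0, 14.0, 26.0])

# Coherent bottom-up forecasts: aggregates are rebuilt from the bottom level.
y_tilde = S @ G @ y_hat
print(y_tilde.tolist())  # [101.0, 61.0, 40.0, 11.0, 19.0, 31.0, 14.0, 26.0]
```

The bottom-level forecasts pass through unchanged, while the base forecasts for Total, A and B are discarded and replaced by sums of the bottom-level forecasts.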


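- If any of the **top-down** approaches is used, then the first column of $\mathbf{G}$ holds a set of proportions $p_1,\dots,p_5$, summing to one, that distribute the top-level base forecast across the bottom-level series: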

\begin{equation*}
\mathbf{G}=
 \begin{bmatrix}
 p_1 & 0 & 0 & 0 & 0 & 0 & 0 & 0\\
 p_2 & 0 & 0 & 0 & 0 & 0 & 0 & 0\\
 p_3 & 0 & 0 & 0 & 0 & 0 & 0 & 0\\
 p_4 & 0 & 0 & 0 & 0 & 0 & 0 & 0\\
 p_5 & 0 & 0 & 0 & 0 & 0 & 0 & 0\\
 \end{bmatrix}.
\end{equation*}
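
The top-down mapping can be sketched in the same way. In this illustration the proportions `p` and the base forecasts `y_hat` are hypothetical; in practice the proportions would be estimated, for example from historical shares of each bottom-level series in the total.

```python
import numpy as np

# Summing matrix of the example hierarchy (rows: Total, A, B, AA, AB, AC, BA, BB).
S = np.array([
    [1, 1, 1, 1, 1],
    [1, 1, 1, 0, 0],
    [0, 0, 0, 1, 1],
    [1, 0, 0, 0, 0],
    [0, 1, 0, 0, 0],
    [0, 0, 1, 0, 0],
    [0, 0, 0, 1, 0],
    [0, 0, 0, 0, 1],
])

# Hypothetical disaggregation proportions p_1, ..., p_5 (they must sum to one).
p = np.array([0.10, 0.20, 0.30, 0.15, 0.25])

# Top-down G: only the first column (the Total base forecast) is used.
G = np.zeros((5, 8))
G[:, 0] = p

# Hypothetical, incoherent base forecasts (Total, A, B, AA, AB, AC, BA, BB).
y_hat = np.array([105.0, 58.0, 43.0, 11.0, 19.0, 31.0, 14.0, 26.0])

# The Total forecast is split by p, then re-aggregated by S.
y_tilde = S @ G @ y_hat
print(np.round(y_tilde, 2))
```

Here the Total base forecast is preserved exactly, and every other base forecast is ignored: all lower-level forecasts are fixed shares of the Total.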

![tourismRegions](figures/tourismRegions.png)

![tourismStates](figures/tourismStates.png)

# Grouped time series

- With grouped time series, the data structure does not naturally disaggregate in a unique hierarchical manner. 

- At the top of the grouped structure is the Total, the most aggregate level of the data, again represented by $y_t$. 

- The Total can be disaggregated by attributes (A, B), forming series $y_{A,t}$ and $y_{B,t}$, or by attributes (X, Y), forming series $y_{X,t}$ and $y_{Y,t}$.

- At the bottom level, the data are disaggregated by both attributes.

![GroupTree](figures/GroupTree.png)
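
A grouped structure can be written with a summing matrix in the same way as a hierarchy. The sketch below assumes the four bottom-level series are labelled AX, AY, BX and BY (one for each combination of the two attributes); these labels and the values in `b_t` are assumptions made for illustration.

```python
import numpy as np

# Summing matrix for the grouped structure (n = 9 series, m = 4 bottom series).
# Rows: Total, A, B, X, Y, AX, AY, BX, BY; columns: AX, AY, BX, BY (assumed labels).
S = np.array([
    [1, 1, 1, 1],   # Total
    [1, 1, 0, 0],   # A = AX + AY
    [0, 0, 1, 1],   # B = BX + BY
    [1, 0, 1, 0],   # X = AX + BX
    [0, 1, 0, 1],   # Y = AY + BY
    [1, 0, 0, 0],   # AX
    [0, 1, 0, 0],   # AY
    [0, 0, 1, 0],   # BX
    [0, 0, 0, 1],   # BY
])

# Hypothetical bottom-level observations at a single time point t.
b_t = np.array([5, 7, 3, 9])

y_t = S @ b_t
print(y_t.tolist())  # [24, 12, 12, 8, 16, 5, 7, 3, 9]
```

Both disaggregations add up to the same Total (A + B = X + Y = 24 in this example), which is what distinguishes a grouped structure from a single hierarchy.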

# Forecast reconciliation

- The traditional methods considered so far (bottom-up and top-down) are limited: they use base forecasts from a single level of aggregation only, which are then aggregated or disaggregated to obtain forecasts at all other levels. Hence, they use limited information.

- However, in general, we could use other $\mathbf{G}$ matrices, and then $\mathbf{SG}$ combines and reconciles all the base forecasts in order to produce coherent forecasts.

- In fact, we can find the optimal $\mathbf{G}$ matrix to give the most accurate reconciled forecasts.
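
As one illustration of a $\mathbf{G}$ matrix that uses all base forecasts, the sketch below applies ordinary least squares (OLS) reconciliation, $\mathbf{G}=(\mathbf{S}'\mathbf{S})^{-1}\mathbf{S}'$. This is a simplifying assumption rather than the optimal choice referred to above: in the hierarchical forecasting literature the minimum trace (MinT) solution is $\mathbf{G}=(\mathbf{S}'\mathbf{W}_h^{-1}\mathbf{S})^{-1}\mathbf{S}'\mathbf{W}_h^{-1}$, where $\mathbf{W}_h$ is the covariance matrix of the $h$-step base forecast errors, and OLS corresponds to setting $\mathbf{W}_h$ to the identity. The base forecasts in `y_hat` are hypothetical.

```python
import numpy as np

# Summing matrix of the example hierarchy (rows: Total, A, B, AA, AB, AC, BA, BB).
S = np.array([
    [1, 1, 1, 1, 1],
    [1, 1, 1, 0, 0],
    [0, 0, 0, 1, 1],
    [1, 0, 0, 0, 0],
    [0, 1, 0, 0, 0],
    [0, 0, 1, 0, 0],
    [0, 0, 0, 1, 0],
    [0, 0, 0, 0, 1],
], dtype=float)

# OLS reconciliation (assumes W_h = I): G = (S'S)^{-1} S'.
# Solving the linear system avoids forming the inverse explicitly.
G = np.linalg.solve(S.T @ S, S.T)

# Hypothetical, incoherent base forecasts (Total, A, B, AA, AB, AC, BA, BB).
y_hat = np.array([105.0, 58.0, 43.0, 11.0, 19.0, 31.0, 14.0, 26.0])

# Reconciled forecasts combine information from every level of the hierarchy.
y_tilde = S @ G @ y_hat
print(np.round(y_tilde, 2))

# Coherence checks: Total = A + B, A = AA + AB + AC, B = BA + BB.
print(np.isclose(y_tilde[0], y_tilde[1] + y_tilde[2]))
print(np.isclose(y_tilde[1], y_tilde[3:6].sum()))
print(np.isclose(y_tilde[2], y_tilde[6:].sum()))
```

Unlike the bottom-up and top-down matrices, this $\mathbf{G}$ has no zero columns, so every base forecast contributes to the reconciled result. Because $\mathbf{S}\mathbf{G}\mathbf{S}=\mathbf{S}$, base forecasts that are already coherent are left unchanged.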

## Further Reading

- Gross, C. W., & Sohl, J. E. (1990). Disaggregation methods to expedite product line forecasting. Journal of Forecasting, 9, 233–254. DOI: https://doi.org/10.1002/for.3980090304. This paper provides a good introduction to the top-down approaches.
 
 
- Athanasopoulos, G., Gamakumara, P., Panagiotelis, A., Hyndman, R.J., Affan, M. (2020). Hierarchical Forecasting. In: Fuleky, P. (eds) Macroeconomic Forecasting in the Era of Big Data. Advanced Studies in Theoretical and Applied Econometrics, vol 52. Springer, Cham. DOI: https://doi.org/10.1007/978-3-030-31150-6_21
