Model Selection and Seasonal ARIMA¶

Feng Li

School of Statistics and Mathematics

Central University of Finance and Economics

feng.li@cufe.edu.cn

https://feng.li/python

Estimation¶

Conditional Sum of Squares¶

  • A straightforward way to estimate ARMA models is via least squares. For an ARMA(1,1)
$$\underset{c,\phi,\theta}{\min}\sum_{t=1}^T\left(y_t-c-\phi_1 y_{t-1}-\theta\epsilon_{t-1}\right)^2$$
  • Need to condition on initial values

  • Need to find a way to update all $\epsilon$

Recursion¶

  • Use the following recursion to update the $\epsilon$
$$\begin{aligned}\color{blue}{\epsilon_2}&=y_2-c-\phi y_{1}-\theta\epsilon_1\\\color{red}{\epsilon_3}&=y_t-c-\phi y_{2}-\theta\color{blue}{\epsilon_2}\\\epsilon_4&=y_t-c-\phi y_{3}-\theta\color{red}{\epsilon_3}\\\vdots&=\vdots\quad\vdots\quad\vdots\end{aligned}$$
  • There is no closed form solution for $c$, $\theta$ and $\phi$ but they can be found numerically.

Initial values¶

  • There are many options but a common one is to condition on the first $p$ observed values of $y$ and set any $\epsilon$ that need to be conditioned on to zero.
  • For an ARMA(1,1) condition on $y_1$ and set $\epsilon_1=0$. Then use observations $t=2,\dots,T$ in the sums of squares.
$$\underset{c,\phi,\theta}{\min}\sum_{t=2}^T\left(y_t-c-\phi_1 y_{t-1}-\theta\epsilon_{t-1}\right)^2$$
  • Other options include setting $y_0=0$ or setting $y_0=E(Y)$ and using all of the data.
  • For stationary, invertible process, impact is minimal with enough data.

Likelihood¶

  • Maximum likelihood maximises the joint density of the data with respect to the parameters
$$\begin{aligned}\hat{\Theta},\hat{\Phi},\hat\sigma,\hat c=&\underset{c,\Phi,\Theta,\sigma}{\max}f(y_1,y_2,\dots,y_T|c, \Theta,\Phi,\sigma^2\color{blue}{,y_0,y_{-1},\dots,\epsilon_0, \epsilon_{-1},\dots})\\=&\underset{c,\Phi,\Theta,\sigma}{\max}L(c, \Theta,\Phi,\sigma^2;y_1,y_2,\dots,y_T\color{blue}{,y_0,y_{-1},\dots,\epsilon_0, \epsilon_{-1},\dots})\\\end{aligned}$$
  • Assume $\epsilon_t\sim N(0,\sigma^2)$ for all $t$, i.e. assume normality
  • Conditional likelihood (includes parts in blue) equivalent to conditional sum of squares.
  • Unconditional likelihood does not have any of the parts in blue and requires the Kalman filter.

Advantages¶

  • Maximum likelihood is
    • A consistent estimator
    • Asymptotically normal
    • Can be used to compute information criteria for model selection
  • Assuming normality is a disadvantage, but...
  • ...pseudo maximum likelihood theory establishes robustness to this assumption.

Information criteria¶

For a model with $k$ parameters, all IC take the form

$$IC = -2\log L + k\times q$$
  • The log likelihood measures the "fit" of the data, with higher values indicating a better fit.
  • To avoid overfitting there is a "penalty" on the number of parameters $k$.
  • If many ARIMA models are fit by maximum likelihood, information criteria can be computed for each model
  • Choose the model with the lowest value of the information criterion.

AIC, AICc and BIC¶

  • If $q=2$ we have the AIC. Theoretically the AIC will choose a model closest to the true data generating process (DGP), even if the true DGP is not one of the models estimated.
  • If $q=2+\frac{2(k+1)}{T-k-1}$ we have the AICc. The AIC relies on asymptotic arguments, the AICc is a correction for finite samples.
  • if $q=log(T)$ we have the BIC. Theoretically the BIC is model consistent (it will choose the correct model as $T\rightarrow\infty$). However the true DGP must be one of the models under consideration.
  • In ARIMA modelling AICc is most commonly used

Model Selection¶

Auto Arima¶

  • Box-Jenkins approach of looking at ACF and PACF plots popular when computers were slow.
  • With improvements in computing it is possible to search through a large number of possible ARIMA models and select the best models using selection criteria.
  • One popular such algorithm is auto arima, proposed by Hyndman and Khandakar (2008) and implemented in R packages.
  • For a Python implementation install statsforecast

Auto Arima (find $d$)¶

  • Step 1 is to find the correct level of differencing using a hypothesis test
  • By default the KPSS test is used.
    • Null is that data are stationary
    • If fail to reject, $d=0$. If rejected, take differences and apply KPSS test again.
    • If fail to reject, $d=1$. If rejected a second time, $d=2$.
  • Other tests (augmented Dickey-Fuller and Phillips Perron) can be used.
    • The null for these is that data are non-stationary.

Auto Arima (find $p,q$)¶

  • Fit four models
    • ARIMA(0,d,0)
    • ARIMA(1,d,0)
    • ARIMA(0,d,1)
    • ARIMA(2,d,2)
  • Select the model from the above list that minimises AICc

Auto Arima (find $p,q$)¶

Using the best current model search "neighbouring models", which are

  • Models with AR order different by $\pm 1$.
  • Models with MA order different by $\pm 1$.
  • Either one or both of these conditions makes the model a neighbouring model.
  • The algorithm moves to a better model as soon as it is found, but can also be forced to search all neighbouring models.

Model neighbourhood¶

  • What are the "neghboring" models of an ARMA(2,3)?

Application¶

  • Forecast turnover in restaurant/cafe takeway sector in NSW
In [19]:
import pandas as pd
dat = pd.read_csv('data/takeaway.csv')
dat['Month']=pd.to_datetime(dat['Month'])
dat
Out[19]:
Month Turnover
0 1982-04-01 85.4
1 1982-05-01 84.8
2 1982-06-01 80.7
3 1982-07-01 82.4
4 1982-08-01 80.7
... ... ...
436 2018-08-01 579.2
437 2018-09-01 569.2
438 2018-10-01 588.6
439 2018-11-01 576.0
440 2018-12-01 630.3

441 rows × 2 columns

Time series plot¶

In [20]:
import matplotlib.pyplot as plt
fig, ax = plt.subplots(figsize=(25, 6))
ax.plot(dat.Month, dat.Turnover)
Out[20]:
[<matplotlib.lines.Line2D at 0x7f02777dd2b0>]

Transformation¶

  • Best to stabilise variance using log transformation
  • We can also split data into a test and training sample.
In [21]:
import numpy as np
dat['logTurnover'] = np.log(dat['Turnover'])
train = dat.loc[1:404,:]
test = dat.loc[405:,:]

Output¶

  • The auto_arima_f function finds the best model.
    • The package is still under development and
    • It is designed to work with many models and time series
    • The classes created by the package are not easy to inspect compared to statsmodels.
In [22]:
import warnings
warnings.filterwarnings("ignore")
from statsforecast.arima import auto_arima_f
out = auto_arima_f(train['logTurnover'].to_numpy())
out['arma']
Out[22]:
(2, 3, 0, 0, 1, 1, 0)
  • The first two numbers are $p$ and $q$, sixth number is $d$, other numbers relate to seasonal ARIMA
  • The selected model is ARIMA(2,1,3)
In [23]:
out
Out[23]:
{'coef': {'ar1': -1.3971657461229432,
  'ar2': -0.5504300399710436,
  'ma1': 1.1684981191760229,
  'ma2': -0.04014085164881026,
  'ma3': -0.30891348747978087},
 'sigma2': 0.004209566597895503,
 'var_coef': array([[5.23704348e-07, 0.00000000e+00, 0.00000000e+00, 0.00000000e+00,
         0.00000000e+00],
        [3.82730285e-06, 2.99479269e-06, 0.00000000e+00, 0.00000000e+00,
         0.00000000e+00],
        [0.00000000e+00, 0.00000000e+00, 6.15729424e-06, 0.00000000e+00,
         0.00000000e+00],
        [0.00000000e+00, 0.00000000e+00, 0.00000000e+00, 6.15729424e-06,
         0.00000000e+00],
        [0.00000000e+00, 0.00000000e+00, 0.00000000e+00, 0.00000000e+00,
         6.15729424e-06]]),
 'mask': array([ True,  True,  True,  True,  True]),
 'loglik': 533.2699228793731,
 'aic': -1054.5398457587462,
 'arma': (2, 3, 0, 0, 1, 1, 0),
 'residuals': array([ 4.44029291e-03, -4.54976580e-02,  9.56731817e-03, -2.85556777e-02,
         1.97013798e-02,  5.17932328e-02,  2.82506441e-02,  1.16781948e-01,
        -1.42899268e-02, -6.15979826e-02, -5.75645930e-02, -6.06537902e-02,
        -1.06080594e-02, -2.78585870e-02,  5.94797524e-02,  7.42004642e-03,
         1.62131052e-02, -6.29711577e-02,  3.36299463e-02,  6.07726011e-02,
         1.23908437e-01, -4.74269220e-02,  6.41449341e-02, -7.85209939e-02,
         6.13893783e-02, -9.03865966e-02,  3.80051045e-02, -2.25966458e-02,
         2.14866108e-02,  5.91745843e-02,  4.20980755e-02,  8.72813702e-02,
        -2.88517060e-02, -1.15671673e-01,  1.64569053e-02,  1.27694578e-03,
         3.50817869e-02, -4.59425262e-02,  5.39035026e-02,  2.41930136e-02,
         3.56076982e-02,  5.75008641e-02,  8.56381479e-02,  1.00955466e-01,
        -5.11762109e-02, -1.21457590e-01,  1.67806339e-02,  2.86316189e-02,
         5.36152494e-02, -5.22152497e-02,  6.19240860e-03,  1.85203037e-02,
        -1.02053159e-03,  4.70671774e-02,  4.78557185e-02,  1.60955617e-01,
        -9.61759791e-02, -5.23276647e-02, -6.03278786e-02,  1.98220008e-02,
        -2.42677211e-02, -7.28709972e-04,  8.33153986e-02,  1.17778891e-02,
         4.53135418e-02,  4.80550024e-03,  3.31026618e-02,  7.13224696e-02,
         1.19501841e-02, -7.77003836e-02,  4.40335930e-02, -4.68521147e-02,
         2.97100543e-02, -4.50413089e-02,  3.23344469e-02, -2.64559051e-02,
        -6.50246833e-03,  1.09983666e-02,  1.43383877e-02,  7.98739532e-02,
         3.10050887e-03, -9.54046488e-02,  1.00778407e-02, -4.59702530e-02,
         2.62118232e-02,  6.20216370e-02,  3.75803669e-02,  3.61005058e-02,
         9.59609681e-03,  5.95501126e-02,  2.94191763e-02,  1.26478031e-01,
         6.84442323e-02, -7.69584443e-02,  3.17738702e-02, -1.26414819e-02,
         2.30656651e-02, -3.19328810e-02, -7.46516128e-05, -2.15484624e-03,
        -1.32391135e-01,  5.03995421e-02, -2.19148749e-02,  1.46150918e-01,
        -7.19257013e-02, -7.43303967e-02, -6.72985707e-03,  2.03940558e-02,
        -1.86275483e-02, -1.33360522e-02,  1.41837682e-02,  2.63253676e-02,
        -2.40452680e-02,  5.04812883e-02,  1.16516525e-02,  1.14231192e-01,
         1.11658609e-02, -4.56830306e-02,  5.53201845e-02, -2.33089170e-02,
         1.69945376e-02, -9.45191618e-02, -3.23511671e-02, -4.98810127e-02,
         2.21704665e-02,  1.65701702e-02,  1.17521517e-02,  3.29278391e-02,
        -6.59907292e-02, -1.25564503e-01, -1.09356147e-01,  3.57187028e-02,
        -7.18026719e-02,  5.89486319e-02,  2.27798547e-02,  7.76411155e-02,
         4.26051642e-02,  9.86611730e-02,  2.25670475e-02,  7.57197395e-02,
         8.73363586e-02, -1.41633340e-01,  1.26265073e-01, -1.68216835e-01,
         4.28707454e-02, -8.64513301e-02,  9.59173711e-02, -4.10543539e-02,
         6.28820617e-02, -8.48397753e-02,  1.32520208e-02,  7.16106343e-02,
         7.64904879e-02, -9.42331861e-02,  8.44571961e-02, -4.96688520e-03,
         3.01483603e-02, -5.75843817e-02,  1.07921898e-01,  1.43466683e-02,
         4.51074594e-02,  1.10322289e-01,  6.36546070e-02,  1.18259582e-01,
         6.06386849e-02, -4.94522521e-02,  3.14331868e-02, -2.59256422e-02,
        -1.23326613e-02, -6.29051059e-02, -9.85407438e-03, -1.56696679e-03,
        -3.16011655e-02,  6.50863953e-02, -1.89623843e-02,  8.50598888e-02,
         8.76205421e-03, -1.14723132e-01,  5.85380596e-02, -4.03829204e-02,
         4.07823194e-02, -8.71286513e-02,  5.93192138e-02, -2.37316163e-02,
         5.07361784e-02,  1.93334121e-05,  3.96629297e-03,  5.49561069e-02,
        -2.77789331e-02, -1.08175484e-01, -1.28522508e-02, -1.52940127e-01,
        -6.08791241e-02, -4.58251895e-02,  3.52823857e-02, -9.42911932e-02,
        -8.18510203e-03,  3.74715382e-02, -5.19007412e-03,  7.76639455e-02,
        -2.59009735e-02, -1.24703935e-01,  3.90903660e-02, -4.65669403e-02,
         2.94166768e-02, -4.48243000e-02, -1.74959413e-02,  1.59992034e-02,
         4.31485218e-02,  2.26347269e-02,  6.75451620e-03,  8.52911110e-04,
        -5.31324556e-02, -1.00045602e-01,  4.42689648e-02,  3.26911424e-02,
         7.28965368e-03,  3.16154367e-02,  1.75380354e-01,  4.35255661e-02,
         1.14910453e-01,  8.53524968e-02, -5.48295411e-02,  9.17508131e-02,
        -1.79676591e-02, -6.25646803e-02,  6.88278397e-02, -1.41354144e-01,
        -5.65601524e-02, -3.53098106e-02,  4.44229860e-02, -1.92105318e-02,
         2.42464469e-02,  6.90066100e-02, -3.77261105e-02,  1.40445849e-01,
         5.72197415e-02, -1.62128136e-01,  6.79075355e-02, -2.34019368e-02,
         1.77224286e-03, -5.41464918e-02,  8.56273371e-02, -4.16402866e-02,
         1.94085884e-02,  5.42892556e-02, -3.10420475e-02,  1.14645909e-01,
        -6.52151548e-03, -1.72741247e-01,  2.86268504e-02,  1.04143569e-01,
         6.43158368e-02, -2.76706356e-02,  7.08642985e-02,  1.57671003e-02,
         1.30281469e-02,  7.49839870e-02, -2.43497330e-02,  1.30414851e-01,
        -1.42912226e-02, -9.63395932e-02, -9.80161966e-03,  9.25846941e-02,
         1.32765116e-02, -1.20051425e-02,  6.16388885e-02, -4.25800038e-02,
         3.38116034e-02,  1.14558411e-02, -4.33306887e-02,  7.25887059e-02,
        -6.04710960e-02, -9.75983946e-02,  4.73093099e-03,  1.48605084e-02,
        -6.71766030e-03,  6.15159524e-03,  1.18792985e-01, -4.44372531e-02,
         4.02524606e-02,  8.76457573e-02,  3.26859777e-02,  1.07678114e-01,
        -7.29658834e-02, -1.33640617e-01,  2.09458650e-02, -1.60300374e-02,
         3.90596184e-04, -3.09976696e-02,  1.07115378e-01, -1.26259641e-02,
         1.97574645e-02, -6.52697577e-02,  9.87741080e-03,  4.76141105e-02,
        -2.50984411e-02, -1.32505589e-01,  7.19788377e-02, -1.83197442e-02,
         3.12961695e-02, -7.49866233e-03,  1.18648644e-01,  1.12216759e-02,
         4.87925530e-03,  1.87268526e-02, -1.46617905e-03,  6.72422507e-02,
        -7.46876009e-02, -2.58371926e-02,  9.99855357e-03,  1.41942320e-02,
         1.23944845e-02, -2.80792216e-02,  4.29071524e-02, -1.22531280e-02,
        -4.18955786e-02,  6.95224956e-02, -3.79122985e-02,  1.72880281e-01,
        -2.46544818e-02, -1.09147029e-01,  3.53859929e-02,  2.34379122e-02,
         3.72420520e-02, -5.44117007e-02,  1.16148369e-01,  2.86986838e-03,
         7.05058610e-02,  5.28870692e-02,  1.84301643e-02,  1.40401743e-01,
         4.58460315e-03, -1.35954620e-01, -6.89105457e-04,  5.54593818e-02,
         1.20335091e-02,  1.85011159e-02,  9.47188235e-02, -1.94164234e-02,
         4.08086187e-02, -3.34565137e-03, -2.83909851e-02,  1.16481611e-01,
        -8.65346211e-02, -1.58018014e-01, -2.54614621e-02, -3.99134183e-02,
        -5.57000474e-02, -3.19554332e-02, -2.68952318e-02, -1.98500610e-02,
         2.79791679e-02,  1.31351096e-03, -3.32015383e-02,  1.83497348e-01,
        -5.38751213e-02, -1.17406732e-01,  7.63048335e-03,  7.95547995e-02,
        -9.06377448e-03,  2.12562165e-02,  6.07248254e-02, -1.01372373e-02,
         1.55207350e-02, -1.99711460e-02,  2.38260261e-02,  1.15180429e-01,
        -1.17055915e-01, -1.34907749e-01,  7.20883687e-02,  1.00923052e-02,
         8.58537220e-03, -2.00993491e-02,  1.21649692e-02,  3.76445983e-02,
        -2.32873534e-02,  5.46469092e-02,  2.61834404e-02,  1.21012887e-01,
        -6.13634722e-02, -1.06796596e-01,  5.90874766e-02,  1.43586220e-02,
         3.21787239e-02, -5.01289045e-02,  9.16762265e-02,  1.08419307e-02,
         5.24409858e-02,  3.28827014e-02,  4.91753653e-02,  1.13253857e-01,
        -4.56721537e-02, -1.40014471e-01,  7.09532904e-02, -1.11604299e-02,
         3.02575191e-02, -2.35080360e-02,  9.34882337e-02, -1.63134307e-02,
         6.53457946e-02,  2.27425253e-02,  6.02643958e-03,  7.74224280e-02]),
 'code': 2,
 'n_cond': 0,
 'nobs': 403,
 'model': {'phi': array([-1.39716575, -0.55043004]),
  'theta': array([ 1.16849812, -0.04014085, -0.30891349]),
  'delta': array([1.]),
  'Z': array([1., 0., 0., 0., 1.]),
  'a': array([ 8.86840844e-02,  1.00895874e-01, -4.96945067e-03, -2.39168323e-02,
          6.20010341e+00]),
  'P': array([[ 2.22044605e-16,  2.22044605e-16,  2.77555756e-17,
          -5.55111512e-17, -5.82486353e-17],
         [ 2.22044605e-16,  2.44249065e-15,  5.13478149e-16,
          -6.66133815e-16, -2.05101327e-16],
         [ 2.77555756e-17,  5.13478149e-16,  1.44198889e-16,
          -1.78676518e-16, -3.17634911e-17],
         [-5.55111512e-17, -6.66133815e-16, -1.78676518e-16,
           2.08166817e-16,  4.47052676e-17],
         [-5.82486353e-17, -2.05101327e-16, -3.17634911e-17,
           4.47052676e-17,  5.82486353e-17]]),
  'T': array([[-1.39716575,  1.        ,  0.        ,  0.        ,  0.        ],
         [-0.55043004,  0.        ,  1.        ,  0.        ,  0.        ],
         [ 0.        ,  0.        ,  0.        ,  1.        ,  0.        ],
         [ 0.        ,  0.        ,  0.        ,  0.        ,  0.        ],
         [ 1.        ,  0.        ,  0.        ,  0.        ,  1.        ]]),
  'V': array([[ 1.        ,  1.16849812, -0.04014085, -0.30891349,  0.        ],
         [ 1.16849812,  1.36538785, -0.04690451, -0.36096483,  0.        ],
         [-0.04014085, -0.04690451,  0.00161129,  0.01240005, -0.        ],
         [-0.30891349, -0.36096483,  0.01240005,  0.09542754, -0.        ],
         [ 0.        ,  0.        , -0.        , -0.        ,  0.        ]]),
  'h': 0.0,
  'Pn': array([[ 1.00000000e+00,  1.16849812e+00, -4.01408516e-02,
          -3.08913487e-01,  8.64691236e-17],
         [ 1.16849812e+00,  1.36538785e+00, -4.69045097e-02,
          -3.60964829e-01, -3.59988979e-17],
         [-4.01408516e-02, -4.69045097e-02,  1.61128797e-03,
           1.24000505e-02, -3.75725852e-17],
         [-3.08913487e-01, -3.60964829e-01,  1.24000505e-02,
           9.54275427e-02,  0.00000000e+00],
         [ 8.64691236e-17, -3.59988979e-17, -3.75725852e-17,
           0.00000000e+00,  5.82486353e-17]])},
 'bic': -1030.561133227032,
 'aicc': -1054.327187530898,
 'ic': None,
 'xreg': None,
 'x': array([4.44029554, 4.39073858, 4.41158544, 4.39073858, 4.40793802,
        4.46935046, 4.47049528, 4.57677071, 4.53259949, 4.44382704,
        4.42723898, 4.37826959, 4.39568296, 4.37826959, 4.4391156 ,
        4.44500143, 4.43438187, 4.38327585, 4.41763506, 4.49535532,
        4.58087749, 4.50313746, 4.54965748, 4.48751214, 4.53044664,
        4.46935046, 4.48525989, 4.50313746, 4.49088104, 4.5716134 ,
        4.58292458, 4.65014355, 4.60316818, 4.46590812, 4.53903038,
        4.54648119, 4.56746832, 4.52396013, 4.5716134 , 4.60716819,
        4.60716819, 4.66908351, 4.72650247, 4.79991426, 4.71133038,
        4.58292458, 4.65681342, 4.69318106, 4.72561634, 4.66438205,
        4.66626529, 4.70862889, 4.68490515, 4.7379513 , 4.77406872,
        4.90970938, 4.77575649, 4.7022969 , 4.71133038, 4.72028299,
        4.72028299, 4.70411013, 4.80402104, 4.78998862, 4.81055702,
        4.82108769, 4.82831374, 4.90823336, 4.88507207, 4.79661665,
        4.86368088, 4.822698  , 4.84024231, 4.81624116, 4.83310225,
        4.82671246, 4.801559  , 4.83469334, 4.83786795, 4.91338991,
        4.89858579, 4.7782828 , 4.82831374, 4.79330813, 4.81624116,
        4.89485026, 4.89559848, 4.92071059, 4.91632461, 4.96633504,
        4.9863426 , 5.086361  , 5.13226278, 5.00193085, 5.05879034,
        5.05751888, 5.05879034, 5.04342512, 5.02978411, 5.04921478,
        4.90453376, 4.99179221, 4.99247132, 5.10291057, 5.03239679,
        4.91265489, 4.98292146, 4.98838969, 4.9705075 , 4.95864   ,
        4.9781121 , 5.00662727, 4.96633504, 5.02388052, 5.02912988,
        5.12336856, 5.11739483, 5.03304889, 5.12038616, 5.08450514,
        5.08821343, 5.01196774, 4.98017609, 4.97742316, 4.99314998,
        5.0271646 , 5.01661737, 5.05177724, 4.97535348, 4.85515039,
        4.801559  , 4.87596039, 4.81462041, 4.86676492, 4.91118322,
        4.94449549, 4.98770779, 5.04728861, 5.05241683, 5.09742442,
        5.17614973, 4.99179221, 5.13990763, 4.99247132, 5.00193085,
        5.00125813, 5.04213396, 5.04664573, 5.05560866, 5.00125813,
        4.99653637, 5.10412564, 5.14224818, 5.01794187, 5.11978861,
        5.11978861, 5.11379339, 5.07392303, 5.17388729, 5.18961795,
        5.18794431, 5.31073989, 5.3264188 , 5.41119952, 5.44630624,
        5.34758361, 5.40087395, 5.37481534, 5.35327892, 5.31172705,
        5.30678144, 5.32981603, 5.28675073, 5.36550862, 5.33801873,
        5.40312773, 5.4161004 , 5.26009615, 5.36877583, 5.33271879,
        5.3499606 , 5.28978104, 5.33271879, 5.34233425, 5.35280555,
        5.37481534, 5.34758361, 5.41743285, 5.37110304, 5.25332   ,
        5.28826703, 5.14923714, 5.11859244, 5.13108145, 5.16192474,
        5.07953927, 5.07204392, 5.14865659, 5.11379339, 5.19295685,
        5.1550242 , 5.00796507, 5.10473262, 5.06259503, 5.08016136,
        5.05815481, 5.02256386, 5.07267069, 5.09864617, 5.11259002,
        5.10533923, 5.10230248, 5.05113724, 4.96004351, 5.04471461,
        5.08140436, 5.05815481, 5.095589  , 5.25801607, 5.25540987,
        5.31861007, 5.39544408, 5.2801534 , 5.38770093, 5.36597602,
        5.2668267 , 5.38541207, 5.22143632, 5.18009674, 5.21112415,
        5.23962837, 5.23431204, 5.23697374, 5.32056798, 5.25017699,
        5.388615  , 5.4354672 , 5.20455599, 5.33416702, 5.3249593 ,
        5.28826703, 5.27248661, 5.3442463 , 5.31271325, 5.30131288,
        5.38678601, 5.31811999, 5.43720937, 5.420535  , 5.20290718,
        5.30777253, 5.4275897 , 5.44068461, 5.39089655, 5.45702906,
        5.47185042, 5.45189645, 5.53930084, 5.48604097, 5.60727157,
        5.5831203 , 5.43938281, 5.491414  , 5.58724866, 5.57632782,
        5.54165563, 5.6145869 , 5.55759996, 5.58236785, 5.61276308,
        5.53851467, 5.63657372, 5.56298666, 5.45403824, 5.5174529 ,
        5.53180721, 5.5174529 , 5.52585127, 5.64367855, 5.57063196,
        5.58949333, 5.70311559, 5.68119598, 5.77919911, 5.67880649,
        5.53180721, 5.62690143, 5.61130162, 5.60285656, 5.58687406,
        5.68968355, 5.66850066, 5.65178718, 5.61203262, 5.613493  ,
        5.68900719, 5.63300229, 5.50288953, 5.61895054, 5.60727157,
        5.60763861, 5.62112524, 5.71636959, 5.71636959, 5.67948978,
        5.71834263, 5.70111225, 5.76707021, 5.68255884, 5.64897424,
        5.70444892, 5.69541422, 5.71406278, 5.67572588, 5.72423848,
        5.71274222, 5.65284   , 5.75066606, 5.6957503 , 5.85736156,
        5.81919233, 5.6503817 , 5.76268012, 5.77548279, 5.79270868,
        5.73882782, 5.84845985, 5.85421195, 5.87183606, 5.94332417,
        5.91025449, 6.05514332, 6.0224786 , 5.84643878, 5.90726739,
        5.97787259, 5.96460709, 5.97482691, 6.06657202, 6.01956598,
        6.04334517, 6.05185385, 5.99893656, 6.13902211, 6.02417372,
        5.85248979, 5.91296232, 5.88638177, 5.83773045, 5.83276186,
        5.8168135 , 5.80904299, 5.8444136 , 5.84238432, 5.79909265,
        5.99670081, 5.90590666, 5.74652263, 5.83510309, 5.90889782,
        5.87689509, 5.88610403, 5.95220383, 5.91754886, 5.9242558 ,
        5.91377324, 5.92772571, 6.0530299 , 5.89302447, 5.7639364 ,
        5.91593248, 5.9105256 , 5.89357605, 5.88749196, 5.89053863,
        5.9396448 , 5.8957793 , 5.95220383, 5.97685839, 6.06796252,
        5.98921201, 5.8576474 , 5.98418814, 5.98645201, 5.99321302,
        5.95116325, 6.03356572, 6.05161847, 6.05795429, 6.10456999,
        6.1180972 , 6.22673428, 6.14203741, 5.98745653, 6.12424566,
        6.10969193, 6.11235386, 6.10702289, 6.18125812, 6.1649976 ,
        6.19664777, 6.23225153, 6.20010341, 6.2887875 ]),
 'lambda': None}

Fit model using statsmodels¶

In [24]:
import statsmodels as sm
fitaa = sm.tsa.arima.model.ARIMA(train['logTurnover'], order = (2,1,3)).fit()
fitaa.summary()
Out[24]:
SARIMAX Results
Dep. Variable: logTurnover No. Observations: 404
Model: ARIMA(2, 1, 3) Log Likelihood 531.827
Date: Thu, 07 Jul 2022 AIC -1051.654
Time: 13:28:47 BIC -1027.661
Sample: 0 HQIC -1042.155
- 404
Covariance Type: opg
coef std err z P>|z| [0.025 0.975]
ar.L1 0.2047 3.820 0.054 0.957 -7.283 7.692
ar.L2 0.5400 2.431 0.222 0.824 -4.225 5.305
ma.L1 -0.5118 3.832 -0.134 0.894 -8.022 6.998
ma.L2 -0.6869 1.250 -0.549 0.583 -3.138 1.764
ma.L3 0.3407 1.572 0.217 0.828 -2.740 3.421
sigma2 0.0042 0.000 14.293 0.000 0.004 0.005
Ljung-Box (L1) (Q): 0.59 Jarque-Bera (JB): 7.97
Prob(Q): 0.44 Prob(JB): 0.02
Heteroskedasticity (H): 1.37 Skew: -0.30
Prob(H) (two-sided): 0.07 Kurtosis: 3.33


Warnings:
[1] Covariance matrix calculated using the outer product of gradients (complex-step).

Plot of forecast¶

In [25]:
fcaa = fitaa.forecast(36)
fig, ax = plt.subplots(1,figsize=(25,6))
ax.plot(train['Month'].tail(48),train['logTurnover'].tail(48),color='black')
ax.plot(test['Month'],test['logTurnover'],color='gray')
ax.plot(test['Month'],fcaa,color='blue')
Out[25]:
[<matplotlib.lines.Line2D at 0x7f0278ad0340>]

Seasonality¶

  • The ARIMA models as we have studied them so far do not take into account seasonality.
  • The forecasts work for short horizons, but fail to mimic the seasonal pattern at long horizons .
  • Looking at ACF and PACF plots leads to the same conclusion
  • Let's consider the ACF and PACF for the difference of log Turnover

ACF and PACF¶

In [26]:
from statsmodels.graphics.tsaplots import plot_acf, plot_pacf
fig, ax = plt.subplots(1,2,figsize=(15,6))
dify = train.diff()['logTurnover'].iloc[2:]
plot_acf(dify,ax=ax[0])
plot_pacf(dify,ax=ax[1])
plt.show()

Pure Seasonal models¶

  • Seasonal AR(1): $Y_t=\phi^{(s)} Y_{t-m}+\epsilon_t$
  • Seasonal AR(2): $Y_t=\phi_1^{(s)} Y_{t-m}+\phi_2^{(s)} Y_{t-2m}+\epsilon_t$
  • Seasonal AR(p): $(1-\Phi^{(s)}(L^m))Y_t=\epsilon_t$
  • Seasonal ARIMA(p,d,q): $(1-\Phi^{(s)}(L^m))(1-L^m)^DY_t=(1+\Theta^{(s)}(L^m))\epsilon_t$

Seasonal ARIMA¶

The most general form for seasonal ARIMA is

$$(1-\Phi(L))(1-\Phi^{(s)}(L^m))(1-L)^d(1-L^m)^DY_t=(1+\Theta(L))(1+\Theta^{(s)}(L^m))\epsilon_t$$
  • Note that seasonal difference/components are always applied before non-seasonal components.

  • The auto arima algorithm can be generalised to seasonal ARIMA

Auto arima (seasonal)¶

Specify the seasonal period and set seasonal=True in auto_arima_f

In [27]:
out = auto_arima_f(train['logTurnover'].to_numpy(),seasonal=True,period=12)
out['arma']
Out[27]:
(2, 3, 2, 2, 12, 0, 1)
  • First two numbers are non-seasonal AR and MA orders.
  • Third and fourth numbers are seasonal AR and MA orders.
  • Fifth number is period.
  • Sixth and Seventh number are non-seasonal and seasonal orders of differencing.
In [28]:
out.keys()
Out[28]:
dict_keys(['coef', 'sigma2', 'var_coef', 'mask', 'loglik', 'aic', 'arma', 'residuals', 'code', 'n_cond', 'nobs', 'model', 'xreg', 'bic', 'aicc', 'ic', 'x', 'lambda'])
In [29]:
out['coef']
Out[29]:
{'ar1': 1.495027198126436,
 'ar2': -0.6042819420875728,
 'ma1': -0.9348258615219661,
 'ma2': 0.1883384110246554,
 'ma3': 0.08816517296129588,
 'sar1': -0.17736054462940312,
 'sar2': 0.004977028288534635,
 'sma1': -0.6182574588587337,
 'sma2': -0.22384124235583505,
 'drift': 0.004052622741266676}
In [30]:
[out['loglik'], out['aicc'], out['bic']]
Out[30]:
[619.8529481890404, -1217.0093264572365, -1174.0501132182417]
In [31]:
out['var_coef']
Out[31]:
array([[ 2.89379344e-07,  0.00000000e+00,  0.00000000e+00,
         0.00000000e+00,  0.00000000e+00,  0.00000000e+00,
         0.00000000e+00,  0.00000000e+00,  0.00000000e+00,
         0.00000000e+00],
       [-3.25946707e-06,  2.62594567e-06,  0.00000000e+00,
         0.00000000e+00,  0.00000000e+00,  0.00000000e+00,
         0.00000000e+00,  0.00000000e+00,  0.00000000e+00,
         0.00000000e+00],
       [ 0.00000000e+00,  0.00000000e+00,  6.50770512e-06,
         0.00000000e+00,  0.00000000e+00,  0.00000000e+00,
         0.00000000e+00,  0.00000000e+00,  0.00000000e+00,
         0.00000000e+00],
       [ 0.00000000e+00,  0.00000000e+00,  0.00000000e+00,
         6.50770512e-06,  0.00000000e+00,  0.00000000e+00,
         0.00000000e+00,  0.00000000e+00,  0.00000000e+00,
         0.00000000e+00],
       [ 0.00000000e+00,  0.00000000e+00,  0.00000000e+00,
         0.00000000e+00,  6.50770512e-06,  0.00000000e+00,
         0.00000000e+00,  0.00000000e+00,  0.00000000e+00,
         0.00000000e+00],
       [ 0.00000000e+00,  0.00000000e+00,  0.00000000e+00,
         0.00000000e+00,  0.00000000e+00,  6.50770512e-06,
         0.00000000e+00,  0.00000000e+00,  0.00000000e+00,
         0.00000000e+00],
       [ 0.00000000e+00,  0.00000000e+00,  0.00000000e+00,
         0.00000000e+00,  0.00000000e+00,  2.31986211e-06,
         6.50731361e-06,  0.00000000e+00,  0.00000000e+00,
         0.00000000e+00],
       [ 0.00000000e+00,  0.00000000e+00,  0.00000000e+00,
         0.00000000e+00,  0.00000000e+00,  0.00000000e+00,
         0.00000000e+00,  6.50770512e-06,  0.00000000e+00,
         0.00000000e+00],
       [ 0.00000000e+00,  0.00000000e+00,  0.00000000e+00,
         0.00000000e+00,  0.00000000e+00,  0.00000000e+00,
         0.00000000e+00,  0.00000000e+00,  6.50770512e-06,
         0.00000000e+00],
       [ 0.00000000e+00,  0.00000000e+00,  0.00000000e+00,
         0.00000000e+00,  0.00000000e+00,  0.00000000e+00,
         0.00000000e+00,  0.00000000e+00,  0.00000000e+00,
         6.50770512e-06]])

Plots¶

In [37]:
# Fit a SARIMA model
fitaas = sm.tsa.arima.model.ARIMA(train['logTurnover'], order = (2,0,3),seasonal_order=(2,1,2,12)).fit()
fitaas.summary()
Out[37]:
SARIMAX Results
Dep. Variable: logTurnover No. Observations: 404
Model: ARIMA(2, 0, 3)x(2, 1, [1, 2], 12) Log Likelihood 669.876
Date: Thu, 07 Jul 2022 AIC -1319.751
Time: 13:39:13 BIC -1280.039
Sample: 0 HQIC -1304.012
- 404
Covariance Type: opg
coef std err z P>|z| [0.025 0.975]
ar.L1 1.5698 0.385 4.081 0.000 0.816 2.324
ar.L2 -0.5862 0.373 -1.572 0.116 -1.317 0.145
ma.L1 -0.7365 0.377 -1.952 0.051 -1.476 0.003
ma.L2 0.0039 0.072 0.055 0.956 -0.137 0.144
ma.L3 0.1126 0.048 2.332 0.020 0.018 0.207
ar.S.L12 -0.4528 0.721 -0.628 0.530 -1.867 0.961
ar.S.L24 0.0621 0.070 0.892 0.373 -0.074 0.199
ma.S.L12 -0.3282 0.716 -0.459 0.646 -1.731 1.074
ma.S.L24 -0.3748 0.537 -0.697 0.486 -1.428 0.678
sigma2 0.0018 0.000 16.868 0.000 0.002 0.002
Ljung-Box (L1) (Q): 2.01 Jarque-Bera (JB): 32.52
Prob(Q): 0.16 Prob(JB): 0.00
Heteroskedasticity (H): 0.74 Skew: -0.14
Prob(H) (two-sided): 0.08 Kurtosis: 4.38


Warnings:
[1] Covariance matrix calculated using the outer product of gradients (complex-step).
In [38]:
fcaas = fitaas.forecast(36)
fig, ax = plt.subplots(1,figsize=(25,6))
dat.Month, dat.Turnover
ax.plot(train['Month'].tail(48),train['logTurnover'].tail(48),color='black')
ax.plot(test['Month'],test['logTurnover'],color='gray')
ax.plot(test['Month'],fcaas,color='blue')
Out[38]:
[<matplotlib.lines.Line2D at 0x7f0278f44e80>]

Using Covariates¶

  • An important characteristic of (S)ARIMA models is the ability for them to include covariates or regressors.
  • The model becomes
$$(1-\Phi(L))(1-\Phi^{(s)}(L^m))(1-L)^d(1-L^m)^D\color{blue}{(y_t-\mathbf{x}'_t\boldsymbol{\beta})}=(1+\Theta(L))(1+\Theta^{(s)}(L^m))\epsilon_t$$
  • This is a regression model with (S)ARIMA errors.
  • If the purpose of the analysis is to forecast we can only use regressors that are themselves available when we make the forecast.
  • Consider forecasting demand for bike sharing scheme.
  • To forecast tomorrow's demand we cannot use tomorrow's weather.
  • We can however use dummy variables for the type of day, public holidays, etc.

Dataset¶

DC bikeshare data of Hadi and Gama (2013) from UCI Machine Learning Repository

In [39]:
bikes=pd.read_csv('data/bike_sharing_daily.csv')
bikes['dteday']=pd.to_datetime(bikes['dteday'])
fig, ax = plt.subplots(1, figsize = (25, 6))
ax.plot(bikes['dteday'], bikes['cnt'])
Out[39]:
[<matplotlib.lines.Line2D at 0x7f02787ee790>]

Application¶

  • Can handle weekly seasonality by considering seasonal ARIMA models with a period of 7.
  • Handle yearly seasonality with month dummies
  • Keep holiday dummies
  • Convert data into numpy array as input to auto_arima
In [40]:
bikes_clean = bikes[['cnt','mnth','holiday']]
bikes_clean['int']=1.0
bikes_clean = pd.get_dummies(bikes_clean, columns=['mnth'], drop_first=True)
bikes_arr = bikes_clean.to_numpy(dtype='float')
bikes_train=bikes_arr[:640,:]

Fit and forecast¶

In [41]:
out = auto_arima_f(bikes_train[:,0],xreg = bikes_train[:,1:],seasonal=True,period=7)
print(out['arma'])
(1, 1, 0, 1, 7, 1, 0)

For this particular dataset auto arima chooses a SARIMA(1,1,1)(0,0,1)[7]

Fit model¶

In [42]:
fitaasx = sm.tsa.arima.model.ARIMA(bikes_train[:,0],exog=bikes_train[:,1:],order = (1,1,1),seasonal_order=(0,0,1,7)).fit()
fitaasx.summary()
Out[42]:
SARIMAX Results
Dep. Variable: y No. Observations: 640
Model: ARIMA(1, 1, 1)x(0, 0, 1, 7) Log Likelihood -5229.620
Date: Thu, 07 Jul 2022 AIC 10493.239
Time: 13:46:39 BIC 10569.058
Sample: 0 HQIC 10522.670
- 640
Covariance Type: opg
coef std err z P>|z| [0.025 0.975]
x1 -24.5998 243.318 -0.101 0.919 -501.495 452.296
const 0.1058 9848.695 1.07e-05 1.000 -1.93e+04 1.93e+04
x2 120.4969 765.482 0.157 0.875 -1379.820 1620.813
x3 2043.6781 811.663 2.518 0.012 452.848 3634.508
x4 2370.7998 1079.991 2.195 0.028 254.056 4487.544
x5 1627.3375 1139.005 1.429 0.153 -605.072 3859.747
x6 176.3790 1163.088 0.152 0.879 -2103.232 2455.990
x7 168.3286 1258.711 0.134 0.894 -2298.699 2635.356
x8 482.6837 1364.157 0.354 0.723 -2191.014 3156.382
x9 57.8428 1402.966 0.041 0.967 -2691.920 2807.605
x10 -1229.7267 1442.379 -0.853 0.394 -4056.737 1597.284
x11 -529.5451 1248.308 -0.424 0.671 -2976.184 1917.094
x12 -113.0717 960.107 -0.118 0.906 -1994.846 1768.703
ar.L1 0.3220 0.047 6.838 0.000 0.230 0.414
ma.L1 -0.8579 0.028 -30.259 0.000 -0.914 -0.802
ma.S.L7 -0.0065 0.035 -0.185 0.853 -0.075 0.062
sigma2 7.477e+05 2.73e+04 27.374 0.000 6.94e+05 8.01e+05
Ljung-Box (L1) (Q): 0.00 Jarque-Bera (JB): 615.96
Prob(Q): 0.96 Prob(JB): 0.00
Heteroskedasticity (H): 2.76 Skew: -1.06
Prob(H) (two-sided): 0.00 Kurtosis: 7.32


Warnings:
[1] Covariance matrix calculated using the outer product of gradients (complex-step).

Forecast¶

In [44]:
future_x =  bikes_arr[640:,1:]
fc = fitaasx.forecast(len(future_x),exog = future_x)

fig, ax = plt.subplots(1, figsize = (25, 6))
ax.plot(bikes.iloc[550:640,1], bikes.iloc[550:640,15])
ax.plot(bikes.iloc[640:,1], fc)
Out[44]:
[<matplotlib.lines.Line2D at 0x7f02787dedf0>]

Fourier terms¶

  • An alternative to month dummies is to use Fourier terms. For data with period $m$
$$x_t^{(s)}=\sin\left(\frac{2\pi j t}{m}\right)\quad\textrm{and}\quad x_t^{(c)}=\cos\left(\frac{2\pi j t}{m}\right)\quad \textrm{for}\,j=1,2,\dots,J$$
  • Fourier terms can be used to represent any periodic function to an arbitrary degree of accuracy.
  • In applied work, as few as 2-3 pairs of Fourier terms can be sufficient.
  • This is particularly useful for long seasonalities (e.g. $m=365$).

Wrap-up¶

  • Modern algorithms for selecting the order of ARIMA models are based on stepwise search, likelihood estimation and the AICc.
  • Seasonal ARIMA can capture seasonal effects with a small period.
  • Covariates based on calendar effects can be very useful for data based on human behaviour.
  • Regression with Fourier terms can capture seasonal effects with a long period.