Time Series Features¶
Feng Li¶
Guanghua School of Management¶
Peking University¶
feng.li@gsm.pku.edu.cn¶
Course home page: https://feng.li/bdcf¶
Why do we need time series features? --- The No-Free-Lunch theorem¶
There is never universally best method that fits in all situations.
The explosion of new algorithms development makes the question even more worth focusing.
No single forecasting method stands out the best for any type of time series.
Literature¶
Features of time series $\rightarrow$ benefits in producing more accurate forecasting accuracies
Features $\rightarrow$ forecasting method selection rules
"Horses for courses" $\rightarrow$ effects of time series features to the forecasting performances
Visualize the performances of different forecasting methods $\rightarrow$ better understanding of their relative performances
Existing problems¶
inadequate features
limited training time series data (not only in number, but in diversity)
Questions to be answered¶
- What time series features should be used?
- How to construct time series features?
- How to visualize time series features by projection?
- How to model features and forecasting methods?
- How to generate new time series with certain features?
Time series features¶
Basic idea¶
Transform a given time series $\{x_1, x_2, \cdots, x_n\}$ to a feature vector $F = (F_1, F_2, \cdots, F_p)$.
A feature $F_k$ can be any kind of function computed from a time series:¶
- A simple mean
- The parameter of a fitted model
- Some statistic intended to highlight an attribute of the data
- ...
Which features should we use?¶
There does not exist the best feature representation of a time series.
Depends on both the nature of the time series being analysed, and the purpose of the analysis.
With unit roots, the mean is not a meaningful feature without some constraints on the initial values. \pause
CPU usage every minute for a large number of servers: we observe a daily seasonality. The mean may provide useful comparative information despite the time series not being stationary.
- Time series are of different lengths, on different scales, and with different properties.
- We restrict our features to be ergodic, stationary and independent of scale.
- 17 sets of diverse features.
- New features are intended to measure attributes associated with multiple seasonality, non-stationarity and heterogeneity of the time series.
Features for multiple seasonal time series¶
STL decompostion extension¶
$$ x_t = f_t + s_{1,t} + s_{2,t} + \cdots + s_{M,t} + e_t.$$ The strength of trend can be measured by: $$ F_{10} = 1- \frac{\text{var}(e_t)}{\text{var}(f_t + e_t)}. $$
The strength of seasonality for the $i$th seasonal component:
$$ F_{11,i} = 1- \frac{\text{var}(e_t)}{\text{var}(s_{i,t} + e_t)}. $$
Features on heterogenity¶
- Pre-whiten the time series $x_t$ to remove the mean, trend, and Autoregressive (AR) information.
- Fit an GARCH(1,1) model on the pre-whitened time series $y_t$ to measure for the ARCH effects.
- Test for the arch effects in the obtained residuals $z_t$ using a second GARCH(1,1) model.
Features¶
- The sum of squares of the first 12 autocorrelations of $\{y_t^2\}$.
- The sum of squares of the first 12 autocorrelations of $\{z_t^2\}$.
- The $R^2$ value of an AR model applied to $\{y_t^2\}$.
- The $R^2$ value of an AR model applied to $\{z_t^2\}$.
Walmart unit sales data¶
Data Structure¶
Hierarchy Level | Description | Number of Series |
---|---|---|
1 | All products, all stores, all states | 1 |
2 | All products by states | 3 |
3 | All products by store | 10 |
4 | All products by category | 3 |
5 | All products by department | 7 |
6 | Unit sales of all products, aggregated for each State and category | 9 |
7 | Unit sales of all products, aggregated for each State and department | 21 |
8 | Unit sales of all products, aggregated for each store and category | 30 |
9 | Unit sales of all products, aggregated for each store and department | 70 |
10 | Unit sales of product x, aggregated for all stores/states | 3,049 |
11 | Unit sales of product x, aggregated for each State | 9,147 |
12 | Unit sales of product x, aggregated for each store | 30,490 |
Total | 42,840 |
Features for sales data¶
Feature | Description |
---|---|
sell_price |
Price of item in store for given date. |
event_type |
108 categorical events, e.g. sporting, cultural, religious. |
event_name |
157 event names for event_type , e.g. Super Bowl, Valentine's Day, President's Day. |
event_name_2 |
Name of event feature as given in competition data. |
event_type_2 |
Type of event feature as given in competition data. |
snap_CA, TX, WI |
Binary indicator for SNAP information in CA, TX, WI. |
release |
Release week of item in store. |
- hierarchical structure of daily sales data of total $42,840$ series spanning 1,941 days
Features for sales data¶
Feature | Description |
---|---|
price_max, min |
Maximum, minimum price for item in store in the train data. |
price_mean, std, norm |
Mean, standard deviation, and normalized price for item in store in the train data. |
item, price_nunique |
Number of unique items, prices for item in store. |
price_diff_w |
Weekly price changes for items in store. |
price_diff_m |
Price changes of item in store compared to its monthly mean. |
price_diff_y |
Price changes of item in store compared to its yearly mean. |
tm_d |
Day of month. |
tm_w |
Week in year. |
tm_m |
Month in year. |
tm_y |
Year index in the train data. |
tm_wm |
Week in month. |
tm_dw |
Day of week. |
tm_w_end |
Weekend indicator. |
Visualisation features in 2D space¶
t-Stochastic Neighbor Embedding (t-SNE)¶
- Main idea: convert the distances to conditional probabilities and minimize the mismatch (kullback-Leibler divergence) between probabilities before and after the mapping.
- Nonlinear and retaining both local and global structure
PCA¶
- Linear, and putting more emphasis on keeping dissimilar data points far apart