✈️ Air Passengers:AI 预测与滚动评估¶
目标
- 使用
air_passengers_with_id.csv。 - 统一用
from nixtla import NixtlaClient做预测:比较seasonal_naive / theta / ets / auto_arima。 - 评估指标:ME / MAE / MSE / RMSE / MPE / MAPE / sMAPE。
- Rolling Window:对 T+1 ~ T+5 的平均误差进行汇总对比。
导入与初始化¶
In [1]:
# 0) 导入与初始化
import numpy as np
import pandas as pd
from nixtla import NixtlaClient
from utilsforecast.losses import mae, mse, rmse, mape, smape
from utilsforecast.evaluation import evaluate
nixtla = NixtlaClient(api_key='YOUR-API-KEY') # 初始化客户端
FREQ = "MS" # 月度
MODELS = ["timegpt-1", "timegpt-1-long-horizon"] # 模型
读取数据¶
In [2]:
# 1) 读取数据
df = pd.read_csv("data/air_passengers_with_id.csv") #输入你的真实地址
df["ds"] = pd.to_datetime(df["ds"]) #将ds列转换为日期时间格式
df = df.sort_values(["unique_id","ds"]).reset_index(drop=True) #按unique_id和ds列排序,并重置索引
UID = df["unique_id"].iloc[0]
定义指标函数ME、MPE¶
In [3]:
# 2) 指标函数:ME / MPE(%)
def me(y_true, y_pred): #计算预测值与真实值的平均误差
y_true = np.asarray(y_true, float); y_pred = np.asarray(y_pred, float)
return float(np.mean(y_true - y_pred))
def mpe(y_true, y_pred): #计算预测值与真实值的百分比误差
y_true = np.asarray(y_true, float); y_pred = np.asarray(y_pred, float)
mask = y_true != 0
return float(np.mean((y_true[mask] - y_pred[mask]) / y_true[mask]) * 100.0) if np.any(mask) else np.nan
评估函数¶
In [4]:
# 3) 公共小工具
_METRICS = [mae, mse, rmse, mape, smape]
def _eval_df(df_like):
ev = evaluate(df=df_like, metrics=_METRICS, models=["TimeGPT"],
time_col="ds", target_col="y").set_index("metric")["TimeGPT"]
y, yhat = df_like["y"].values, df_like["TimeGPT"].values; return ev, y, yhat
In [ ]:
# 4) 函数1:单模型的 Hold-out 指标
def holdout_metrics(df, model_name, H_TEST=12):
# 划分训练集和测试集
train, test = df.iloc[:-H_TEST], df.iloc[-H_TEST:]
# 使用 Nixtla 模型进行预测,并与真实测试数据按日期合并
merged = nixtla.forecast(df=train, h=H_TEST, freq=FREQ,
model=model_name).merge(test, on=["unique_id","ds"], how="left")
# 计算误差指标并提取真实值与预测值
ev, y, yhat = _eval_df(merged)
# 汇总结果
return {"model":model_name,"unique_id":UID,"ME":me(y,yhat),"MPE":mpe(y,yhat),
"MAE":ev["mae"],"MSE":ev["mse"],"RMSE":ev["rmse"],"MAPE":ev["mape"],"sMAPE":ev["smape"]}
In [ ]:
# 5) 函数2:单模型的 Rolling(T+1..T+5) 平均指标
def rolling_metrics(df, model_name, H=5, n_windows=10):
# 生成多次滚动窗口预测结果,并按时间排序
cv = nixtla.cross_validation(df=df, h=H, n_windows=n_windows, freq=FREQ,
model=model_name).sort_values(["unique_id","cutoff","ds"]).reset_index(drop=True)
# 给每个窗口的预测样本标上步长 h(第几步预测)
cv["h"] = cv.groupby(["unique_id","cutoff"]).cumcount() + 1; rows = []
# 循环计算每个预测步长(T+1..T+H)的平均误差
for h in range(1, H+1):
sub = cv[cv["h"]==h][["unique_id","ds","y","TimeGPT"]]; ev, y, yhat = _eval_df(sub)
# 记录该步长下的各类指标
rows.append({"model":model_name,"unique_id":UID,"horizon":h,"ME":me(y,yhat),"MPE":mpe(y,yhat),
"MAE":ev["mae"],"MSE":ev["mse"],"RMSE":ev["rmse"],"MAPE":ev["mape"],"sMAPE":ev["smape"]})
# 汇总所有步长的指标表并返回
return pd.DataFrame(rows).sort_values("horizon").reset_index(drop=True)
In [7]:
# 6) 函数3:对比两种 TimeGPT 模型
def compare_models(df, model_list=MODELS, H_TEST=12, H=5, n_windows=10):
# Hold-out
hold_rows = [holdout_metrics(df, m, H_TEST) for m in model_list]
holdout_df = pd.DataFrame(hold_rows).sort_values("MAE").round(3)
# Rolling
roll_dfs = [rolling_metrics(df, m, H, n_windows) for m in model_list]
rolling_df = pd.concat(roll_dfs, ignore_index=True).round(3)
return holdout_df, rolling_df
对比结果¶
In [8]:
# 7) 得到 Hold-out 与 Rolling 的对比结果
holdout_cmp, rolling_cmp = compare_models(df, MODELS, H_TEST=12, H=5, n_windows=10)
display(holdout_cmp) # 各模型在最后12期的总体指标
display(rolling_cmp) # 各模型在 h=1..5 的平均指标
INFO:nixtla.nixtla_client:Validating inputs... INFO:nixtla.nixtla_client:Preprocessing dataframes... INFO:nixtla.nixtla_client:Querying model metadata... INFO:nixtla.nixtla_client:Restricting input... INFO:nixtla.nixtla_client:Calling Forecast Endpoint... INFO:nixtla.nixtla_client:Validating inputs... INFO:nixtla.nixtla_client:Preprocessing dataframes... INFO:nixtla.nixtla_client:Querying model metadata... INFO:nixtla.nixtla_client:Restricting input... INFO:nixtla.nixtla_client:Calling Forecast Endpoint... INFO:nixtla.nixtla_client:Validating inputs... INFO:nixtla.nixtla_client:Preprocessing dataframes... INFO:nixtla.nixtla_client:Restricting input... INFO:nixtla.nixtla_client:Calling Cross Validation Endpoint... INFO:nixtla.nixtla_client:Validating inputs... INFO:nixtla.nixtla_client:Preprocessing dataframes... INFO:nixtla.nixtla_client:Restricting input... INFO:nixtla.nixtla_client:Calling Cross Validation Endpoint...
| model | unique_id | ME | MPE | MAE | MSE | RMSE | MAPE | sMAPE | |
|---|---|---|---|---|---|---|---|---|---|
| 1 | timegpt-1-long-horizon | D1 | 3.015 | 0.518 | 11.062 | 199.132 | 14.111 | 0.023 | 0.012 |
| 0 | timegpt-1 | D1 | 4.803 | 0.790 | 12.679 | 213.936 | 14.627 | 0.027 | 0.014 |
| model | unique_id | horizon | ME | MPE | MAE | MSE | RMSE | MAPE | sMAPE | |
|---|---|---|---|---|---|---|---|---|---|---|
| 0 | timegpt-1 | D1 | 1 | -6.199 | -1.627 | 10.340 | 152.831 | 12.362 | 0.026 | 0.013 |
| 1 | timegpt-1 | D1 | 2 | 0.795 | -0.072 | 13.483 | 241.000 | 15.524 | 0.033 | 0.016 |
| 2 | timegpt-1 | D1 | 3 | -0.001 | -0.450 | 13.507 | 218.250 | 14.773 | 0.036 | 0.018 |
| 3 | timegpt-1 | D1 | 4 | 4.745 | 0.480 | 20.547 | 512.559 | 22.640 | 0.050 | 0.025 |
| 4 | timegpt-1 | D1 | 5 | 7.133 | 1.199 | 17.200 | 411.193 | 20.278 | 0.041 | 0.020 |
| 5 | timegpt-1-long-horizon | D1 | 1 | -3.530 | -0.910 | 14.061 | 236.974 | 15.394 | 0.035 | 0.017 |
| 6 | timegpt-1-long-horizon | D1 | 2 | 3.830 | 0.793 | 14.638 | 311.755 | 17.657 | 0.036 | 0.018 |
| 7 | timegpt-1-long-horizon | D1 | 3 | 2.965 | 0.337 | 16.404 | 363.039 | 19.054 | 0.043 | 0.022 |
| 8 | timegpt-1-long-horizon | D1 | 4 | 6.510 | 0.997 | 23.530 | 698.041 | 26.420 | 0.059 | 0.030 |
| 9 | timegpt-1-long-horizon | D1 | 5 | 9.945 | 1.997 | 19.138 | 555.677 | 23.573 | 0.044 | 0.023 |
In [9]:
# 按模型分组取平均
rolling_avg = (
rolling_cmp
.groupby("model")[["ME","MAE","MSE","RMSE","MPE","MAPE","sMAPE"]]
.mean()
.reset_index()
.round(3)
)
print("📈 Rolling Window 平均误差(T+1 ~ T+5)")
display(rolling_avg)
📈 Rolling Window 平均误差(T+1 ~ T+5)
| model | ME | MAE | MSE | RMSE | MPE | MAPE | sMAPE | |
|---|---|---|---|---|---|---|---|---|
| 0 | timegpt-1 | 1.295 | 15.015 | 307.167 | 17.115 | -0.094 | 0.037 | 0.018 |
| 1 | timegpt-1-long-horizon | 3.944 | 17.554 | 433.097 | 20.420 | 0.643 | 0.043 | 0.022 |