✈️ Air Passengers:AI 预测与滚动评估¶

目标

  • 使用 air_passengers_with_id.csv。
  • 统一用 from nixtla import NixtlaClient 做预测:比较 seasonal_naive / theta / ets / auto_arima。
  • 评估指标:ME / MAE / MSE / RMSE / MPE / MAPE / sMAPE。
  • Rolling Window:对 T+1 ~ T+5 的平均误差进行汇总对比。

导入与初始化¶

In [1]:
# 0) 导入与初始化
import numpy as np
import pandas as pd
from nixtla import NixtlaClient
from utilsforecast.losses import mae, mse, rmse, mape, smape
from utilsforecast.evaluation import evaluate

nixtla = NixtlaClient(api_key='YOUR-API-KEY')  # 初始化客户端
FREQ = "MS"  # 月度
MODELS = ["timegpt-1", "timegpt-1-long-horizon"]            # 模型

读取数据¶

In [2]:
# 1) 读取数据
df = pd.read_csv("data/air_passengers_with_id.csv") #输入你的真实地址
df["ds"] = pd.to_datetime(df["ds"]) #将ds列转换为日期时间格式
df = df.sort_values(["unique_id","ds"]).reset_index(drop=True) #按unique_id和ds列排序,并重置索引
UID = df["unique_id"].iloc[0] 

定义指标函数ME、MPE¶

In [3]:
# 2) 指标函数:ME / MPE(%)
def me(y_true, y_pred): #计算预测值与真实值的平均误差
    y_true = np.asarray(y_true, float); y_pred = np.asarray(y_pred, float)
    return float(np.mean(y_true - y_pred))

def mpe(y_true, y_pred): #计算预测值与真实值的百分比误差
    y_true = np.asarray(y_true, float); y_pred = np.asarray(y_pred, float)
    mask = y_true != 0
    return float(np.mean((y_true[mask] - y_pred[mask]) / y_true[mask]) * 100.0) if np.any(mask) else np.nan

评估函数¶

In [4]:
# 3) 公共小工具
_METRICS = [mae, mse, rmse, mape, smape]
def _eval_df(df_like):
    ev = evaluate(df=df_like, metrics=_METRICS, models=["TimeGPT"], 
                  time_col="ds", target_col="y").set_index("metric")["TimeGPT"]
    y, yhat = df_like["y"].values, df_like["TimeGPT"].values; return ev, y, yhat
In [ ]:
# 4) 函数1:单模型的 Hold-out 指标
def holdout_metrics(df, model_name, H_TEST=12):
    # 划分训练集和测试集
    train, test = df.iloc[:-H_TEST], df.iloc[-H_TEST:]
    # 使用 Nixtla 模型进行预测,并与真实测试数据按日期合并
    merged = nixtla.forecast(df=train, h=H_TEST, freq=FREQ, 
                             model=model_name).merge(test, on=["unique_id","ds"], how="left")
    # 计算误差指标并提取真实值与预测值
    ev, y, yhat = _eval_df(merged)
    # 汇总结果
    return {"model":model_name,"unique_id":UID,"ME":me(y,yhat),"MPE":mpe(y,yhat),
            "MAE":ev["mae"],"MSE":ev["mse"],"RMSE":ev["rmse"],"MAPE":ev["mape"],"sMAPE":ev["smape"]}
In [ ]:
# 5) 函数2:单模型的 Rolling(T+1..T+5) 平均指标
def rolling_metrics(df, model_name, H=5, n_windows=10):
    # 生成多次滚动窗口预测结果,并按时间排序
    cv = nixtla.cross_validation(df=df, h=H, n_windows=n_windows, freq=FREQ, 
                                 model=model_name).sort_values(["unique_id","cutoff","ds"]).reset_index(drop=True)
    # 给每个窗口的预测样本标上步长 h(第几步预测)
    cv["h"] = cv.groupby(["unique_id","cutoff"]).cumcount() + 1; rows = []
    # 循环计算每个预测步长(T+1..T+H)的平均误差
    for h in range(1, H+1):
        sub = cv[cv["h"]==h][["unique_id","ds","y","TimeGPT"]]; ev, y, yhat = _eval_df(sub)
        # 记录该步长下的各类指标
        rows.append({"model":model_name,"unique_id":UID,"horizon":h,"ME":me(y,yhat),"MPE":mpe(y,yhat),
                     "MAE":ev["mae"],"MSE":ev["mse"],"RMSE":ev["rmse"],"MAPE":ev["mape"],"sMAPE":ev["smape"]})
    # 汇总所有步长的指标表并返回
    return pd.DataFrame(rows).sort_values("horizon").reset_index(drop=True)
In [7]:
# 6) 函数3:对比两种 TimeGPT 模型
def compare_models(df, model_list=MODELS, H_TEST=12, H=5, n_windows=10):
    # Hold-out
    hold_rows = [holdout_metrics(df, m, H_TEST) for m in model_list]
    holdout_df = pd.DataFrame(hold_rows).sort_values("MAE").round(3)
    # Rolling
    roll_dfs = [rolling_metrics(df, m, H, n_windows) for m in model_list]
    rolling_df = pd.concat(roll_dfs, ignore_index=True).round(3)
    return holdout_df, rolling_df

对比结果¶

In [8]:
# 7) 得到 Hold-out 与 Rolling 的对比结果
holdout_cmp, rolling_cmp = compare_models(df, MODELS, H_TEST=12, H=5, n_windows=10)
display(holdout_cmp)   # 各模型在最后12期的总体指标
display(rolling_cmp)   # 各模型在 h=1..5 的平均指标
INFO:nixtla.nixtla_client:Validating inputs...
INFO:nixtla.nixtla_client:Preprocessing dataframes...
INFO:nixtla.nixtla_client:Querying model metadata...
INFO:nixtla.nixtla_client:Restricting input...
INFO:nixtla.nixtla_client:Calling Forecast Endpoint...
INFO:nixtla.nixtla_client:Validating inputs...
INFO:nixtla.nixtla_client:Preprocessing dataframes...
INFO:nixtla.nixtla_client:Querying model metadata...
INFO:nixtla.nixtla_client:Restricting input...
INFO:nixtla.nixtla_client:Calling Forecast Endpoint...
INFO:nixtla.nixtla_client:Validating inputs...
INFO:nixtla.nixtla_client:Preprocessing dataframes...
INFO:nixtla.nixtla_client:Restricting input...
INFO:nixtla.nixtla_client:Calling Cross Validation Endpoint...
INFO:nixtla.nixtla_client:Validating inputs...
INFO:nixtla.nixtla_client:Preprocessing dataframes...
INFO:nixtla.nixtla_client:Restricting input...
INFO:nixtla.nixtla_client:Calling Cross Validation Endpoint...
model unique_id ME MPE MAE MSE RMSE MAPE sMAPE
1 timegpt-1-long-horizon D1 3.015 0.518 11.062 199.132 14.111 0.023 0.012
0 timegpt-1 D1 4.803 0.790 12.679 213.936 14.627 0.027 0.014
model unique_id horizon ME MPE MAE MSE RMSE MAPE sMAPE
0 timegpt-1 D1 1 -6.199 -1.627 10.340 152.831 12.362 0.026 0.013
1 timegpt-1 D1 2 0.795 -0.072 13.483 241.000 15.524 0.033 0.016
2 timegpt-1 D1 3 -0.001 -0.450 13.507 218.250 14.773 0.036 0.018
3 timegpt-1 D1 4 4.745 0.480 20.547 512.559 22.640 0.050 0.025
4 timegpt-1 D1 5 7.133 1.199 17.200 411.193 20.278 0.041 0.020
5 timegpt-1-long-horizon D1 1 -3.530 -0.910 14.061 236.974 15.394 0.035 0.017
6 timegpt-1-long-horizon D1 2 3.830 0.793 14.638 311.755 17.657 0.036 0.018
7 timegpt-1-long-horizon D1 3 2.965 0.337 16.404 363.039 19.054 0.043 0.022
8 timegpt-1-long-horizon D1 4 6.510 0.997 23.530 698.041 26.420 0.059 0.030
9 timegpt-1-long-horizon D1 5 9.945 1.997 19.138 555.677 23.573 0.044 0.023
In [9]:
# 按模型分组取平均
rolling_avg = (
    rolling_cmp
    .groupby("model")[["ME","MAE","MSE","RMSE","MPE","MAPE","sMAPE"]]
    .mean()
    .reset_index()
    .round(3)
)
print("📈 Rolling Window 平均误差(T+1 ~ T+5)")
display(rolling_avg)
📈 Rolling Window 平均误差(T+1 ~ T+5)
model ME MAE MSE RMSE MPE MAPE sMAPE
0 timegpt-1 1.295 15.015 307.167 17.115 -0.094 0.037 0.018
1 timegpt-1-long-horizon 3.944 17.554 433.097 20.420 0.643 0.043 0.022