Forecasting with Transformers¶
Feng Li¶
Guanghua School of Management¶
Peking University¶
feng.li@gsm.pku.edu.cn¶
Course home page: https://feng.li/forecasting-with-ai¶
pip install torch torchvision --break-system-packages
Generate Time Series Data¶
- The sine wave provides a periodic pattern (like seasonality)
- The noise represents the random fluctuations seen in practice (like market volatility)
- Together they stand in for the "historical sales" or "stock prices" we want to forecast
import torch
import torch.nn as nn
import numpy as np
import matplotlib.pyplot as plt
# 1️⃣ Generate synthetic data
T = 500
t = np.arange(0, T)
x = np.sin(0.02 * t) + 0.1 * np.random.randn(T)
plt.plot(t, x)
plt.title("Synthetic Time Series")
plt.show()
Convert to a Supervised Learning Dataset¶
We recast the time series in "feature-target" form.
Each window of L=20 past observations is the input (X);
the single point that follows is the output (Y).
This turns the time-series problem into a supervised learning task that a machine-learning model can understand.
# 2️⃣ Create supervised learning dataset
def create_dataset(series, L=20):
    X, Y = [], []
    for i in range(len(series) - L):
        X.append(series[i:i+L])   # window of L past values -> features
        Y.append(series[i+L])     # the next value -> target
    return np.array(X), np.array(Y)
L = 20
X, Y = create_dataset(x, L)
X = torch.tensor(X).float().unsqueeze(-1) # (N, L, 1)
Y = torch.tensor(Y).float().unsqueeze(-1) # (N, 1)
Split into Training and Test Sets¶
We train the model on the first 80% of the data (so it can "learn the pattern"),
and test on the last 20% (to check whether it can "predict the future").
# 3️⃣ Split train/test
train_size = int(0.8 * len(X))
X_train, X_test = X[:train_size], X[train_size:]
Y_train, Y_test = Y[:train_size], Y[train_size:]
print(f"Train size: {len(X_train)}, Test size: {len(X_test)}")
Train size: 384, Test size: 96
Define the Transformer Model¶
The architecture has three parts:
Input projection (input_proj): maps the scalar input (a single number) to a vector representation;
Transformer Encoder: captures the dependencies between different time steps in the sequence;
Decoder layer (decoder): outputs the one-step-ahead prediction.
# 4️⃣ Define Transformer model
class TimeSeriesTransformer(nn.Module):
    def __init__(self, input_size=1, d_model=64, nhead=4, num_layers=2):
        super().__init__()
        self.input_proj = nn.Linear(input_size, d_model)
        encoder_layer = nn.TransformerEncoderLayer(
            d_model=d_model, nhead=nhead, dim_feedforward=128, dropout=0.1
        )
        self.encoder = nn.TransformerEncoder(encoder_layer, num_layers=num_layers)
        self.decoder = nn.Linear(d_model, 1)

    def forward(self, src):
        src = self.input_proj(src)      # (batch, L, d_model)
        src = src.permute(1, 0, 2)      # (L, batch, d_model)
        memory = self.encoder(src)      # (L, batch, d_model)
        out = self.decoder(memory[-1])  # last token
        return out
d_model = 64: the dimension of the internal representation (embedding dimension)¶
- The Transformer does not work on the raw numbers directly; it maps each input into a 64-dimensional vector space;
- this is called a "feature embedding", and it lets the model capture more complex patterns;
- a higher dimension gives the model more expressive power, but also costs more computation.
🧠 Intuition:
Each number is turned into a 64-dimensional "semantic vector", and the model looks for patterns in that space. It is like expanding a single "sales figure" into a richer "sales feature report" with 64 dimensions covering periodicity, volatility, trend, and so on.
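To make this concrete, here is a minimal sketch (not part of the course code; the shapes mirror TimeSeriesTransformer above) of how nn.Linear lifts each scalar observation into a 64-dimensional embedding:
import torch
import torch.nn as nn

proj = nn.Linear(1, 64)           # same role as input_proj in the model
window = torch.randn(8, 20, 1)    # (batch=8, L=20, 1): 8 windows of 20 scalar observations
embedded = proj(window)           # every scalar becomes a 64-dimensional vector
print(embedded.shape)             # torch.Size([8, 20, 64])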
nhead = 4: the number of attention heads (multi-head attention)¶
The core of the Transformer is the multi-head attention mechanism;
each head can attend to a different aspect of the time series, for example:
- head 1 watches short-term fluctuations;
- head 2 watches the long-term trend;
- head 3 watches abnormal jumps;
- head 4 watches periodicity.
The multi-head mechanism lets the model capture several kinds of dependencies in parallel (see the sketch below).
🧠 Intuition:
The model has 4 pairs of "eyes" that look at different features of the time series simultaneously. It is like a firm with 4 analysts: one tracks monthly trends, one seasonal swings, one holiday effects, and one competitor moves.
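As a minimal sketch (standalone, not part of the course code), a nn.MultiheadAttention module with 4 heads shows that each head produces its own attention map:
import torch
import torch.nn as nn

attn = nn.MultiheadAttention(embed_dim=64, num_heads=4, batch_first=True)
seq = torch.randn(8, 20, 64)                     # (batch, L, d_model)
out, weights = attn(seq, seq, seq,               # self-attention: Q = K = V = seq
                    average_attn_weights=False)  # keep the per-head weights
print(out.shape)      # torch.Size([8, 20, 64])  -- same shape as the input
print(weights.shape)  # torch.Size([8, 4, 20, 20]) -- one 20x20 attention map per head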
num_layers = 2: the number of Transformer encoder layers¶
- Each layer contains an attention mechanism and a feed-forward network;
- more layers make the model "deeper" and able to capture more complex patterns;
- for small datasets such as this example, 2 layers are enough (see the sketch below).
🧠 Intuition:
The model has two layers of "thinking units": it first learns simple patterns, then abstracts higher-level regularities layer by layer. It is like a corporate reporting chain: departments analyse the data (layer one), executives summarise it (layer two), and the information becomes more insightful with each round of processing.
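A minimal sketch (not part of the course code) of how depth scales the model: nn.TransformerEncoder clones the layer num_layers times, so the parameter count grows linearly with depth.
import torch.nn as nn

layer = nn.TransformerEncoderLayer(d_model=64, nhead=4, dim_feedforward=128,
                                   dropout=0.1, batch_first=True)
shallow = nn.TransformerEncoder(layer, num_layers=2)
deep = nn.TransformerEncoder(layer, num_layers=4)

n_params = lambda m: sum(p.numel() for p in m.parameters())
print(n_params(shallow), n_params(deep))  # the 4-layer stack has exactly twice the parameters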
Where are Q (Query), K (Key), and V (Value)?¶
In our code (TimeSeriesTransformer) we never write out the Q, K, and V matrices by hand, but they are created automatically inside PyTorch's TransformerEncoderLayer.
- Our src (the input sequence features) is linearly projected into the three spaces Q, K, and V;
- MultiheadAttention computes several heads in parallel (we set nhead=4);
- each head produces its own attention weights;
- the results of all heads are then concatenated to form the final attention output, as the sketch below illustrates.
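Here is a minimal single-head sketch of the scaled dot-product attention computed inside the encoder (random matrices stand in for the learned projection weights; multi-head attention simply splits d_model=64 into 4 chunks of 16 and runs this in parallel):
import torch

d_model = 64
x = torch.randn(8, 20, d_model)            # (batch, L, d_model), i.e. src after input_proj

# One projection per role; inside the layer these are learned weights
Wq, Wk, Wv = (torch.randn(d_model, d_model) for _ in range(3))
Q, K, V = x @ Wq, x @ Wk, x @ Wv           # each (8, 20, 64)

# Scaled dot-product attention: softmax(Q K^T / sqrt(d)) V
scores = Q @ K.transpose(-2, -1) / d_model**0.5   # (8, 20, 20) similarity scores
weights = scores.softmax(dim=-1)                  # each row sums to 1
out = weights @ V                                 # (8, 20, 64) attention output
print(out.shape)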
Define the Model, Loss Function, and Optimizer¶
Create a Transformer model object;
- the model contains the structure we defined above (input layer → attention layers → decoder layer);
- it can "learn the pattern" from the historical time series and output predictions of the future.
Define the loss function, i.e. how the model's prediction error is measured;
- MSELoss is the Mean Squared Error (see the formula below);
- the smaller the loss, the more accurate the predictions.
Define the optimizer, which controls how the model learns and updates its parameters;
- Adam is an adaptive learning algorithm that automatically tunes the step size of each parameter;
- lr=1e-3 sets the learning rate to 0.001, which controls the speed of the updates.
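For reference, the mean squared error over the n training pairs is

$$\mathrm{MSE} = \frac{1}{n}\sum_{i=1}^{n}\left(\hat{y}_i - y_i\right)^2,$$

where $\hat{y}_i$ is the model's prediction and $y_i$ is the true next value.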
model = TimeSeriesTransformer()
criterion = nn.MSELoss()
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)
/home/fli/.virtualenvs/python3.12/lib/python3.12/site-packages/torch/nn/modules/transformer.py:382: UserWarning: enable_nested_tensor is True, but self.use_nested_tensor is False because encoder_layer.self_attn.batch_first was not True(use batch_first for better inference performance) warnings.warn(
# 5️⃣ Training loop
for epoch in range(100):
    model.train()
    optimizer.zero_grad()
    output = model(X_train)
    loss = criterion(output, Y_train)
    loss.backward()
    optimizer.step()

    if (epoch+1) % 20 == 0:
        model.eval()
        with torch.no_grad():
            val_pred = model(X_test)
            val_loss = criterion(val_pred, Y_test)
        print(f"Epoch {epoch+1}: train loss={loss.item():.4f}, test loss={val_loss.item():.4f}")
Epoch 20: train loss=0.0328, test loss=0.0415
Epoch 40: train loss=0.0274, test loss=0.0225
Epoch 60: train loss=0.0220, test loss=0.0249
Epoch 80: train loss=0.0231, test loss=0.0236
Epoch 100: train loss=0.0214, test loss=0.0230
Predict on the Test Set¶
- We turn off training (no_grad)
- and predict on data the model has never seen
- the model takes each test input (a historical window) and outputs the corresponding one-step-ahead prediction
# 6️⃣ One-step ahead predictions on test set
model.eval()
with torch.no_grad():
    preds_test = model(X_test).squeeze().numpy()
Visualise the Results¶
Blue line: the true time series;
red line: the predictions over the test period;
grey dashed line: the train/test boundary.
# 7️⃣ Plot true vs predicted on test portion
plt.figure(figsize=(10,5))
plt.plot(range(len(x)), x, label="True Series", alpha=0.6)
plt.plot(range(train_size+L, T), preds_test, label="Predicted (test)", color="red")
plt.axvline(train_size+L, color="gray", linestyle="--", label="Train/Test split")
plt.legend()
plt.title("Transformer One-Step Forecasting")
plt.show()