LLM-based Forecasting (via DeepSeek API)¶
Feng Li¶
Guanghua School of Management¶
Peking University¶
feng.li@gsm.pku.edu.cn¶
Course home page: https://feng.li/forecasting-with-ai¶
Let’s forecast¶
Here we use the DeepSeek API to forecast the next few periods from historical passenger data.
In [5]:
! pip install openai --break-system-packages
Load the DeepSeek API¶
How to get a DeepSeek API key¶
- Log in to the API Keys page of the DeepSeek platform (https://platform.deepseek.com/api_keys) to create a key.
- Click to create an API key.
- Enter a name for the API key.
- Copy the API key. ⚠️ Be sure to copy it immediately.
Use the DeepSeek API¶
In [6]:
import json, re, pandas as pd
from openai import OpenAI
In [ ]:
DEEPSEEK_API_KEY = "YOUR_DEEPSEEK_API_KEY" # ← replace with your real key
client = OpenAI(api_key=DEEPSEEK_API_KEY, base_url="https://api.deepseek.com/v1")
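Pasting a real key into a notebook cell makes it easy to leak when the notebook is shared. A minimal sketch of an alternative, assuming the key has been exported as an environment variable before Jupyter starts; the helper name `get_api_key` and the variable name `DEEPSEEK_API_KEY` are illustrative choices, not part of the DeepSeek or OpenAI libraries:

```python
import os

def get_api_key(env_var="DEEPSEEK_API_KEY"):
    """Return the API key stored in an environment variable, or raise a clear error."""
    key = os.environ.get(env_var, "").strip()
    if not key:
        raise RuntimeError(f"Set the {env_var} environment variable first.")
    return key

# Usage (hypothetical; requires the variable to be set in the shell):
# client = OpenAI(api_key=get_api_key(), base_url="https://api.deepseek.com/v1")
```

With this in place, the notebook itself never contains the key.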
Load the passenger data¶
Required columns: unique_id, ds, y
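When you bring your own file, it may help to check the column requirement explicitly before anything else runs. The sketch below is one possible check; `check_columns` and `REQUIRED` are hypothetical names, and the one-row toy frame reuses the first observation of the passenger series:

```python
import pandas as pd

REQUIRED = ["unique_id", "ds", "y"]

def check_columns(frame):
    """Lower-case the column names and verify the three required columns exist."""
    out = frame.rename(columns=str.lower)
    missing = [c for c in REQUIRED if c not in out.columns]
    if missing:
        raise ValueError(f"Missing required columns: {missing}")
    # Return only the required columns, in the expected order.
    return out[REQUIRED]

toy = pd.DataFrame({"Unique_ID": ["D1"], "DS": ["1949-01-01"], "Y": [112]})
print(check_columns(toy).columns.tolist())  # ['unique_id', 'ds', 'y']
```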
In [7]:
data_path = "../data/air_passengers_with_id.csv"
df = pd.read_csv(data_path)
df.columns = [c.lower() for c in df.columns]
df["ds"] = pd.to_datetime(df["ds"], errors="raise") # convert the date column to pandas datetime
df = df.sort_values(["unique_id","ds"]).reset_index(drop=True)
uid = df["unique_id"].iloc[0]
df
Out[7]:
 | unique_id | ds | y |
---|---|---|---|
0 | D1 | 1949-01-01 | 112 |
1 | D1 | 1949-02-01 | 118 |
2 | D1 | 1949-03-01 | 132 |
3 | D1 | 1949-04-01 | 129 |
4 | D1 | 1949-05-01 | 121 |
... | ... | ... | ... |
139 | D1 | 1960-08-01 | 606 |
140 | D1 | 1960-09-01 | 508 |
141 | D1 | 1960-10-01 | 461 |
142 | D1 | 1960-11-01 | 390 |
143 | D1 | 1960-12-01 | 432 |
144 rows × 3 columns
Visualize the raw time series¶
In [5]:
df_plot = df[df['unique_id']==uid].sort_values('ds').set_index('ds')
df_plot['y'].plot(title=f'{uid} — history', figsize=(10,4), legend=True)
Out[5]:
<Axes: title={'center': 'D1 — history'}, xlabel='ds'>
Forecasting with DeepSeek¶
We use the prompt to require strict JSON output, then parse the reply into a DataFrame.
In [8]:
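def build_prompt(df, h=12, freq='MS', tail=120):
    # Selects one series via the global `uid` defined above and keeps the last `tail` observations.
    sub = df[df['unique_id']==uid].sort_values('ds').tail(tail)
    hist = [{'ds': r.ds.strftime('%Y-%m-%d'), 'y': float(r.y)} for r in sub.itertuples(index=False)]
    # The prompt text is in Chinese. In English: "You are a time series forecasting assistant.
    # Below is the history of the passenger series (frequency {freq}). Forecast the next {h}
    # periods and output strict JSON only." It is followed by the history and a JSON template.
    return (
        f'你是一名时间序列预测助手。下面给出乘客序列的历史数据(频率 {freq})。\n'
        f'请预测未来 {h} 期,并**只输出严格 JSON**。\n\n'
        f'【历史数据】\n{json.dumps(hist, ensure_ascii=False, indent=2)}\n\n'
        '{\n'
        f' "h": {h},\n'
        f' "freq": "{freq}",\n'
        ' "forecast": [ { "ds": "YYYY-MM-DD", "yhat": <float> } ]\n'
        '}'
    )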
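The `freq='MS'` argument is the pandas offset alias for "month start", so a forecast of `h` periods should cover the `h` month-start dates after the last observation. A small sketch of how those dates could be generated; `future_dates` is a hypothetical helper, not part of the notebook:

```python
import pandas as pd
from pandas.tseries.frequencies import to_offset

def future_dates(last_ds, h, freq="MS"):
    """Return the h timestamps that follow last_ds at the given pandas frequency."""
    # Step one period past the last observation, then build h regular dates.
    first = pd.Timestamp(last_ds) + to_offset(freq)
    return pd.date_range(start=first, periods=h, freq=freq)

dates = future_dates("1960-12-01", h=6)
print([d.strftime("%Y-%m-%d") for d in dates])
```

For the passenger series, whose last observation is 1960-12-01, this yields January through June 1961, which matches the dates in the 6-step forecast table.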
In [9]:
prompt = build_prompt(df, h=12, freq='MS')
prompt
Out[9]:
'你是一名时间序列预测助手。下面给出乘客序列的历史数据(频率 MS)。\n请预测未来 12 期,并**只输出严格 JSON**。\n\n【历史数据】\n[\n {\n "ds": "1951-01-01",\n "y": 145.0\n },\n {\n "ds": "1951-02-01",\n "y": 150.0\n },\n {\n "ds": "1951-03-01",\n "y": 178.0\n },\n {\n "ds": "1951-04-01",\n "y": 163.0\n },\n {\n "ds": "1951-05-01",\n "y": 172.0\n },\n {\n "ds": "1951-06-01",\n "y": 178.0\n },\n {\n "ds": "1951-07-01",\n "y": 199.0\n },\n {\n "ds": "1951-08-01",\n "y": 199.0\n },\n {\n "ds": "1951-09-01",\n "y": 184.0\n },\n {\n "ds": "1951-10-01",\n "y": 162.0\n },\n {\n "ds": "1951-11-01",\n "y": 146.0\n },\n {\n "ds": "1951-12-01",\n "y": 166.0\n },\n {\n "ds": "1952-01-01",\n "y": 171.0\n },\n {\n "ds": "1952-02-01",\n "y": 180.0\n },\n {\n "ds": "1952-03-01",\n "y": 193.0\n },\n {\n "ds": "1952-04-01",\n "y": 181.0\n },\n {\n "ds": "1952-05-01",\n "y": 183.0\n },\n {\n "ds": "1952-06-01",\n "y": 218.0\n },\n {\n "ds": "1952-07-01",\n "y": 230.0\n },\n {\n "ds": "1952-08-01",\n "y": 242.0\n },\n {\n "ds": "1952-09-01",\n "y": 209.0\n },\n {\n "ds": "1952-10-01",\n "y": 191.0\n },\n {\n "ds": "1952-11-01",\n "y": 172.0\n },\n {\n "ds": "1952-12-01",\n "y": 194.0\n },\n {\n "ds": "1953-01-01",\n "y": 196.0\n },\n {\n "ds": "1953-02-01",\n "y": 196.0\n },\n {\n "ds": "1953-03-01",\n "y": 236.0\n },\n {\n "ds": "1953-04-01",\n "y": 235.0\n },\n {\n "ds": "1953-05-01",\n "y": 229.0\n },\n {\n "ds": "1953-06-01",\n "y": 243.0\n },\n {\n "ds": "1953-07-01",\n "y": 264.0\n },\n {\n "ds": "1953-08-01",\n "y": 272.0\n },\n {\n "ds": "1953-09-01",\n "y": 237.0\n },\n {\n "ds": "1953-10-01",\n "y": 211.0\n },\n {\n "ds": "1953-11-01",\n "y": 180.0\n },\n {\n "ds": "1953-12-01",\n "y": 201.0\n },\n {\n "ds": "1954-01-01",\n "y": 204.0\n },\n {\n "ds": "1954-02-01",\n "y": 188.0\n },\n {\n "ds": "1954-03-01",\n "y": 235.0\n },\n {\n "ds": "1954-04-01",\n "y": 227.0\n },\n {\n "ds": "1954-05-01",\n "y": 234.0\n },\n {\n "ds": "1954-06-01",\n "y": 264.0\n },\n {\n "ds": "1954-07-01",\n "y": 302.0\n },\n {\n "ds": "1954-08-01",\n 
"y": 293.0\n },\n {\n "ds": "1954-09-01",\n "y": 259.0\n },\n {\n "ds": "1954-10-01",\n "y": 229.0\n },\n {\n "ds": "1954-11-01",\n "y": 203.0\n },\n {\n "ds": "1954-12-01",\n "y": 229.0\n },\n {\n "ds": "1955-01-01",\n "y": 242.0\n },\n {\n "ds": "1955-02-01",\n "y": 233.0\n },\n {\n "ds": "1955-03-01",\n "y": 267.0\n },\n {\n "ds": "1955-04-01",\n "y": 269.0\n },\n {\n "ds": "1955-05-01",\n "y": 270.0\n },\n {\n "ds": "1955-06-01",\n "y": 315.0\n },\n {\n "ds": "1955-07-01",\n "y": 364.0\n },\n {\n "ds": "1955-08-01",\n "y": 347.0\n },\n {\n "ds": "1955-09-01",\n "y": 312.0\n },\n {\n "ds": "1955-10-01",\n "y": 274.0\n },\n {\n "ds": "1955-11-01",\n "y": 237.0\n },\n {\n "ds": "1955-12-01",\n "y": 278.0\n },\n {\n "ds": "1956-01-01",\n "y": 284.0\n },\n {\n "ds": "1956-02-01",\n "y": 277.0\n },\n {\n "ds": "1956-03-01",\n "y": 317.0\n },\n {\n "ds": "1956-04-01",\n "y": 313.0\n },\n {\n "ds": "1956-05-01",\n "y": 318.0\n },\n {\n "ds": "1956-06-01",\n "y": 374.0\n },\n {\n "ds": "1956-07-01",\n "y": 413.0\n },\n {\n "ds": "1956-08-01",\n "y": 405.0\n },\n {\n "ds": "1956-09-01",\n "y": 355.0\n },\n {\n "ds": "1956-10-01",\n "y": 306.0\n },\n {\n "ds": "1956-11-01",\n "y": 271.0\n },\n {\n "ds": "1956-12-01",\n "y": 306.0\n },\n {\n "ds": "1957-01-01",\n "y": 315.0\n },\n {\n "ds": "1957-02-01",\n "y": 301.0\n },\n {\n "ds": "1957-03-01",\n "y": 356.0\n },\n {\n "ds": "1957-04-01",\n "y": 348.0\n },\n {\n "ds": "1957-05-01",\n "y": 355.0\n },\n {\n "ds": "1957-06-01",\n "y": 422.0\n },\n {\n "ds": "1957-07-01",\n "y": 465.0\n },\n {\n "ds": "1957-08-01",\n "y": 467.0\n },\n {\n "ds": "1957-09-01",\n "y": 404.0\n },\n {\n "ds": "1957-10-01",\n "y": 347.0\n },\n {\n "ds": "1957-11-01",\n "y": 305.0\n },\n {\n "ds": "1957-12-01",\n "y": 336.0\n },\n {\n "ds": "1958-01-01",\n "y": 340.0\n },\n {\n "ds": "1958-02-01",\n "y": 318.0\n },\n {\n "ds": "1958-03-01",\n "y": 362.0\n },\n {\n "ds": "1958-04-01",\n "y": 348.0\n },\n {\n "ds": "1958-05-01",\n "y": 363.0\n },\n 
{\n "ds": "1958-06-01",\n "y": 435.0\n },\n {\n "ds": "1958-07-01",\n "y": 491.0\n },\n {\n "ds": "1958-08-01",\n "y": 505.0\n },\n {\n "ds": "1958-09-01",\n "y": 404.0\n },\n {\n "ds": "1958-10-01",\n "y": 359.0\n },\n {\n "ds": "1958-11-01",\n "y": 310.0\n },\n {\n "ds": "1958-12-01",\n "y": 337.0\n },\n {\n "ds": "1959-01-01",\n "y": 360.0\n },\n {\n "ds": "1959-02-01",\n "y": 342.0\n },\n {\n "ds": "1959-03-01",\n "y": 406.0\n },\n {\n "ds": "1959-04-01",\n "y": 396.0\n },\n {\n "ds": "1959-05-01",\n "y": 420.0\n },\n {\n "ds": "1959-06-01",\n "y": 472.0\n },\n {\n "ds": "1959-07-01",\n "y": 548.0\n },\n {\n "ds": "1959-08-01",\n "y": 559.0\n },\n {\n "ds": "1959-09-01",\n "y": 463.0\n },\n {\n "ds": "1959-10-01",\n "y": 407.0\n },\n {\n "ds": "1959-11-01",\n "y": 362.0\n },\n {\n "ds": "1959-12-01",\n "y": 405.0\n },\n {\n "ds": "1960-01-01",\n "y": 417.0\n },\n {\n "ds": "1960-02-01",\n "y": 391.0\n },\n {\n "ds": "1960-03-01",\n "y": 419.0\n },\n {\n "ds": "1960-04-01",\n "y": 461.0\n },\n {\n "ds": "1960-05-01",\n "y": 472.0\n },\n {\n "ds": "1960-06-01",\n "y": 535.0\n },\n {\n "ds": "1960-07-01",\n "y": 622.0\n },\n {\n "ds": "1960-08-01",\n "y": 606.0\n },\n {\n "ds": "1960-09-01",\n "y": 508.0\n },\n {\n "ds": "1960-10-01",\n "y": 461.0\n },\n {\n "ds": "1960-11-01",\n "y": 390.0\n },\n {\n "ds": "1960-12-01",\n "y": 432.0\n }\n]\n\n{\n "h": 12,\n "freq": "MS",\n "forecast": [ { "ds": "YYYY-MM-DD", "yhat": <float> } ]\n}'
In [7]:
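# Helper that extracts and parses the JSON in the model's reply
def parse_json(text):
    # 1) Try the whole reply as JSON.
    try:
        return json.loads(text)
    except json.JSONDecodeError:
        pass
    # 2) Try each triple-backtick fenced part, dropping a leading "json" language tag.
    if '```' in text:
        for part in text.split('```'):
            p = part.strip()
            if p.lower().startswith('json'):
                p = p[4:].strip()
            try:
                return json.loads(p)
            except json.JSONDecodeError:
                pass
    # 3) Fall back to the outermost {...} span in the reply.
    m = re.search(r'\{[\s\S]*\}', text)
    if m:
        try:
            return json.loads(m.group(0))
        except json.JSONDecodeError:
            pass
    raise ValueError('Could not parse JSON: ' + text[:200])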
6-step forecast (short horizon)¶
In [10]:
prompt = build_prompt(df, h=6, freq='MS')
resp = client.chat.completions.create(model='deepseek-chat', messages=[{'role':'user','content':prompt}], temperature=0)
raw = resp.choices[0].message.content
obj = parse_json(raw)
fc6 = pd.DataFrame(obj['forecast']).assign(unique_id=uid)
fc6['ds'] = pd.to_datetime(fc6['ds'])
fc6['yhat'] = pd.to_numeric(fc6['yhat'], errors='coerce')
fc6 = fc6.dropna().sort_values('ds').reset_index(drop=True)
fc6.head()
Out[10]:
 | ds | yhat | unique_id |
---|---|---|---|
0 | 1961-01-01 | 445.2 | D1 |
1 | 1961-02-01 | 420.1 | D1 |
2 | 1961-03-01 | 488.3 | D1 |
3 | 1961-04-01 | 475.6 | D1 |
4 | 1961-05-01 | 496.8 | D1 |
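The prompt asks for a given number of periods, but nothing forces the model to return exactly that many rows or the right dates. A sketch of a sanity check on the parsed forecast, under the assumption that the frame has `ds` and `yhat` columns as above; `check_forecast` is a hypothetical helper, and the sample reuses only the five rows printed by `fc6.head()`, so it is checked with `h=5`:

```python
import pandas as pd
from pandas.tseries.frequencies import to_offset

def check_forecast(fc, last_ds, h, freq="MS"):
    """Return a list of problems found in a forecast frame with columns ds and yhat."""
    first = pd.Timestamp(last_ds) + to_offset(freq)
    expected = pd.date_range(start=first, periods=h, freq=freq)
    missing = expected.difference(pd.DatetimeIndex(fc["ds"]))
    problems = []
    if len(fc) != h:
        problems.append(f"expected {h} rows, got {len(fc)}")
    if len(missing) > 0:
        problems.append("missing dates: " + ", ".join(missing.strftime("%Y-%m-%d")))
    if fc["yhat"].isna().any() or (fc["yhat"] < 0).any():
        problems.append("yhat contains missing or negative values")
    return problems

sample = pd.DataFrame({
    "ds": pd.to_datetime(["1961-01-01", "1961-02-01", "1961-03-01",
                          "1961-04-01", "1961-05-01"]),
    "yhat": [445.2, 420.1, 488.3, 475.6, 496.8],
})
print(check_forecast(sample, "1960-12-01", h=5))           # no problems
print(check_forecast(sample.iloc[:4], "1960-12-01", h=5))  # one row short
```

An empty list means the forecast has the expected shape; it says nothing about whether the values are plausible.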
In [11]:
hist = df[df['unique_id']==uid].sort_values('ds').set_index('ds')
fc_plot = fc6.set_index('ds')
ax = hist['y'].plot(label='history', figsize=(10,4))
fc_plot['yhat'].plot(ax=ax, label='forecast')
ax.set_title(f"{uid} — history & 6-step forecast")
ax.legend()
Out[11]:
<matplotlib.legend.Legend at 0x10ae8f750>
12-step forecast¶
In [8]:
prompt = build_prompt(df, h=12, freq='MS')
resp = client.chat.completions.create(model='deepseek-chat', messages=[{'role':'user','content':prompt}], temperature=0)
raw = resp.choices[0].message.content
obj = parse_json(raw)
fc12 = pd.DataFrame(obj['forecast']).assign(unique_id=uid)
fc12['ds'] = pd.to_datetime(fc12['ds'])
fc12['yhat'] = pd.to_numeric(fc12['yhat'], errors='coerce')
fc12 = fc12.dropna().sort_values('ds').reset_index(drop=True)
fc12.head()
Out[8]:
 | ds | yhat | unique_id |
---|---|---|---|
0 | 1961-01-01 | 444.2 | D1 |
1 | 1961-02-01 | 422.1 | D1 |
2 | 1961-03-01 | 456.8 | D1 |
3 | 1961-04-01 | 478.3 | D1 |
4 | 1961-05-01 | 492.6 | D1 |
In [9]:
hist = df[df['unique_id']==uid].sort_values('ds').set_index('ds')
fc_plot = fc12.set_index('ds')
ax = hist['y'].plot(label='history', figsize=(10,4))
fc_plot['yhat'].plot(ax=ax, label='forecast')
ax.set_title(f"{uid} — history & 12-step forecast")
ax.legend()
Out[9]:
<matplotlib.legend.Legend at 0x10ae8da90>
36-step forecast (long horizon)¶
In [12]:
prompt = build_prompt(df, h=36, freq='MS')
resp = client.chat.completions.create(model='deepseek-chat', messages=[{'role':'user','content':prompt}], temperature=0)
raw = resp.choices[0].message.content
obj = parse_json(raw)
fc36 = pd.DataFrame(obj['forecast']).assign(unique_id=uid)
fc36['ds'] = pd.to_datetime(fc36['ds'])
fc36['yhat'] = pd.to_numeric(fc36['yhat'], errors='coerce')
fc36 = fc36.dropna().sort_values('ds').reset_index(drop=True)
fc36.head()
Out[12]:
 | ds | yhat | unique_id |
---|---|---|---|
0 | 1961-01-01 | 445.2 | D1 |
1 | 1961-02-01 | 420.1 | D1 |
2 | 1961-03-01 | 461.8 | D1 |
3 | 1961-04-01 | 475.3 | D1 |
4 | 1961-05-01 | 489.6 | D1 |
In [13]:
hist = df[df['unique_id']==uid].sort_values('ds').set_index('ds')
fc_plot = fc36.set_index('ds')
ax = hist['y'].plot(label='history', figsize=(10,4))
fc_plot['yhat'].plot(ax=ax, label='forecast')
ax.set_title(f"{uid} — history & 36-step forecast")
ax.legend()
Out[13]:
<matplotlib.legend.Legend at 0x10af80a50>
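The three runs use the same history and `temperature=0`, yet the printed tables show different values for the same dates depending on the requested horizon. The sketch below puts the first five rows of each table side by side and computes the spread per date; the numbers are copied from the outputs above, and the column names `h6`, `h12`, `h36` are illustrative:

```python
import pandas as pd

dates = pd.to_datetime(["1961-01-01", "1961-02-01", "1961-03-01",
                        "1961-04-01", "1961-05-01"])
comparison = pd.DataFrame({
    "h6":  [445.2, 420.1, 488.3, 475.6, 496.8],
    "h12": [444.2, 422.1, 456.8, 478.3, 492.6],
    "h36": [445.2, 420.1, 461.8, 475.3, 489.6],
}, index=dates)

# Largest minus smallest forecast for each date across the three runs.
comparison["spread"] = comparison.max(axis=1) - comparison.min(axis=1)
print(comparison.round(1))
```

The largest disagreement in these rows is for March 1961, which is worth keeping in mind when judging how much weight a single run deserves.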
Lab¶
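- Find a dataset of your own and convert it to the format of air_passengers_with_id.csv.
- Use the program above to produce a forecast, and use your domain knowledge to assess whether the forecast is reliable.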
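Besides domain knowledge, one common way to judge reliability is a holdout comparison: hide the last 12 months, forecast them, and compare the error with a simple baseline. The sketch below computes only the baseline side, a seasonal naive forecast that repeats the same month of the previous year, using the 1959 and 1960 values shown in the prompt above. The function names are hypothetical, and the LLM side would require rerunning the notebook on the truncated history, which is not done here:

```python
import numpy as np

# Monthly passengers, January to December, copied from the history above.
y_1959 = [360, 342, 406, 396, 420, 472, 548, 559, 463, 407, 362, 405]
y_1960 = [417, 391, 419, 461, 472, 535, 622, 606, 508, 461, 390, 432]

def mae(actual, predicted):
    """Mean absolute error."""
    a, p = np.asarray(actual, float), np.asarray(predicted, float)
    return float(np.mean(np.abs(a - p)))

def mape(actual, predicted):
    """Mean absolute percentage error, in percent (actual values must be nonzero)."""
    a, p = np.asarray(actual, float), np.asarray(predicted, float)
    return float(100 * np.mean(np.abs(a - p) / np.abs(a)))

# Seasonal naive forecast for 1960: reuse the 1959 value of the same month.
baseline = y_1959
print("seasonal naive MAE :", round(mae(y_1960, baseline), 2))
print("seasonal naive MAPE:", round(mape(y_1960, baseline), 2))
```

If an LLM forecast of the held-out year does not beat these baseline errors, that is a reason to be cautious about it.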