# The AI/ML Ecosystem at a Glance

Python is the de facto first language of AI/ML. This section maps the ecosystem from foundational libraries up to applications.
## Ecosystem Map
```
Python AI/ML ecosystem
│
├── Foundational compute
│   ├── NumPy — numerical computing foundation
│   ├── SciPy — scientific computing
│   └── CUDA (via PyTorch/JAX)
│
├── Machine learning
│   ├── Scikit-learn — classical ML algorithms
│   ├── XGBoost — gradient boosting
│   └── LightGBM — efficient gradient boosting
│
├── Deep learning frameworks
│   ├── PyTorch — research favorite, dynamic graphs
│   ├── TensorFlow/Keras — production deployment
│   └── JAX — functional style, used in Google research
│
├── NLP & LLMs
│   ├── HuggingFace Transformers — pretrained models
│   ├── Tokenizers — fast tokenization
│   ├── Datasets — dataset management
│   └── PEFT — parameter-efficient fine-tuning
│
├── LLM application frameworks
│   ├── LangChain — LLM application orchestration
│   ├── LlamaIndex — RAG framework
│   ├── OpenAI SDK — GPT API
│   └── Anthropic SDK — Claude API
│
└── MLOps
    ├── MLflow — experiment tracking
    ├── Weights & Biases — experiment visualization
    └── BentoML — model serving
```
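Every layer of this stack ultimately rests on NumPy's n-dimensional array. A minimal sketch (toy data, illustrative names) of the vectorized, loop-free style that the higher layers assume:

```python
import numpy as np

# Toy batch: 4 samples, 3 features each
X = np.array([[1.0, 2.0, 3.0],
              [4.0, 5.0, 6.0],
              [7.0, 8.0, 9.0],
              [10.0, 11.0, 12.0]])

# Broadcasting: per-feature standardization with no Python loop
mean = X.mean(axis=0)        # shape (3,)
std = X.std(axis=0)          # shape (3,)
X_scaled = (X - mean) / std  # (4, 3) op (3,) broadcasts row-wise

print(X_scaled.mean(axis=0))  # ~0 for every feature
print(X_scaled.std(axis=0))   # ~1 for every feature
```

This is the same computation `StandardScaler` performs in the Scikit-learn example below, just written out by hand.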
## Suggested Learning Path

### Getting started: Scikit-learn
```python
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler
from sklearn.pipeline import Pipeline
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import classification_report

# Load the data
X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42
)

# Pipeline: preprocessing + model
pipe = Pipeline([
    ('scaler', StandardScaler()),
    ('clf', RandomForestClassifier(n_estimators=100)),
])

pipe.fit(X_train, y_train)
y_pred = pipe.predict(X_test)
print(classification_report(y_test, y_pred))
```

### Intermediate: PyTorch
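The core idea behind the PyTorch training loop below is autograd: every tensor operation is recorded, so calling `backward()` on a loss propagates gradients back to the parameters. A minimal sketch with toy values:

```python
import torch

# A toy parameter and a toy loss: loss = (3w - 6)^2
w = torch.tensor(1.0, requires_grad=True)
loss = (w * 3.0 - 6.0) ** 2

# backward() fills w.grad with d(loss)/dw = 2 * (3w - 6) * 3
loss.backward()
print(w.grad)  # tensor(-18.)
```

`loss.backward()` followed by `optimizer.step()` in the full example is exactly this mechanism, with Adam deciding how far to move each parameter along its gradient.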
```python
import torch
import torch.nn as nn
from torch.utils.data import DataLoader, TensorDataset

# Define the model
class MLP(nn.Module):
    def __init__(self, input_dim, hidden_dim, output_dim):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(input_dim, hidden_dim),
            nn.ReLU(),
            nn.Dropout(0.2),
            nn.Linear(hidden_dim, output_dim),
        )

    def forward(self, x):
        return self.net(x)

# Synthetic stand-in data: 150 samples, 4 features, 3 classes (iris-shaped)
X = torch.randn(150, 4)
y = torch.randint(0, 3, (150,))
dataloader = DataLoader(TensorDataset(X, y), batch_size=32, shuffle=True)

# Training loop
model = MLP(4, 64, 3)
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)
criterion = nn.CrossEntropyLoss()

for epoch in range(100):
    for X_batch, y_batch in dataloader:
        optimizer.zero_grad()
        logits = model(X_batch)
        loss = criterion(logits, y_batch)
        loss.backward()
        optimizer.step()
```

### Application: LLM development
```python
from openai import OpenAI

client = OpenAI()

# Basic chat completion
response = client.chat.completions.create(
    model="gpt-4o",
    messages=[
        {"role": "system", "content": "You are a Python programming assistant."},
        {"role": "user", "content": "Explain what a decorator is."},
    ],
)
print(response.choices[0].message.content)

# Streaming output
stream = client.chat.completions.create(
    model="gpt-4o",
    messages=[{"role": "user", "content": "Write a quicksort."}],
    stream=True,
)
for chunk in stream:
    if chunk.choices[0].delta.content:
        print(chunk.choices[0].delta.content, end="", flush=True)
```
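When streaming, you usually also want the complete text once the stream ends. A small helper for that (the name `collect_stream` is ours, not part of the SDK; it works on any iterable of chunks shaped like the ones above):

```python
def collect_stream(stream):
    """Print streamed deltas as they arrive and return the full text."""
    parts = []
    for chunk in stream:
        delta = chunk.choices[0].delta.content
        if delta:  # the final chunk's delta.content is typically None
            print(delta, end="", flush=True)
            parts.append(delta)
    print()
    return "".join(parts)

# Usage with the stream from the previous example:
# full_text = collect_stream(stream)
```

Because it only duck-types on `chunk.choices[0].delta.content`, the same helper is easy to unit-test with fake chunk objects, no API key required.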