The Python AI/ML Ecosystem

Python is the first-choice language for AI/ML. This page maps the ecosystem, from foundational libraries to applications.

Ecosystem Map

Python AI/ML Ecosystem

├── Foundational computing
│   ├── NumPy          — numerical computing foundation
│   ├── SciPy          — scientific computing
│   └── CUDA (via PyTorch/JAX)

├── Machine learning
│   ├── Scikit-learn   — classical ML algorithms
│   ├── XGBoost        — gradient boosting
│   └── LightGBM       — efficient gradient boosting

├── Deep learning frameworks
│   ├── PyTorch        — research favorite, dynamic graphs
│   ├── TensorFlow/Keras — production deployment
│   └── JAX            — functional style, Google research

├── NLP & LLMs
│   ├── HuggingFace Transformers — pretrained models
│   ├── Tokenizers     — fast tokenization
│   ├── Datasets       — dataset management
│   └── PEFT           — parameter-efficient fine-tuning

├── LLM application frameworks
│   ├── LangChain      — LLM application orchestration
│   ├── LlamaIndex     — RAG framework
│   ├── OpenAI SDK     — GPT API
│   └── Anthropic SDK  — Claude API

└── MLOps
    ├── MLflow         — experiment tracking
    ├── Weights & Biases — visualization and tracking
    └── BentoML        — model serving
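
Not every layer needs to be installed at once; most projects pull in only one branch of this tree. A stdlib-only sketch for checking which of the packages above are present in the current environment (the names listed are the usual import names — note that scikit-learn imports as `sklearn`):

```python
import importlib.util

# Import names for a few packages from the map above
packages = ["numpy", "scipy", "sklearn", "torch", "transformers", "mlflow"]

for name in packages:
    # find_spec returns None when a top-level package is not installed
    status = "installed" if importlib.util.find_spec(name) else "missing"
    print(f"{name:12s} {status}")
```

`importlib.util.find_spec` only locates the package; it does not import it, so this check is cheap even for heavy libraries like torch.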

Suggested Learning Path

Getting started: Scikit-learn

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler
from sklearn.pipeline import Pipeline
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import classification_report

# Load the data; fix the seed and stratify by class
# for a reproducible, class-balanced split
X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42, stratify=y
)

# Pipeline: preprocessing + model in one estimator
pipe = Pipeline([
    ('scaler', StandardScaler()),
    ('clf', RandomForestClassifier(n_estimators=100))
])

pipe.fit(X_train, y_train)
y_pred = pipe.predict(X_test)
print(classification_report(y_test, y_pred))
```
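
A single train/test split on 150 samples is noisy; cross-validation gives a steadier estimate. A sketch reusing the same pipeline (`random_state=0` here is just an arbitrary seed for reproducibility):

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.ensemble import RandomForestClassifier

X, y = load_iris(return_X_y=True)

pipe = Pipeline([
    ('scaler', StandardScaler()),
    ('clf', RandomForestClassifier(n_estimators=100, random_state=0))
])

# 5-fold cross-validation: each fold refits the whole pipeline,
# so scaling statistics never leak from the validation fold
scores = cross_val_score(pipe, X, y, cv=5)
print(f"accuracy: {scores.mean():.3f} ± {scores.std():.3f}")
```

Because the scaler lives inside the pipeline, `cross_val_score` refits it per fold — fitting a scaler on the full dataset before splitting is a common leakage bug that this structure avoids.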

Next step: PyTorch

```python
import torch
import torch.nn as nn
from torch.utils.data import DataLoader, TensorDataset

# Define the model
class MLP(nn.Module):
    def __init__(self, input_dim, hidden_dim, output_dim):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(input_dim, hidden_dim),
            nn.ReLU(),
            nn.Dropout(0.2),
            nn.Linear(hidden_dim, output_dim)
        )

    def forward(self, x):
        return self.net(x)

# Wrap the training data in a DataLoader
# (random placeholder tensors here; use your real data in practice)
X_tensor = torch.randn(120, 4)
y_tensor = torch.randint(0, 3, (120,))
dataloader = DataLoader(TensorDataset(X_tensor, y_tensor),
                        batch_size=16, shuffle=True)

# Training loop
model = MLP(4, 64, 3)
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)
criterion = nn.CrossEntropyLoss()

model.train()  # enable Dropout during training
for epoch in range(100):
    for X_batch, y_batch in dataloader:
        optimizer.zero_grad()
        logits = model(X_batch)
        loss = criterion(logits, y_batch)
        loss.backward()
        optimizer.step()
```
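
One detail the training loop omits: `Dropout` behaves differently at inference time, so evaluation code should switch to eval mode and disable gradient tracking. A minimal sketch (the small `nn.Sequential` here is a stand-in model, not the `MLP` above):

```python
import torch
import torch.nn as nn

model = nn.Sequential(
    nn.Linear(4, 64), nn.ReLU(), nn.Dropout(0.2), nn.Linear(64, 3)
)

# eval() disables Dropout; no_grad() skips autograd bookkeeping.
# Together they are the standard inference recipe.
model.eval()
with torch.no_grad():
    logits = model(torch.randn(5, 4))   # batch of 5 samples
    preds = logits.argmax(dim=1)        # predicted class per sample
print(preds.shape)  # torch.Size([5])
```

Forgetting `model.eval()` leaves Dropout active, which silently degrades predictions — one of the most common PyTorch mistakes.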

Applications: LLM development

```python
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

# Basic chat completion
response = client.chat.completions.create(
    model="gpt-4o",
    messages=[
        {"role": "system", "content": "You are a Python programming assistant"},
        {"role": "user", "content": "Explain what a decorator is"}
    ]
)
print(response.choices[0].message.content)

# Streaming output
stream = client.chat.completions.create(
    model="gpt-4o",
    messages=[{"role": "user", "content": "Write a quicksort"}],
    stream=True
)
for chunk in stream:
    # some chunks carry no choices or an empty delta; guard both
    if chunk.choices and chunk.choices[0].delta.content:
        print(chunk.choices[0].delta.content, end="", flush=True)
```
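
Real API calls fail transiently (rate limits, timeouts), so production code usually wraps them in retries. A generic stdlib sketch; `flaky` below is a stand-in for a call like the completion requests above, and in practice you would retry only on the SDK's rate-limit/timeout exceptions rather than bare `Exception`:

```python
import random
import time

def with_backoff(fn, max_retries=5, base_delay=1.0, retry_on=(Exception,)):
    """Call fn(), retrying on failure with exponential backoff and jitter."""
    for attempt in range(max_retries):
        try:
            return fn()
        except retry_on:
            if attempt == max_retries - 1:
                raise  # out of retries: surface the error
            # delays of 1s, 2s, 4s, ... plus jitter to avoid thundering herds
            time.sleep(base_delay * 2 ** attempt + random.random() * base_delay)

# Demo with a function that fails twice, then succeeds
calls = {"n": 0}
def flaky():
    calls["n"] += 1
    if calls["n"] < 3:
        raise RuntimeError("rate limited")
    return "ok"

print(with_backoff(flaky, base_delay=0.01))  # ok
```

The jitter matters: if many clients back off on the same schedule, they all retry at once and trip the rate limit again.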

Detailed documentation per module

Content on this site compiled and written by 褚成志; for study and reference only.