The Hugging Face Ecosystem
Question
What are the core features of the Hugging Face Transformers library? How do you run model inference and fine-tuning?
Answer
Pipeline (quick inference)
from transformers import pipeline
# Text classification
classifier = pipeline("text-classification", model="distilbert-base-uncased-finetuned-sst-2-english")
result = classifier("I love this movie!")
# [{'label': 'POSITIVE', 'score': 0.9998}]
# Text generation
generator = pipeline("text-generation", model="gpt2")
output = generator("Python is", max_length=50)
# Question answering
qa = pipeline("question-answering")
result = qa(question="What is Python?", context="Python is a programming language.")
Tokenizer + Model (fine-grained control)
from transformers import AutoTokenizer, AutoModelForSequenceClassification
import torch
tokenizer = AutoTokenizer.from_pretrained("bert-base-chinese")
model = AutoModelForSequenceClassification.from_pretrained("bert-base-chinese", num_labels=3)
# Encode
inputs = tokenizer("Python 是最好的编程语言", return_tensors="pt", padding=True, truncation=True)
# Inference
with torch.no_grad():
    outputs = model(**inputs)
probs = torch.softmax(outputs.logits, dim=-1)
predicted = torch.argmax(probs, dim=-1)
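The softmax/argmax post-processing above is plain arithmetic; a stdlib-only sketch on a single hypothetical row of 3-class logits (no torch needed):

```python
import math

def softmax(logits):
    m = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in logits]
    s = sum(exps)
    return [e / s for e in exps]

logits = [1.0, 3.0, 0.5]        # hypothetical 3-class model output
probs = softmax(logits)         # probabilities summing to 1
pred = probs.index(max(probs))  # argmax -> predicted class id
print(pred)  # 1
```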
Fine-tuning
from transformers import Trainer, TrainingArguments
from datasets import load_dataset
# Load the dataset
dataset = load_dataset("imdb")
# Preprocess: tokenize each batch of examples
def tokenize(examples):
    return tokenizer(examples["text"], truncation=True, padding="max_length", max_length=512)
tokenized = dataset.map(tokenize, batched=True)
# Training configuration
training_args = TrainingArguments(
    output_dir="./results",
    num_train_epochs=3,
    per_device_train_batch_size=16,
    per_device_eval_batch_size=32,
    learning_rate=2e-5,
    weight_decay=0.01,
    eval_strategy="epoch",
    save_strategy="epoch",
    load_best_model_at_end=True,
    fp16=True,  # mixed precision
)
trainer = Trainer(
    model=model,
    args=training_args,
    train_dataset=tokenized["train"],
    eval_dataset=tokenized["test"],
)
trainer.train()
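Trainer can also report metrics during evaluation via a compute_metrics callback. A minimal stdlib-only sketch of the logic (the real callback receives numpy arrays inside an EvalPrediction; plain nested lists are assumed here to keep it dependency-free):

```python
def compute_metrics(eval_pred):
    logits, labels = eval_pred
    # argmax over each row of logits -> predicted class id
    preds = [row.index(max(row)) for row in logits]
    correct = sum(p == y for p, y in zip(preds, labels))
    return {"accuracy": correct / len(labels)}

# Toy check: 2 of 3 predictions match the labels
metrics = compute_metrics(([[0.1, 0.9], [2.0, 1.0], [0.3, 0.7]], [1, 0, 0]))
print(metrics)
```

Passing the callback as Trainer(..., compute_metrics=compute_metrics) makes the metric appear in the evaluation logs each epoch.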
The Datasets Library
from datasets import load_dataset, Dataset
# Load a dataset from the Hugging Face Hub
ds = load_dataset("squad", split="train[:1000]")
# Create from local data (df is a pandas DataFrame)
ds = Dataset.from_pandas(df)
ds = Dataset.from_csv("data.csv")
# Processing (map with on-disk caching, supports multiprocessing)
ds = ds.map(preprocess, batched=True, num_proc=4)
ds = ds.filter(lambda x: len(x["text"]) > 100)
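With batched=True, the mapped function receives a dict of column lists and must return one; a pure-Python illustration of that contract (hypothetical preprocess, no datasets dependency):

```python
def preprocess(batch):
    # batch is a dict of columns, e.g. {"text": [...]};
    # returned keys become new (or updated) columns
    return {"n_chars": [len(t) for t in batch["text"]]}

batch = {"text": ["hello", "hugging face"]}
print(preprocess(batch))  # {'n_chars': [5, 12]}
```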
Common Interview Questions
Q1: What is the Hugging Face Hub?
Answer:
The Hugging Face Hub is a hosting platform for models and datasets (a "GitHub for ML"):
- Models: 100k+ pretrained models
- Datasets: tens of thousands of public datasets
- Spaces: hosted ML demos (Gradio/Streamlit)
Q2: How do you quantize a model to reduce GPU memory?
Answer:
import torch
from transformers import AutoModelForCausalLM, BitsAndBytesConfig
# 4-bit quantization (as used in QLoRA)
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.float16,
)
model = AutoModelForCausalLM.from_pretrained(
    "meta-llama/Llama-2-7b-hf",
    quantization_config=bnb_config,
    device_map="auto",
)
4-bit quantization cuts a 7B model's weight memory from 28 GB (fp32) down to roughly 4 GB.
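The memory figures above follow from bytes-per-parameter arithmetic; a quick stdlib check (weights only, ignoring activation and framework overhead):

```python
params = 7e9  # ~7B parameters

def weight_gb(bits_per_param: float) -> float:
    # parameters * bits -> bytes -> GB
    return params * bits_per_param / 8 / 1e9

print(round(weight_gb(32), 1))  # fp32: 28.0 GB
print(round(weight_gb(16), 1))  # fp16: 14.0 GB
print(round(weight_gb(4), 1))   # 4-bit: 3.5 GB (~4 GB with overhead in practice)
```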
Q3: How does LoRA fine-tuning work?
Answer:
LoRA (Low-Rank Adaptation) freezes the original weights and trains only a low-rank decomposition of the weight update:
- The original weight W ∈ R^(d×k) stays frozen
- An update ΔW = BA is added, where B ∈ R^(d×r), A ∈ R^(r×k), and r ≪ min(d, k)
- Only B and A are trained, so the trainable parameter count drops dramatically
from peft import LoraConfig, get_peft_model
lora_config = LoraConfig(r=16, lora_alpha=32, target_modules=["q_proj", "v_proj"])
model = get_peft_model(model, lora_config)
model.print_trainable_parameters()
# prints trainable vs. total parameter counts, e.g. ~8.4M of ~6.7B (≈0.12%) for r=16
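The trainable-parameter count can be reproduced by hand. A stdlib-only sketch for the config above (r=16, q_proj/v_proj adapted); the layer count of 32 and the 4096×4096 projection shapes are assumptions matching Llama-2-7B, not values from the source:

```python
def lora_params(d: int, k: int, r: int) -> int:
    # LoRA adds two matrices per adapted module: B (d x r) and A (r x k)
    return d * r + r * k

# Assumed Llama-2-7B shapes: 32 layers, q_proj and v_proj each 4096 x 4096
layers, d, k, r = 32, 4096, 4096, 16
per_module = lora_params(d, k, r)  # 16 * (4096 + 4096) = 131,072
total = layers * 2 * per_module    # 2 adapted modules per layer
print(per_module, total)           # 131072 8388608
```

Against ~6.7B base parameters that is on the order of 0.1% trainable, which is why LoRA fits on a single consumer GPU.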