Large File Upload and Download
Question
How do you implement a robust large-file upload and download solution, and what issues need to be considered?
Answer
Challenges of transferring large files:
- Memory usage: loading an entire large file at once can exhaust memory
- Unstable networks: an interrupted transfer has to start over from scratch
- User experience: without progress feedback, users have no idea what is happening
- Server load: a single huge request ties up a connection for a long time
Core idea of the solution: chunking + resumable transfer + concurrency control
Large File Upload
Overall Flow
Frontend Implementation
1. File Chunking
fileChunk.ts
interface FileChunk {
file: Blob;
index: number;
hash: string;
}
// Split a file into fixed-size chunks
function createFileChunks(file: File, chunkSize = 5 * 1024 * 1024): FileChunk[] {
const chunks: FileChunk[] = [];
let cur = 0;
let index = 0;
while (cur < file.size) {
chunks.push({
file: file.slice(cur, cur + chunkSize),
index: index++,
hash: '', // computed later
});
cur += chunkSize;
}
return chunks;
}
2. Computing the File Hash (for instant upload and resumable upload)
Use a Web Worker to avoid blocking the main thread:
hashWorker.ts
// hashWorker.ts
import SparkMD5 from 'spark-md5';
self.onmessage = async (e: MessageEvent<{ chunks: Blob[] }>) => {
const { chunks } = e.data;
const spark = new SparkMD5.ArrayBuffer();
let percentage = 0;
for (let i = 0; i < chunks.length; i++) {
const chunk = chunks[i];
const buffer = await chunk.arrayBuffer();
spark.append(buffer);
percentage = Math.floor(((i + 1) / chunks.length) * 100);
self.postMessage({ percentage });
}
self.postMessage({ hash: spark.end() });
};
calculateHash.ts
function calculateHash(chunks: Blob[]): Promise<string> {
return new Promise((resolve) => {
const worker = new Worker(new URL('./hashWorker.ts', import.meta.url), { type: 'module' }); // module worker: it uses ESM imports
worker.onmessage = (e) => {
if (e.data.hash) {
resolve(e.data.hash);
worker.terminate();
}
};
worker.postMessage({ chunks });
});
}
3. Checking for Instant Upload and Already-Uploaded Chunks
checkFile.ts
interface CheckResult {
shouldUpload: boolean;
uploadedChunks: number[];
}
async function checkFileStatus(
fileHash: string,
fileName: string
): Promise<CheckResult> {
const response = await fetch('/api/upload/check', {
method: 'POST',
headers: { 'Content-Type': 'application/json' },
body: JSON.stringify({ fileHash, fileName }),
});
return response.json();
}
4. Uploading Chunks Concurrently (with Retry)
uploadChunks.ts
interface UploadOptions {
fileHash: string;
fileName: string;
chunks: FileChunk[];
uploadedChunks: number[];
concurrency?: number;
onProgress?: (progress: number) => void;
}
async function uploadChunks(options: UploadOptions): Promise<void> {
const {
fileHash,
fileName,
chunks,
uploadedChunks,
concurrency = 3,
onProgress,
} = options;
// Skip chunks the server already has
const pendingChunks = chunks.filter(
(chunk) => !uploadedChunks.includes(chunk.index)
);
let uploadedCount = uploadedChunks.length;
const total = chunks.length;
// Task that uploads a single chunk
const uploadTask = async (chunk: FileChunk): Promise<void> => {
const formData = new FormData();
formData.append('file', chunk.file);
formData.append('hash', fileHash);
formData.append('index', String(chunk.index));
formData.append('fileName', fileName);
await fetchWithRetry('/api/upload/chunk', {
method: 'POST',
body: formData,
});
uploadedCount++;
onProgress?.(Math.floor((uploadedCount / total) * 100));
};
// Concurrency control
await asyncPool(concurrency, pendingChunks, uploadTask);
}
// Concurrency pool: runs at most `concurrency` tasks at once
async function asyncPool<T, R>(
  concurrency: number,
  items: T[],
  fn: (item: T) => Promise<R>
): Promise<R[]> {
  const results: Promise<R>[] = [];
  const executing = new Set<Promise<void>>();
  for (const item of items) {
    const p = Promise.resolve().then(() => fn(item));
    results.push(p);
    // Each task removes itself from the pool when it settles, so the
    // slot freed by Promise.race always belongs to a finished task
    const e: Promise<void> = p.then(() => {
      executing.delete(e);
    });
    executing.add(e);
    if (executing.size >= concurrency) {
      await Promise.race(executing);
    }
  }
  return Promise.all(results);
}
// fetch with retries and exponential backoff
async function fetchWithRetry(
url: string,
options: RequestInit,
retries = 3
): Promise<Response> {
for (let i = 0; i < retries; i++) {
try {
const response = await fetch(url, options);
if (!response.ok) throw new Error('Upload failed');
return response;
} catch (error) {
if (i === retries - 1) throw error;
await new Promise((r) => setTimeout(r, 1000 * 2 ** i)); // exponential backoff: 1s, 2s, 4s...
}
}
throw new Error('Max retries exceeded');
}
5. Requesting the Chunk Merge
mergeChunks.ts
async function mergeChunks(
fileHash: string,
fileName: string,
chunkCount: number
): Promise<{ url: string }> {
const response = await fetch('/api/upload/merge', {
method: 'POST',
headers: { 'Content-Type': 'application/json' },
body: JSON.stringify({ fileHash, fileName, chunkCount }),
});
return response.json();
}
6. The Complete Upload Flow
upload.ts
interface UploadResult {
success: boolean;
url?: string;
error?: string;
}
async function uploadFile(
file: File,
onProgress?: (progress: number) => void
): Promise<UploadResult> {
try {
// 1. Split the file into chunks
const chunks = createFileChunks(file);
// 2. Compute the file hash
const fileHash = await calculateHash(chunks.map((c) => c.file));
// 3. Check for instant upload
const { shouldUpload, uploadedChunks } = await checkFileStatus(
fileHash,
file.name
);
if (!shouldUpload) {
onProgress?.(100);
return { success: true, url: `/files/${fileHash}` };
}
// 4. Upload the chunks
await uploadChunks({
fileHash,
fileName: file.name,
chunks,
uploadedChunks,
onProgress,
});
// 5. Merge the chunks
const { url } = await mergeChunks(fileHash, file.name, chunks.length);
return { success: true, url };
} catch (error) {
return { success: false, error: (error as Error).message };
}
}
Backend Implementation (Node.js + Express)
1. Checking File Status
server/checkController.ts
import { Request, Response } from 'express';
import fs from 'fs/promises';
import path from 'path';
const UPLOAD_DIR = path.resolve(__dirname, '../uploads');
const CHUNK_DIR = path.resolve(__dirname, '../chunks');
export async function checkFile(req: Request, res: Response): Promise<void> {
const { fileHash, fileName } = req.body;
// Does the finished file already exist? (instant upload)
const filePath = path.join(UPLOAD_DIR, `${fileHash}${path.extname(fileName)}`);
try {
await fs.access(filePath);
res.json({ shouldUpload: false, uploadedChunks: [] });
return;
} catch {
// File does not exist; fall through to the chunk check
}
// Find chunks that are already uploaded
const chunkDir = path.join(CHUNK_DIR, fileHash);
let uploadedChunks: number[] = [];
try {
const files = await fs.readdir(chunkDir);
uploadedChunks = files.map((f) => parseInt(f.split('-')[1], 10));
} catch {
// No chunks uploaded yet
}
res.json({ shouldUpload: true, uploadedChunks });
}
2. Receiving a Chunk
server/chunkController.ts
import { Request, Response } from 'express';
import fs from 'fs/promises';
import path from 'path';
const CHUNK_DIR = path.resolve(__dirname, '../chunks');
export async function uploadChunk(req: Request, res: Response): Promise<void> {
const { hash, index, fileName } = req.body;
const file = req.file;
if (!file) {
res.status(400).json({ error: 'No file uploaded' });
return;
}
const chunkDir = path.join(CHUNK_DIR, hash);
// Make sure the chunk directory exists
await fs.mkdir(chunkDir, { recursive: true });
// Move the chunk into place
const chunkPath = path.join(chunkDir, `chunk-${index}`);
await fs.rename(file.path, chunkPath);
res.json({ success: true });
}
3. Merging Chunks
server/mergeController.ts
import { Request, Response } from 'express';
import fs from 'fs/promises';
import { createWriteStream, createReadStream } from 'fs';
import path from 'path';
const UPLOAD_DIR = path.resolve(__dirname, '../uploads');
const CHUNK_DIR = path.resolve(__dirname, '../chunks');
export async function mergeChunks(req: Request, res: Response): Promise<void> {
const { fileHash, fileName, chunkCount } = req.body;
const chunkDir = path.join(CHUNK_DIR, fileHash);
const ext = path.extname(fileName);
const filePath = path.join(UPLOAD_DIR, `${fileHash}${ext}`);
// Make sure the upload directory exists
await fs.mkdir(UPLOAD_DIR, { recursive: true });
// Create the write stream
const writeStream = createWriteStream(filePath);
// Append the chunks in index order
for (let i = 0; i < chunkCount; i++) {
const chunkPath = path.join(chunkDir, `chunk-${i}`);
const chunkBuffer = await fs.readFile(chunkPath);
writeStream.write(chunkBuffer);
}
writeStream.end();
// Wait for the write to finish (and surface write errors)
await new Promise((resolve, reject) => {
  writeStream.on('finish', resolve);
  writeStream.on('error', reject);
});
// Remove the chunk directory
await fs.rm(chunkDir, { recursive: true });
res.json({ success: true, url: `/files/${fileHash}${ext}` });
}
4. Route Configuration
server/routes.ts
import express from 'express';
import multer from 'multer';
import { checkFile } from './checkController';
import { uploadChunk } from './chunkController';
import { mergeChunks } from './mergeController';
const router = express.Router();
const upload = multer({ dest: 'temp/' });
router.post('/upload/check', checkFile);
router.post('/upload/chunk', upload.single('file'), uploadChunk);
router.post('/upload/merge', mergeChunks);
export default router;
Large File Download
Overall Flow
Frontend Implementation
1. Getting File Information
getFileInfo.ts
interface FileInfo {
size: number;
name: string;
contentType: string;
}
async function getFileInfo(url: string): Promise<FileInfo> {
const response = await fetch(url, { method: 'HEAD' });
const size = parseInt(response.headers.get('content-length') || '0', 10);
const contentType = response.headers.get('content-type') || '';
const disposition = response.headers.get('content-disposition') || '';
// Extract the file name from Content-Disposition
const nameMatch = disposition.match(/filename="?([^"]+)"?/);
const name = nameMatch ? nameMatch[1] : 'download';
return { size, name, contentType };
}
2. Downloading a Single Range
downloadChunk.ts
async function downloadChunk(
url: string,
start: number,
end: number
): Promise<ArrayBuffer> {
const response = await fetch(url, {
headers: {
Range: `bytes=${start}-${end}`,
},
});
// A 200 here would mean the server ignored the Range header and sent
// the whole file, which would corrupt the merge - require 206
if (response.status !== 206) {
  throw new Error('Download failed');
}
return response.arrayBuffer();
}
3. Concurrent Download and Merge
downloadFile.ts
interface DownloadOptions {
url: string;
chunkSize?: number;
concurrency?: number;
onProgress?: (progress: number) => void;
}
async function downloadFile(options: DownloadOptions): Promise<Blob> {
const {
url,
chunkSize = 5 * 1024 * 1024,
concurrency = 3,
onProgress,
} = options;
// 1. Get the file info
const fileInfo = await getFileInfo(url);
const { size, contentType } = fileInfo;
// 2. Compute the byte ranges
const chunks: Array<{ start: number; end: number; index: number }> = [];
let cur = 0;
let index = 0;
while (cur < size) {
const end = Math.min(cur + chunkSize - 1, size - 1);
chunks.push({ start: cur, end, index: index++ });
cur = end + 1;
}
// 3. Download ranges concurrently
let downloadedCount = 0;
const results: ArrayBuffer[] = new Array(chunks.length);
const downloadTask = async (
chunk: { start: number; end: number; index: number }
): Promise<void> => {
const buffer = await downloadChunk(url, chunk.start, chunk.end);
results[chunk.index] = buffer;
downloadedCount++;
onProgress?.(Math.floor((downloadedCount / chunks.length) * 100));
};
await asyncPool(concurrency, chunks, downloadTask);
// 4. Merge into a Blob
return new Blob(results, { type: contentType });
}
// Trigger a browser download
function triggerDownload(blob: Blob, fileName: string): void {
const url = URL.createObjectURL(blob);
const a = document.createElement('a');
a.href = url;
a.download = fileName;
a.click();
URL.revokeObjectURL(url);
}
4. Resumable Download
resumableDownload.ts
interface DownloadState {
url: string;
fileName: string;
totalSize: number;
downloadedChunks: number[];
chunks: ArrayBuffer[];
}
class ResumableDownloader {
private state: DownloadState | null = null;
private storageKey = 'download_state';
async download(
url: string,
onProgress?: (progress: number) => void
): Promise<void> {
// Restore any previous download state
this.loadState();
const fileInfo = await getFileInfo(url);
const chunkSize = 5 * 1024 * 1024;
if (!this.state || this.state.url !== url) {
this.state = {
url,
fileName: fileInfo.name,
totalSize: fileInfo.size,
downloadedChunks: [],
chunks: [],
};
}
const totalChunks = Math.ceil(fileInfo.size / chunkSize);
for (let i = 0; i < totalChunks; i++) {
if (this.state.downloadedChunks.includes(i)) {
continue;
}
const start = i * chunkSize;
const end = Math.min(start + chunkSize - 1, fileInfo.size - 1);
try {
const buffer = await downloadChunk(url, start, end);
this.state.chunks[i] = buffer;
this.state.downloadedChunks.push(i);
this.saveState();
onProgress?.(
Math.floor((this.state.downloadedChunks.length / totalChunks) * 100)
);
} catch (error) {
console.error(`Chunk ${i} download failed, will retry later`);
throw error;
}
}
// All ranges downloaded: merge and trigger the save dialog
const blob = new Blob(this.state.chunks, {
type: fileInfo.contentType,
});
triggerDownload(blob, this.state.fileName);
this.clearState();
}
private saveState(): void {
if (!this.state) return;
// Note: ArrayBuffers cannot be JSON-serialized; a real project should
// persist the chunk data to IndexedDB instead
const saveData = {
  ...this.state,
  chunks: [], // the actual chunk data belongs in IndexedDB
};
localStorage.setItem(this.storageKey, JSON.stringify(saveData));
}
private loadState(): void {
const saved = localStorage.getItem(this.storageKey);
if (saved) {
this.state = JSON.parse(saved);
}
}
private clearState(): void {
this.state = null;
localStorage.removeItem(this.storageKey);
}
}
Backend Implementation
Supporting Range Requests
server/downloadController.ts
import { Request, Response } from 'express';
import fs from 'fs';
import path from 'path';
const UPLOAD_DIR = path.resolve(__dirname, '../uploads');
export function downloadFile(req: Request, res: Response): void {
const { fileHash } = req.params;
const filePath = path.join(UPLOAD_DIR, fileHash);
// Make sure the file exists
if (!fs.existsSync(filePath)) {
res.status(404).json({ error: 'File not found' });
return;
}
const stat = fs.statSync(filePath);
const fileSize = stat.size;
const range = req.headers.range;
if (range) {
// Serve the requested byte range
const parts = range.replace(/bytes=/, '').split('-');
const start = parseInt(parts[0], 10);
const end = parts[1] ? Math.min(parseInt(parts[1], 10), fileSize - 1) : fileSize - 1; // clamp to the file size
const chunkSize = end - start + 1;
res.writeHead(206, {
'Content-Range': `bytes ${start}-${end}/${fileSize}`,
'Accept-Ranges': 'bytes',
'Content-Length': chunkSize,
'Content-Type': 'application/octet-stream',
});
const stream = fs.createReadStream(filePath, { start, end });
stream.pipe(res);
} else {
// Full-file download
res.writeHead(200, {
'Content-Length': fileSize,
'Content-Type': 'application/octet-stream',
'Content-Disposition': `attachment; filename="${path.basename(filePath)}"`,
});
fs.createReadStream(filePath).pipe(res);
}
}
A Complete Upload Component Example
FileUploader.tsx
import React, { useState, useCallback } from 'react';
interface UploadState {
status: 'idle' | 'hashing' | 'uploading' | 'success' | 'error';
progress: number;
error?: string;
}
export function FileUploader(): React.ReactElement {
const [state, setState] = useState<UploadState>({
status: 'idle',
progress: 0,
});
const handleFileChange = useCallback(
async (e: React.ChangeEvent<HTMLInputElement>) => {
const file = e.target.files?.[0];
if (!file) return;
try {
setState({ status: 'hashing', progress: 0 });
const result = await uploadFile(file, (progress) => {
setState((prev) => ({
...prev,
status: 'uploading',
progress,
}));
});
if (result.success) {
setState({ status: 'success', progress: 100 });
} else {
setState({ status: 'error', progress: 0, error: result.error });
}
} catch (error) {
setState({
status: 'error',
progress: 0,
error: (error as Error).message,
});
}
},
[]
);
return (
<div>
<input
type="file"
onChange={handleFileChange}
disabled={state.status === 'uploading' || state.status === 'hashing'}
/>
{state.status === 'hashing' && <p>Computing file hash...</p>}
{state.status === 'uploading' && (
<div>
<progress value={state.progress} max={100} />
<span>{state.progress}%</span>
</div>
)}
{state.status === 'success' && <p>Upload succeeded!</p>}
{state.status === 'error' && <p>Upload failed: {state.error}</p>}
</div>
);
}
Optimization Summary
Upload Optimizations
| Strategy | Description |
|---|---|
| Chunked upload | Split large files into small pieces; a failure only requires re-sending that piece |
| Instant upload | Detect via hash whether the file already exists on the server |
| Resumable upload | Record uploaded chunks and continue after a disconnect |
| Concurrency control | Limit how many chunks upload at the same time |
| Web Worker | Compute the hash without blocking the main thread |
| Retry mechanism | Retry failures automatically with exponential backoff |
Download Optimizations
| Strategy | Description |
|---|---|
| Range requests | Download the file in segments |
| Concurrent download | Fetch multiple segments at once |
| Resumable download | Save progress and continue after an interruption |
| Streaming download | Write data as it arrives to reduce memory usage |
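The last row, streaming download, has no example above. A minimal sketch of consuming a response body incrementally: `streamBody` is a hypothetical helper, and in a real app the callback might write into a FileSystemWritableFileStream instead of just counting bytes.

```typescript
// Consume a ReadableStream chunk by chunk so the whole file never
// has to sit in memory at once. Returns the total bytes processed.
async function streamBody(
  stream: ReadableStream<Uint8Array>,
  onChunk: (chunk: Uint8Array) => void
): Promise<number> {
  const reader = stream.getReader();
  let received = 0;
  for (;;) {
    const { done, value } = await reader.read();
    if (done) break;
    received += value.byteLength;
    onChunk(value); // e.g. write to disk, update a progress bar
  }
  return received;
}
```

With `fetch`, this would be invoked as `streamBody(response.body!, write)`, trading the single in-memory `Blob` of `downloadFile` above for constant memory use.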
Common Interview Questions
Q1: How does instant upload work?
Compute a hash of the file (e.g. MD5 or SHA-256) and, before uploading, ask the server whether that hash already exists. If it does, return success immediately; nothing needs to be uploaded.
Q2: How do you guarantee chunk order and integrity?
- Order: every chunk carries an index, and the merge step reads them back in that order
- Integrity: a hash can be computed for each chunk and verified on the server
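The integrity half can be sketched on the server with Node's built-in crypto module. `verifyChunk` is a hypothetical helper, not part of the controllers above; a real handler would call it before moving the chunk into place.

```typescript
import { createHash } from 'crypto';

// Compare the MD5 the client sent along with the chunk against one
// computed from the bytes actually received.
function verifyChunk(chunkData: Buffer, expectedMd5: string): boolean {
  const actual = createHash('md5').update(chunkData).digest('hex');
  return actual === expectedMd5;
}
```

On mismatch the server would reject the chunk so the client's retry logic re-sends it.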
Q3: If the user refreshes the page, how is the upload resumed?
- Before uploading, store the file hash and the uploaded-chunk list in localStorage or IndexedDB
- On page load, check whether an unfinished upload exists
- If so, fetch the list of already-uploaded chunks and continue with the rest
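The persistence step can be sketched with the storage injected behind a small interface, so the same code uses `localStorage` in the browser and a stub in tests. The key scheme and helper names here are illustrative, not part of the flow above.

```typescript
// Minimal Storage-like interface (localStorage satisfies it)
interface KVStore {
  getItem(key: string): string | null;
  setItem(key: string, value: string): void;
}

interface PersistedUpload {
  fileName: string;
  uploadedChunks: number[];
}

// Save progress under a key derived from the file hash
function saveUploadState(store: KVStore, fileHash: string, state: PersistedUpload): void {
  store.setItem(`upload:${fileHash}`, JSON.stringify(state));
}

// Restore progress, or null if nothing was saved
function loadUploadState(store: KVStore, fileHash: string): PersistedUpload | null {
  const raw = store.getItem(`upload:${fileHash}`);
  return raw ? (JSON.parse(raw) as PersistedUpload) : null;
}
```

Because the key is the file hash, reselecting the same file after a refresh finds the saved chunk list even though the `File` object itself is gone.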
Q4: What is a reasonable concurrency level?
- Typically 3-6 concurrent requests
- Too few: available bandwidth goes unused
- Too many: may hit browser per-host connection limits or overload the server
- The level can also be adjusted dynamically based on network conditions
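The "adjust dynamically" point can be sketched as a pure function. The thresholds below are illustrative, not tuned values; in the browser the input could come from the non-standard `navigator.connection.downlink` (in Mbps), falling back to a default where it is unavailable.

```typescript
// Map a measured downlink (Mbps) to a chunk-upload concurrency.
// Thresholds are illustrative assumptions, not benchmarked values.
function pickConcurrency(downlinkMbps: number): number {
  if (downlinkMbps >= 10) return 6; // fast connection: more parallelism
  if (downlinkMbps >= 2) return 4;  // moderate connection
  return 2;                         // slow connection: avoid self-congestion
}
```

The result would be passed as the `concurrency` option of `uploadChunks`.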
Related Links
- MDN - File API
- MDN - Blob
- MDN - Range requests
- spark-md5 - incremental MD5 hashing