Performance Profiling with pprof

Question

How do you use Go's pprof tool for performance profiling, and how do you track down CPU and memory bottlenecks?

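Answer

Introduction to pprof

pprof is Go's built-in profiling tool. It can profile:

CPU: which functions consume the most CPU time
Heap: which functions allocate the most memory
Goroutine: where each live goroutine is currently running or blocked
Mutex: lock contention
Block: time spent blocked on synchronization operations such as channel and lock waits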

Setup

Option 1: HTTP endpoints (recommended for services)

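import (
    "log"
    "net/http"
    _ "net/http/pprof" // side effect: registers /debug/pprof/ handlers on http.DefaultServeMux
)

func main() {
    go func() {
        // A nil handler means http.DefaultServeMux, where pprof registered itself.
        log.Println(http.ListenAndServe("localhost:6060", nil))
    }()

    // Your application code...
}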

Available endpoints:

http://localhost:6060/debug/pprof/ : index page
http://localhost:6060/debug/pprof/heap : heap memory
http://localhost:6060/debug/pprof/profile?seconds=30 : CPU (samples for 30 seconds)
http://localhost:6060/debug/pprof/goroutine : goroutines
http://localhost:6060/debug/pprof/mutex : mutex contention (off by default)
http://localhost:6060/debug/pprof/block : blocking operations (off by default)

Option 2: In-code profiling (suited to CLI tools and tests)

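import (
    "log"
    "os"
    "runtime/pprof"
)

func main() {
    // CPU profile: covers everything between Start and Stop.
    f, err := os.Create("cpu.prof")
    if err != nil {
        log.Fatal(err)
    }
    defer f.Close()
    if err := pprof.StartCPUProfile(f); err != nil {
        log.Fatal(err)
    }
    defer pprof.StopCPUProfile()

    // Your code...

    // Heap profile: written once, after the workload has run.
    f2, err := os.Create("mem.prof")
    if err != nil {
        log.Panic(err) // Panic rather than Fatal, so the deferred calls still run
    }
    defer f2.Close()
    if err := pprof.WriteHeapProfile(f2); err != nil {
        log.Panic(err)
    }
}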

Option 3: Automatic collection from benchmarks

go test -bench=BenchmarkXxx -cpuprofile=cpu.prof -memprofile=mem.prof

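Analyzing a CPU Profile

# Collect 30 seconds of CPU samples (the URL is quoted so the shell does not interpret "?")
go tool pprof "http://localhost:6060/debug/pprof/profile?seconds=30"

# Or analyze a saved file
go tool pprof cpu.prof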

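Interactive commands:

(pprof) top            # functions with the highest CPU usage
(pprof) top 20         # the top 20 functions
(pprof) list funcName  # line-by-line cost of the functions matching funcName
(pprof) web            # render the call graph as SVG (requires Graphviz)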

Reading the top output:

      flat  flat%   sum%        cum   cum%
  1200ms 40.00% 40.00%    1500ms 50.00%  main.compute
   800ms 26.67% 66.67%     800ms 26.67%  runtime.mallocgc

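Column	Meaning
`flat`	Time spent in the function itself
`flat%`	flat as a percentage of all samples
`sum%`	Running total of flat% for this row and every row above it
`cum`	Time spent in the function itself plus the functions it calls
`cum%`	cum as a percentage of all samples

In the sample above, main.compute has 1200ms flat and 1500ms cum, so the remaining 300ms was spent in functions it called.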

Analyzing a Memory Profile

# Heap memory currently in use
go tool pprof http://localhost:6060/debug/pprof/heap

# Allocation counts (useful for finding allocation hot spots)
go tool pprof -alloc_objects http://localhost:6060/debug/pprof/heap

# Allocated bytes
go tool pprof -alloc_space http://localhost:6060/debug/pprof/heap

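Option	Meaning
`-inuse_space` (default)	Bytes currently in use
`-inuse_objects`	Number of objects currently live
`-alloc_space`	Total bytes allocated since the program started, including memory already freed
`-alloc_objects`	Total number of objects allocated since the program started, including objects already freed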

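Tracking Down Memory Leaks

Compare heap snapshots taken at two points in time:

# Baseline
curl http://localhost:6060/debug/pprof/heap > base.prof
# Wait a while, then take a second snapshot
curl http://localhost:6060/debug/pprof/heap > current.prof

# Show the difference
go tool pprof -diff_base=base.prof current.prof
(pprof) top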
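
When no HTTP endpoint is available, the same two snapshots can be written from inside the process. A minimal sketch follows. The helper `heapSnapshot` and the package-level slice `retained` are hypothetical names, and `retained` stands in for a real leak.

```go
package main

import (
	"bytes"
	"fmt"
	"os"
	"runtime"
	"runtime/pprof"
)

// retained simulates a leak: memory appended here is never released.
var retained [][]byte

// heapSnapshot returns the current heap profile in the gzip-compressed
// protobuf format that go tool pprof reads.
func heapSnapshot() ([]byte, error) {
	runtime.GC() // refresh the statistics that the heap profile reports
	var buf bytes.Buffer
	if err := pprof.Lookup("heap").WriteTo(&buf, 0); err != nil {
		return nil, err
	}
	return buf.Bytes(), nil
}

// save takes a snapshot and writes it to path.
func save(path string) error {
	data, err := heapSnapshot()
	if err != nil {
		return err
	}
	return os.WriteFile(path, data, 0o644)
}

func main() {
	if err := save("base.prof"); err != nil {
		fmt.Fprintln(os.Stderr, err)
		return
	}
	// Simulated leak: 64 MiB that stays reachable through retained.
	for i := 0; i < 64; i++ {
		retained = append(retained, make([]byte, 1<<20))
	}
	if err := save("current.prof"); err != nil {
		fmt.Fprintln(os.Stderr, err)
		return
	}
	fmt.Println("wrote base.prof and current.prof")
}
```

The two files can then be compared with the same `-diff_base` command. The allocation made in `main` should appear as the growth between the snapshots.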

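Web UI and Flame Graphs

# Open the profile, including its flame graph, in a browser
go tool pprof -http=:8080 cpu.prof

# Or collect from a running service
go tool pprof -http=:8080 "http://localhost:6060/debug/pprof/profile?seconds=30"

The Web UI provides:

Top: ranked table
Graph: call graph
Flame Graph: flame graph (the width of each frame is its share of time or memory)
Source: source-level breakdown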

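Goroutine Analysis

# Inspect the stacks of all goroutines
go tool pprof http://localhost:6060/debug/pprof/goroutine

# Or read the plain-text form, which groups identical stacks with a count
curl "http://localhost:6060/debug/pprof/goroutine?debug=1"
# debug=2 prints every goroutine's stack individually, along with its wait state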

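Common Interview Questions

Q1: Can pprof be enabled in production?

Answer:

Yes, with these precautions:

Listen only on localhost:6060 (never expose it to the public internet)
CPU profiling costs something only while a profile is being collected; about 5% is a commonly cited figure, and the real cost depends on the workload
Heap profiling adds almost no extra cost, because the runtime already samples allocations by default
Reach the endpoint through port forwarding or a VPN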

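Q2: How do you use pprof to find a goroutine leak?

Answer:

# 1. Watch the goroutine count over time
curl -s "http://localhost:6060/debug/pprof/goroutine?debug=1" | head -1
# goroutine profile: total 1234

# 2. If the count keeps growing, inspect the stacks
go tool pprof http://localhost:6060/debug/pprof/goroutine
(pprof) top
# Find the stack shared by the most goroutines and work out what it is blocked on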

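Q3: What is the difference between flat and cum?

Answer:

flat: CPU time or memory allocation attributed to the function itself, excluding the functions it calls
cum: CPU time or memory allocation of the function plus every function it calls, directly or indirectly

When optimizing:

High flat: the hot spot is in the function's own code, so optimize it directly
High cum but low flat: the bottleneck is in the functions it calls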
