Fluentd vs Fluent Bit

Comparison

              Fluent Bit                           Fluentd                     Vector
Language      C                                    Ruby + C                    Rust
Memory        ~5MB                                 ~40MB                       ~15MB
Role          Lightweight collection (DaemonSet)   Aggregation / processing    Collection + aggregation in one
Plugins       ~100                                 ~1000                       ~100
K8s usage     Preferred collector                  Aggregation layer           All-in-one replacement
Recommended combinations
  • Small scale: Fluent Bit → Loki/ES (direct connection)
  • Large scale: Fluent Bit → Kafka → Fluentd → ES/Loki (tiered architecture; see the Kafka sketch below)
  • New projects: Vector can replace both Fluent Bit and Fluentd
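The tiered setup needs a Kafka hop that is not shown in the per-tool configs below: Fluent Bit produces to a topic and Fluentd consumes it. A minimal sketch, assuming a broker at kafka:9092, a topic named k8s-logs, and the fluent-plugin-kafka gem on the Fluentd side (all three are illustrative assumptions):

Fluent Bit side (an extra output alongside, or instead of, the es/loki outputs):

[OUTPUT]
    # Produce collected records to Kafka (broker address and topic are assumptions)
    Name    kafka
    Match   kube.*
    Brokers kafka:9092
    Topics  k8s-logs

Fluentd side (a kafka_group source in place of the forward source):

<source>
  # Consume the same topic; requires the fluent-plugin-kafka gem
  @type kafka_group
  brokers kafka:9092
  topics k8s-logs
  consumer_group fluentd-aggregator
  format json
</source>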

Fluent Bit

K8s DaemonSet Configuration

fluent-bit-configmap.yaml
apiVersion: v1
kind: ConfigMap
metadata:
  name: fluent-bit-config
data:
  fluent-bit.conf: |
    [SERVICE]
        Flush             5
        Daemon            Off
        Log_Level         info
        Parsers_File      parsers.conf

    [INPUT]
        Name              tail
        Path              /var/log/containers/*.log
        Parser            cri
        Tag               kube.*
        Mem_Buf_Limit     5MB
        Skip_Long_Lines   On
        Refresh_Interval  10

    [FILTER]
        Name              kubernetes
        Match             kube.*
        Kube_URL          https://kubernetes.default.svc:443
        # Merge JSON log bodies into top-level fields
        Merge_Log         On
        Keep_Log          Off
        K8S-Logging.Parser  On
        # Allow excluding logs via Pod annotations
        K8S-Logging.Exclude On

    [FILTER]
        # Drop health-check log lines
        Name              grep
        Match             kube.*
        Exclude           log healthcheck

    [OUTPUT]
        Name              es
        Match             kube.*
        Host              elasticsearch
        Port              9200
        Type              _doc
        # With Logstash_Format On, the index name comes from Logstash_Prefix
        Logstash_Format   On
        Logstash_Prefix   logs-k8s
        Retry_Limit       3

    [OUTPUT]
        Name              loki
        Match             kube.*
        Host              loki
        Port              3100
        Labels            job=fluent-bit, namespace=$kubernetes['namespace_name'], app=$kubernetes['labels']['app']
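
The ConfigMap above only holds the configuration; it is mounted by a DaemonSet so that one Fluent Bit pod runs per node and can tail the host's container logs. A trimmed-down sketch (image tag is illustrative; resource limits and the RBAC objects needed by the kubernetes filter are omitted):

fluent-bit-daemonset.yaml
apiVersion: apps/v1
kind: DaemonSet
metadata:
  name: fluent-bit
spec:
  selector:
    matchLabels:
      app: fluent-bit
  template:
    metadata:
      labels:
        app: fluent-bit
    spec:
      # ServiceAccount needs read access to Pod metadata for the kubernetes filter
      serviceAccountName: fluent-bit
      containers:
        - name: fluent-bit
          image: fluent/fluent-bit:2.2
          volumeMounts:
            # Host container logs read by the tail input
            - name: varlog
              mountPath: /var/log
              readOnly: true
            # Main config from the ConfigMap defined above
            - name: config
              mountPath: /fluent-bit/etc/fluent-bit.conf
              subPath: fluent-bit.conf
      volumes:
        - name: varlog
          hostPath:
            path: /var/log
        - name: config
          configMap:
            name: fluent-bit-config

Because K8S-Logging.Exclude is On, individual workloads can opt out of collection by adding the Pod annotation fluentbit.io/exclude: "true".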

Fluentd

Aggregation Layer Configuration

fluentd.conf
# Receive logs forwarded by Fluent Bit
<source>
  @type forward
  port 24224
  bind 0.0.0.0
</source>

# Parse JSON log bodies (matches the kube.* tags set by Fluent Bit)
<filter kube.**>
  @type parser
  key_name log
  <parse>
    @type json
    time_key timestamp
    time_format %Y-%m-%dT%H:%M:%S.%LZ
  </parse>
</filter>

# Mask sensitive data (e.g. 11-digit phone numbers)
<filter **>
  @type record_transformer
  # Required for Ruby expressions such as gsub in ${...}
  enable_ruby true
  <record>
    message ${record["message"]&.gsub(/\d{11}/, '***')}
  </record>
</filter>

# Output to Elasticsearch
<match **>
  @type elasticsearch
  host elasticsearch
  port 9200
  logstash_format true
  logstash_prefix logs
  <buffer>
    @type file
    path /var/log/fluentd-buffers
    flush_mode interval
    flush_interval 5s
    chunk_limit_size 8MB
    retry_max_interval 30
    retry_forever true
  </buffer>
</match>

Vector

Configuration Example

vector.toml
# Collect K8s container logs
[sources.kubernetes]
type = "kubernetes_logs"

# Parse JSON log bodies, keeping the K8s metadata added by the source
[transforms.parse]
type = "remap"
inputs = ["kubernetes"]
source = '''
# Capture metadata before replacing the event with the parsed JSON body
namespace = .kubernetes.pod_namespace
app = .kubernetes.pod_labels.app
. = parse_json!(.message)
.namespace = namespace
.app = app
'''

# Drop DEBUG-level events
[transforms.filter]
type = "filter"
inputs = ["parse"]
condition = '.level != "DEBUG"'

# Ship to Loki
[sinks.loki]
type = "loki"
inputs = ["filter"]
endpoint = "http://loki:3100"
encoding.codec = "json"
labels.app = "{{ app }}"
labels.namespace = "{{ namespace }}"

# Also archive to S3
[sinks.s3]
type = "aws_s3"
inputs = ["filter"]
bucket = "logs-archive"
region = "ap-southeast-1"
compression = "gzip"
encoding.codec = "json"
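
If the Vector agent itself must survive restarts without dropping events, each sink can additionally be given a disk-backed buffer. A minimal sketch for the Loki sink above (the size is an illustrative assumption):

# Disk-backed buffer: chunks are persisted locally until delivered
[sinks.loki.buffer]
type = "disk"
max_size = 536870912   # ~512 MiB
when_full = "block"    # apply backpressure instead of dropping new events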

Common Interview Questions

Q1: Should you use Fluent Bit or Fluentd for K8s log collection?

Answer

  • Fluent Bit: deployed as a DaemonSet on every node, responsible for collection and simple filtering. At only ~5MB of memory it is well suited to node-level collection
  • Fluentd: deployed as a Deployment (a small number of replicas), responsible for complex parsing, filtering, and enrichment (see the sketch below)
  • Recommended combination at scale: Fluent Bit (collect) → Kafka (buffer) → Fluentd (process) → ES/Loki (store)
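
A trimmed-down sketch of the aggregation tier from the first two bullets, assuming the fluentd.conf shown earlier is baked into or mounted by the image (image tag and replica count are illustrative):

fluentd-aggregator.yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: fluentd-aggregator
spec:
  replicas: 2
  selector:
    matchLabels:
      app: fluentd-aggregator
  template:
    metadata:
      labels:
        app: fluentd-aggregator
    spec:
      containers:
        - name: fluentd
          image: fluent/fluentd:v1.16
          ports:
            # forward input that receives logs from the node-level Fluent Bit agents
            - containerPort: 24224
---
apiVersion: v1
kind: Service
metadata:
  name: fluentd-aggregator
spec:
  selector:
    app: fluentd-aggregator
  ports:
    - port: 24224
      targetPort: 24224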

Q2: How do you make sure logs are not lost during collection?

Answer

  1. File offset tracking: Filebeat/Fluent Bit record the read position and resume from it after a restart
  2. Memory buffer + disk buffer: set Mem_Buf_Limit and spill to disk when the limit is exceeded (see the Fluent Bit sketch after this list)
  3. Kafka buffering: decouples collection from consumption, with Kafka providing durable storage
  4. Retry mechanism: output plugins automatically retry failed writes
  5. Backpressure handling: pause ingestion when the downstream is unavailable to avoid OOM
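
A minimal Fluent Bit sketch covering points 1, 2 and 4: the tail input tracks its offset in a local database file, chunks spill to the filesystem once the memory limit is reached, and the output retries failed flushes. Paths and sizes are illustrative assumptions.

[SERVICE]
    # Directory for filesystem-buffered chunks
    storage.path              /var/log/flb-storage/
    storage.sync              normal
    storage.backlog.mem_limit 5M

[INPUT]
    Name            tail
    Path            /var/log/containers/*.log
    # Offset database: resume from the last read position after a restart
    DB              /var/log/flb_kube.db
    Mem_Buf_Limit   5MB
    # Spill to disk instead of dropping data when the memory limit is hit
    storage.type    filesystem

[OUTPUT]
    Name                     es
    Match                    *
    Host                     elasticsearch
    Port                     9200
    # Retry failed flushes a bounded number of times
    Retry_Limit              5
    # Cap the disk space used by chunks pending for this output
    storage.total_limit_size 1G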
