Scheduling and Resource Management

Scheduling Flow

Resource Requests and Limits

spec:
  containers:
    - name: app
      resources:
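        # requests: what the scheduler uses for placement; the minimum the container is guaranteed
        requests:
          cpu: "250m"        # 0.25 core
          memory: "256Mi"
        # limits: hard ceiling; CPU above it is throttled, memory above it triggers an OOM kill
        limits:
          cpu: "1000m"       # 1 core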
          memory: "512Mi"

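Parameter	Description	If unset
`requests`	Used by the scheduler for placement; the guaranteed minimum	The Pod may land on a Node that lacks the resources it actually needs
`limits`	Upper bound on usage	CPU is uncapped; unbounded memory growth can exhaust the Node and trigger OOM kills that affect other Pods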

CPU units

1 = 1 core = 1000m (millicores)
250m = 0.25 core
100m = 0.1 core

Memory units: Mi (MiB), Gi (GiB)
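
The unit rules above can be sketched as a small converter. This is a minimal illustration, not the Kubernetes quantity parser: the function names `cpu_to_millicores` and `memory_to_bytes` are hypothetical, and only a subset of the suffixes Kubernetes accepts is handled.

```python
from decimal import Decimal

# Subset of memory suffixes: binary (Ki, Mi, ...) and decimal (k, M, ...).
_MEMORY_SUFFIXES = {
    "Ki": 2**10, "Mi": 2**20, "Gi": 2**30, "Ti": 2**40,
    "k": 10**3, "M": 10**6, "G": 10**9, "T": 10**12,
}


def cpu_to_millicores(quantity: str) -> int:
    """Convert a CPU quantity such as "250m", "1", or "0.5" to millicores."""
    if quantity.endswith("m"):
        return int(quantity[:-1])
    # Decimal avoids float rounding surprises such as 0.1 * 1000.
    return int(Decimal(quantity) * 1000)


def memory_to_bytes(quantity: str) -> int:
    """Convert a memory quantity such as "256Mi" or "1Gi" to bytes."""
    # Try two-character suffixes first so "Mi" is not mistaken for "M".
    for suffix in sorted(_MEMORY_SUFFIXES, key=len, reverse=True):
        if quantity.endswith(suffix):
            number = Decimal(quantity[: -len(suffix)])
            return int(number * _MEMORY_SUFFIXES[suffix])
    return int(Decimal(quantity))  # a bare number is a byte count


if __name__ == "__main__":
    print(cpu_to_millicores("250m"), memory_to_bytes("256Mi"))
```

Note that `256Mi` (binary) and `256M` (decimal) are different amounts, which is a common source of confusion when sizing memory limits.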

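QoS Classes

Kubernetes assigns each Pod a QoS class automatically, based on its requests and limits:

QoS class	Condition	OOM priority
Guaranteed	Every container sets both CPU and memory requests and limits, and requests = limits	Killed last
Burstable	At least one request or limit is set, but the Guaranteed condition is not met (typically requests < limits)	Medium
BestEffort	No container sets any requests or limits	Killed first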
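
The classification rules in the table can be sketched as follows. This is an illustrative approximation, not the kubelet's code: the function name `qos_class` is hypothetical, each container is represented only by its `resources` mapping, and quantities are compared as written strings (so `"1"` and `"1000m"` would be treated as different here, which the real implementation does not do).

```python
def qos_class(containers: list) -> str:
    """Classify a Pod from a list of per-container `resources` dicts."""
    any_set = False
    guaranteed = True
    for resources in containers:
        requests = resources.get("requests", {})
        limits = resources.get("limits", {})
        if requests or limits:
            any_set = True
        for name in ("cpu", "memory"):
            limit = limits.get(name)
            # When only a limit is given, the request defaults to the limit.
            request = requests.get(name, limit)
            if limit is None or request != limit:
                guaranteed = False
    if not any_set:
        return "BestEffort"
    return "Guaranteed" if guaranteed else "Burstable"


if __name__ == "__main__":
    app = {
        "requests": {"cpu": "250m", "memory": "256Mi"},
        "limits": {"cpu": "1000m", "memory": "512Mi"},
    }
    print(qos_class([app]))
```

The container from the manifest earlier in this section has requests below its limits, so under these rules the Pod is classified as Burstable.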

Horizontal Pod Autoscaling (HPA)

hpa.yaml
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: myapp-hpa
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: myapp
  minReplicas: 2
  maxReplicas: 10
  metrics:
    # CPU utilization
    - type: Resource
      resource:
        name: cpu
        target:
          type: Utilization
          averageUtilization: 70    # scale out when average CPU exceeds 70% of requests
    # Memory utilization
    - type: Resource
      resource:
        name: memory
        target:
          type: Utilization
          averageUtilization: 80
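  behavior:
    scaleUp:
      stabilizationWindowSeconds: 60   # scale-up stabilization window
      policies:
        - type: Percent
          value: 100                   # at most double the replicas per period
          periodSeconds: 60
    scaleDown:
      stabilizationWindowSeconds: 300  # scale-down stabilization window: 5 minutes
      policies:
        - type: Percent
          value: 10                    # remove at most 10% of replicas per period
          periodSeconds: 60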

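# Create an HPA quickly (kubectl names it after the Deployment: myapp)
kubectl autoscale deployment myapp --cpu-percent=70 --min=2 --max=10

# Check HPA status
kubectl get hpa
kubectl describe hpa myapp-hpa

HPA prerequisites

Metrics Server must be installed for the HPA to read CPU and memory metrics, and the target Pods must set resource requests, because utilization is computed as a percentage of requests. Custom metrics need a metrics adapter such as Prometheus Adapter.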
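
To make the scaling decision concrete, here is a rough sketch of the documented HPA formula, desired = ceil(current replicas × current metric / target metric), clamped to the min/max bounds. The function name `desired_replicas` is hypothetical, the 10% tolerance is the controller's default as far as I know, and the `behavior` policies and stabilization windows are deliberately not modeled.

```python
import math


def desired_replicas(current_replicas: int,
                     current_utilization: float,
                     target_utilization: float,
                     min_replicas: int,
                     max_replicas: int,
                     tolerance: float = 0.1) -> int:
    """Approximate the HPA replica calculation for a single metric."""
    ratio = current_utilization / target_utilization
    if abs(ratio - 1.0) <= tolerance:
        # Close enough to the target: keep the current replica count.
        desired = current_replicas
    else:
        desired = math.ceil(current_replicas * ratio)
    # Never go outside minReplicas..maxReplicas.
    return max(min_replicas, min(max_replicas, desired))


if __name__ == "__main__":
    # 4 replicas averaging 90% CPU against a 70% target, bounds 2..10.
    print(desired_replicas(4, 90, 70, 2, 10))
```

When several metrics are configured, as in the manifest above with CPU and memory, the controller evaluates each one and uses the largest resulting replica count.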

NodeSelector and NodeAffinity

NodeSelector (simple)

spec:
  nodeSelector:
    disktype: ssd              # only Nodes labeled disktype=ssd
    region: us-east-1          # and also labeled region=us-east-1

NodeAffinity (advanced)

spec:
  affinity:
    nodeAffinity:
      # Hard requirement: must be satisfied
      requiredDuringSchedulingIgnoredDuringExecution:
        nodeSelectorTerms:
          - matchExpressions:
              - key: kubernetes.io/arch
                operator: In
                values: ["amd64"]
      # Soft preference: favored when possible
      preferredDuringSchedulingIgnoredDuringExecution:
        - weight: 80
          preference:
            matchExpressions:
              - key: disktype
                operator: In
                values: ["ssd"]
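
How the hard and soft rules combine can be sketched like this: a Node is eligible if any `nodeSelectorTerms` entry matches (terms are ORed, expressions inside a term are ANDed), and each matching preference adds its weight to the Node's score. The function names are hypothetical, only the `In`, `NotIn`, `Exists`, and `DoesNotExist` operators are covered (`Gt`, `Lt`, and `matchFields` are left out), and this is not the scheduler's actual code.

```python
def matches_expression(labels: dict, expr: dict) -> bool:
    """Evaluate one matchExpressions entry against a Node's labels."""
    key, operator = expr["key"], expr["operator"]
    values = expr.get("values", [])
    if operator == "In":
        return key in labels and labels[key] in values
    if operator == "NotIn":
        return key not in labels or labels[key] not in values
    if operator == "Exists":
        return key in labels
    if operator == "DoesNotExist":
        return key not in labels
    raise ValueError(f"operator not covered by this sketch: {operator}")


def term_matches(labels: dict, term: dict) -> bool:
    """All expressions in a term must match; an empty term matches nothing."""
    expressions = term.get("matchExpressions", [])
    return bool(expressions) and all(
        matches_expression(labels, e) for e in expressions
    )


def node_is_eligible(labels: dict, node_selector_terms: list) -> bool:
    """Hard rule: at least one term must match."""
    return any(term_matches(labels, t) for t in node_selector_terms)


def preference_score(labels: dict, preferred: list) -> int:
    """Soft rule: sum the weights of the preferences that match."""
    return sum(
        p["weight"] for p in preferred if term_matches(labels, p["preference"])
    )


REQUIRED = [{"matchExpressions": [
    {"key": "kubernetes.io/arch", "operator": "In", "values": ["amd64"]}]}]
PREFERRED = [{"weight": 80, "preference": {"matchExpressions": [
    {"key": "disktype", "operator": "In", "values": ["ssd"]}]}}]

if __name__ == "__main__":
    node = {"kubernetes.io/arch": "amd64", "disktype": "ssd"}
    print(node_is_eligible(node, REQUIRED), preference_score(node, PREFERRED))
```

`REQUIRED` and `PREFERRED` mirror the manifest above: an amd64 Node without an SSD label is still eligible, it just receives no bonus from the preference.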

Pod Affinity and Anti-Affinity

spec:
  affinity:
    # Pod affinity: schedule into the same topology domain as matching Pods
    podAffinity:
      requiredDuringSchedulingIgnoredDuringExecution:
        - labelSelector:
            matchLabels:
              app: cache
          topologyKey: kubernetes.io/hostname  # same Node
    # Pod anti-affinity: prefer Nodes not already running this app (high availability)
    podAntiAffinity:
      preferredDuringSchedulingIgnoredDuringExecution:
        - weight: 100
          podAffinityTerm:
            labelSelector:
              matchLabels:
                app: myapp
            topologyKey: kubernetes.io/hostname

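High-availability deployment

Pod anti-affinity spreads the Pods of one Deployment across different Nodes to avoid a single point of failure. A preferred rule is best-effort; use requiredDuringSchedulingIgnoredDuringExecution when two replicas must never share a Node.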

Taints and Tolerations

# Add a taint to a Node
kubectl taint nodes node1 dedicated=gpu:NoSchedule

# Remove the taint
kubectl taint nodes node1 dedicated=gpu:NoSchedule-

# Pod tolerating the taint
spec:
  tolerations:
    - key: "dedicated"
      operator: "Equal"
      value: "gpu"
      effect: "NoSchedule"

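Effect	Description
`NoSchedule`	New Pods without a matching toleration are not scheduled
`PreferNoSchedule`	The scheduler tries to avoid the Node but may still use it
`NoExecute`	New Pods are not scheduled, and running Pods without a matching toleration are evicted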
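
The matching between a toleration and a taint can be sketched as below. The function names `tolerates` and `can_schedule` are hypothetical, taints and tolerations are plain dicts mirroring the YAML fields, and the sketch skips details such as `tolerationSeconds`.

```python
def tolerates(toleration: dict, taint: dict) -> bool:
    """Return True when one toleration matches one taint."""
    # An empty effect on the toleration matches every effect.
    effect = toleration.get("effect")
    if effect and effect != taint["effect"]:
        return False
    # An empty key (used with operator Exists) matches every taint key.
    key = toleration.get("key")
    if key and key != taint["key"]:
        return False
    # Equal is the default operator; Exists ignores the value.
    if toleration.get("operator", "Equal") == "Exists":
        return True
    return toleration.get("value", "") == taint.get("value", "")


def can_schedule(taints: list, tolerations: list) -> bool:
    """A Pod passes when every hard taint is tolerated.

    PreferNoSchedule is only a soft preference, so it never blocks placement.
    """
    return all(
        any(tolerates(tol, taint) for tol in tolerations)
        for taint in taints
        if taint["effect"] in ("NoSchedule", "NoExecute")
    )


if __name__ == "__main__":
    gpu_taint = {"key": "dedicated", "value": "gpu", "effect": "NoSchedule"}
    gpu_toleration = {"key": "dedicated", "operator": "Equal",
                      "value": "gpu", "effect": "NoSchedule"}
    print(can_schedule([gpu_taint], [gpu_toleration]))
```

A toleration only permits placement on the tainted Node; it does not attract the Pod there. Pinning a workload to dedicated Nodes usually combines a toleration with a nodeSelector or node affinity.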

Topology Spread Constraints

spec:
  topologySpreadConstraints:
    - maxSkew: 1
      topologyKey: topology.kubernetes.io/zone
      whenUnsatisfiable: DoNotSchedule
      labelSelector:
        matchLabels:
          app: myapp

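With maxSkew: 1 and DoNotSchedule, the number of matching Pods in any two zones differs by at most one, which keeps the workload evenly spread across availability zones and improves fault tolerance.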
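
The skew check can be illustrated with a short sketch: a zone is acceptable for the next Pod if its matching Pod count, after adding that Pod, exceeds the current global minimum by no more than `maxSkew`. The function name `allowed_domains` and the zone names are hypothetical, and the real scheduler also accounts for node affinity, taints, and `minDomains`, which are ignored here.

```python
def allowed_domains(pod_counts: dict, max_skew: int = 1) -> list:
    """Return the domains where one more matching Pod keeps skew within bounds.

    pod_counts maps each topology domain (for example a zone) to the number
    of Pods already matching the constraint's labelSelector.
    """
    global_min = min(pod_counts.values())
    return sorted(
        domain
        for domain, count in pod_counts.items()
        if (count + 1) - global_min <= max_skew
    )


if __name__ == "__main__":
    counts = {"zone-a": 2, "zone-b": 2, "zone-c": 1}
    print(allowed_domains(counts, max_skew=1))
```

With `whenUnsatisfiable: DoNotSchedule`, a Pod stays Pending if no domain passes this check; `ScheduleAnyway` turns the rule into a scoring preference instead.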

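Common Interview Questions

Q1: What is the difference between requests and limits?

Answer:

requests: the amount of resources reserved for the Pod at scheduling time. The scheduler picks a Node based on requests, and the Pod can always use what it requested
limits: the ceiling on resource usage. CPU above the limit is throttled; memory above the limit gets the container OOM-killed
requests ≤ limits, and setting both is recommended. For critical services, set requests = limits (Guaranteed QoS)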

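Q2: What is the difference between HPA and VPA?

Answer:

HPA: horizontal scaling; adds or removes Pods. Suits stateless services
VPA: vertical scaling; adjusts the CPU and memory of each Pod. Suits stateful or single-instance services
Avoid driving both from the same metric (for example CPU), because they would conflict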

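Q3: How do you deploy Pods for high availability?

Answer:

Pod anti-affinity: spread Pods across different Nodes
Topology spread constraints: distribute Pods evenly across availability zones
PodDisruptionBudget: cap how many Pods may be unavailable at the same time
Run enough replicas (replicas >= 3)
Configure a readinessProbe so traffic only reaches Pods that are ready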
