Health Checks and Self-Healing

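A container health check periodically verifies that the service inside can actually do its work, not merely that its process exists. Combined with a restart policy, it forms the basis for automatic recovery.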

The HEALTHCHECK instruction

Dockerfile configuration

dockerfile

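FROM nginx:alpine

# The default nginx config has no /health route, so probe the index page.
# Alpine's BusyBox provides wget; 127.0.0.1 avoids localhost resolving to IPv6.
HEALTHCHECK --interval=30s --timeout=3s --retries=3 --start-period=40s \
  CMD wget -q --spider http://127.0.0.1/ || exit 1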

CMD ["nginx", "-g", "daemon off;"]

Parameters:

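--interval: time between checks (default 30s)
--timeout: maximum time a single check may run before it counts as failed (default 30s)
--retries: number of consecutive failures before the container is marked unhealthy (default 3)
--start-period: startup grace period; failures during it do not count toward retries, but a success marks the container healthy (default 0s)

The exit code of the check command decides the result: 0 means healthy, 1 means unhealthy, and 2 is reserved. This is why the examples append || exit 1, which maps any other failure code (curl -f, for instance, exits with 22 on HTTP errors) to 1.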
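To see how interval, retries, and start-period interact, here is a simplified Bash model of the status transitions, based on our reading of Docker Engine's behavior. Assumptions: probe k runs k × interval seconds after start (probe duration and the newer start-interval option are ignored), a success resets the failure streak and marks the container healthy, and a failure counts only once the container has either succeeded before or left the start period. The function name simulate_health is hypothetical.

```bash
#!/usr/bin/env bash
# Simplified model of Docker's health status transitions (an approximation,
# not the engine's code). Usage:
#   simulate_health INTERVAL START_PERIOD RETRIES EXIT_CODE...
# Each EXIT_CODE is the result of one probe; prints the final status.
simulate_health() {
  local interval=$1 start_period=$2 retries=$3
  shift 3
  local status=starting streak=0 t=0 code
  for code in "$@"; do
    t=$((t + interval))  # assumption: probe k runs at k*interval seconds
    if [ "$code" -eq 0 ]; then
      status=healthy     # any success marks the container healthy
      streak=0           # and resets the consecutive-failure streak
      continue
    fi
    # Failures inside the start period are ignored until the first success.
    if [ "$status" = starting ] && [ "$t" -lt "$start_period" ]; then
      continue
    fi
    streak=$((streak + 1))
    if [ "$streak" -ge "$retries" ]; then
      status=unhealthy
    fi
  done
  echo "$status"
}

# Values from the Dockerfile example: interval 30s, start period 40s, 3 retries.
simulate_health 30 40 3 1 1 1     # prints: starting (probe at 30s is ignored)
simulate_health 30 40 3 1 1 1 1   # prints: unhealthy (third counted failure at 120s)
```

In this model, a container that never comes up is reported unhealthy about 120 seconds after it starts; real timings shift slightly because each probe takes time to run.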

Runtime configuration

Bash

# The image must contain curl and serve /health on port 3000
docker run -d \
  --name my-app \
  --health-cmd "curl -f http://localhost:3000/health || exit 1" \
  --health-interval 30s \
  --health-timeout 3s \
  --health-retries 3 \
  --health-start-period 40s \
  my-app

Health status

Bash

# Show the current health status
docker inspect --format='{{.State.Health.Status}}' my-app
# Prints one of: starting, healthy, unhealthy

# Show full details, including the output of the most recent checks
docker inspect --format='{{json .State.Health}}' my-app
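Deployment scripts often need to block until a container is healthy. A minimal Bash sketch that polls docker inspect; the function name wait_for_healthy, the 2-second poll step, and the 60-second default timeout are illustrative assumptions. For a container without a health check, .State.Health is absent, the inspect template fails, the status stays empty, and the function simply times out.

```bash
#!/usr/bin/env bash
# wait_for_healthy NAME [TIMEOUT_SECONDS]
# Polls the container's health status; returns 0 once it is healthy,
# 1 if it turns unhealthy or the timeout expires.
wait_for_healthy() {
  local name=$1 timeout=${2:-60} elapsed=0 status
  while [ "$elapsed" -lt "$timeout" ]; do
    # Empty if the container has no health check or does not exist.
    status=$(docker inspect --format='{{.State.Health.Status}}' "$name" 2>/dev/null)
    case "$status" in
      healthy)   return 0 ;;
      unhealthy) echo "$name is unhealthy" >&2; return 1 ;;
    esac
    sleep 2
    elapsed=$((elapsed + 2))
  done
  echo "timed out waiting for $name" >&2
  return 1
}

# Example: wait_for_healthy my-app 120 && echo "my-app is ready"
```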

Check methods

HTTP check

dockerfile

HEALTHCHECK CMD curl -f http://localhost:3000/health || exit 1

TCP check

dockerfile

# nc (netcat) must be installed in the image
HEALTHCHECK CMD nc -z localhost 5432 || exit 1
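Many slim images do not ship nc. Where the image includes Bash (Debian-based images such as postgres usually do; Alpine's BusyBox sh does not support this), Bash's /dev/tcp redirection can serve as the TCP probe instead. A minimal sketch; tcp_open is a hypothetical helper name, and timeout is assumed to come from coreutils or BusyBox.

```bash
#!/usr/bin/env bash
# tcp_open HOST PORT [TIMEOUT_SECONDS]
# Succeeds if a TCP connection to HOST:PORT can be opened within the timeout.
# Relies on Bash's /dev/tcp redirection (not available in plain sh).
tcp_open() {
  local host=$1 port=$2 secs=${3:-2}
  timeout "$secs" bash -c "exec 3<>/dev/tcp/$host/$port" 2>/dev/null
}

# As a Dockerfile health check, the same idea could read:
#   HEALTHCHECK CMD bash -c 'exec 3<>/dev/tcp/127.0.0.1/5432' || exit 1
```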

Service-specific check

dockerfile

HEALTHCHECK CMD pg_isready -U postgres || exit 1

Custom script

dockerfile

COPY healthcheck.sh /healthcheck.sh
RUN chmod +x /healthcheck.sh
HEALTHCHECK CMD /healthcheck.sh
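The contents of healthcheck.sh are left open here. As one illustration, a hypothetical script that combines two checks: an HTTP probe (assuming the app serves /health on port 3000 and the image has curl) and a heartbeat file that a worker is assumed to touch at least every 60 seconds (/tmp/heartbeat is a made-up path). Any failing check ends in exit 1, which Docker counts as a failed probe. The script is POSIX sh so it also runs under BusyBox.

```bash
#!/bin/sh
# Hypothetical healthcheck.sh: exit 0 if every check passes, else exit 1.

# HTTP probe; -f turns HTTP errors (4xx/5xx) into a non-zero exit code.
check_http() {
  curl -fsS --max-time 2 "$1" >/dev/null 2>&1
}

# Succeeds if FILE exists and was modified within MAX_AGE seconds.
# stat -c %Y is GNU/BusyBox syntax (BSD stat differs).
check_heartbeat() {
  [ -f "$1" ] || return 1
  age=$(( $(date +%s) - $(stat -c %Y "$1") ))
  [ "$age" -le "$2" ]
}

main() {
  check_http "http://localhost:3000/health" || exit 1
  check_heartbeat /tmp/heartbeat 60 || exit 1
  exit 0
}

# Run the checks only when executed as healthcheck.sh, so the functions
# can also be sourced and tested on their own.
case "${0##*/}" in
  healthcheck.sh) main ;;
esac
```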

Restart policies

Policy options

Bash

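# Always restart when the container stops; if stopped manually, it is restarted only when the Docker daemon restarts
docker run --restart=always my-app

# Like always, but a manually stopped container stays stopped, even across daemon restarts
docker run --restart=unless-stopped my-app

# Restart only on a non-zero exit code, at most 5 times
docker run --restart=on-failure:5 my-app

# Never restart (the default)
docker run --restart=no my-app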
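The policies differ only in how they treat a container exiting. A small Bash model of the documented rules for a container that exits on its own (not via docker stop); should_restart is a hypothetical helper, and the manual-stop and daemon-restart cases are left out. Note that no branch looks at health status: restart policies react to exits, not to an unhealthy state.

```bash
#!/usr/bin/env bash
# should_restart POLICY EXIT_CODE RESTART_COUNT
# Models whether Docker restarts a container that exited by itself.
should_restart() {
  local policy=$1 code=$2 count=$3
  case "$policy" in
    no) return 1 ;;
    # always and unless-stopped differ only after a manual stop.
    always|unless-stopped) return 0 ;;
    on-failure) [ "$code" -ne 0 ] ;;   # non-zero exit, no retry limit
    on-failure:*)
      local max=${policy#on-failure:}
      [ "$code" -ne 0 ] && [ "$count" -lt "$max" ] ;;
    *) echo "unknown policy: $policy" >&2; return 2 ;;
  esac
}

should_restart on-failure:5 0 0 || echo "clean exit: not restarted"
should_restart on-failure:5 1 5 || echo "limit reached: not restarted"
should_restart always 0 0 && echo "always: restarted"
```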

Compose configuration

YAML

services:
  app:
    image: my-app
    restart: unless-stopped
    healthcheck:
      test: ["CMD", "curl", "-f", "http://localhost:3000/health"]
      interval: 30s
      timeout: 3s
      retries: 3
      start_period: 40s

Self-healing flow

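Restart policies act only when the container's main process exits. With plain docker run or Docker Compose, a container that becomes unhealthy but keeps running is not restarted; unhealthy containers are replaced automatically in Swarm mode services, or otherwise by an external watcher that reacts to health events.

text

Container exits (crash)     → restart policy restarts it → start period → checks pass → healthy
Container hangs (still up)  → failures reach retries → unhealthy → alert
                                                                  ↓
                              Swarm / external watcher restarts it → healthy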
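Plain Docker restart policies do not restart a container that is running but unhealthy; a small watcher on the host can perform that step. A minimal Bash sketch, assuming it runs on the Docker host with access to the docker CLI; restart_unhealthy is a hypothetical name. It listens for health_status: unhealthy events and restarts the named container.

```bash
#!/usr/bin/env bash
# Restart every container that reports health_status: unhealthy.
# Runs until interrupted; intended to run on the Docker host.
restart_unhealthy() {
  docker events \
    --filter 'event=health_status: unhealthy' \
    --format '{{.Actor.Attributes.name}}' |
  while read -r name; do
    echo "$(date '+%F %T') $name is unhealthy, restarting" >&2
    docker restart "$name" >/dev/null
  done
}

# Example: restart_unhealthy &
```

This sketch has no backoff: a container that turns unhealthy again after every restart will be restarted in a loop, so it should be paired with alerting.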

Alert integration

Bash

# Watch for containers turning unhealthy
docker events --filter 'event=health_status: unhealthy'

# On each event, send an alert
# (for example to Slack or PagerDuty)
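As a sketch of the notification step, the loop below forwards each unhealthy event to a Slack incoming webhook, which accepts a JSON body with a text field. SLACK_WEBHOOK_URL is an assumed environment variable and alert_unhealthy a hypothetical name; PagerDuty and other targets need their own payload formats.

```bash
#!/usr/bin/env bash
# Forward unhealthy events to a Slack incoming webhook.
# Assumes SLACK_WEBHOOK_URL is set in the environment.
alert_unhealthy() {
  docker events \
    --filter 'event=health_status: unhealthy' \
    --format '{{.Actor.Attributes.name}}' |
  while read -r name; do
    # Container names use only [a-zA-Z0-9_.-], so no JSON escaping is needed.
    curl -fsS -X POST -H 'Content-Type: application/json' \
      --data "{\"text\":\"Container $name is unhealthy on $HOSTNAME\"}" \
      "$SLACK_WEBHOOK_URL" >/dev/null || echo "alert for $name failed" >&2
  done
}

# Example: SLACK_WEBHOOK_URL=... alert_unhealthy &
```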

Key takeaways

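The HEALTHCHECK instruction defines the probe: an HTTP request, a TCP port check, a service-specific tool such as pg_isready, or a custom script
Parameters: interval (time between checks), timeout (per-check limit), retries (consecutive failures before unhealthy), start-period (startup grace period)
Restart policies (always, unless-stopped, on-failure) recover containers that exit; they do not restart containers that are running but unhealthy
Health status is visible via docker inspect and can be watched with docker events
In production, configure both a health check and a restart policy, and make sure something (Swarm, a watcher, or alerting) acts on unhealthy status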
