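Log aggregation and monitoring stack
Monitoring containers takes both log aggregation and metrics. This section walks through deploying a monitoring stack built on Prometheus, cAdvisor, Grafana, and Alertmanager.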
Prometheus deployment
docker-compose.yml
YAML
version: '3.8'
services:
  prometheus:
    image: prom/prometheus:latest
    ports:
      - "9090:9090"
    volumes:
      - ./prometheus.yml:/etc/prometheus/prometheus.yml
      - prometheus-data:/prometheus
    command:
      - '--config.file=/etc/prometheus/prometheus.yml'
      - '--storage.tsdb.path=/prometheus'
      - '--storage.tsdb.retention.time=30d'
volumes:
  prometheus-data:
prometheus.yml
YAML
global:
  scrape_interval: 15s
scrape_configs:
  - job_name: 'docker'
    static_configs:
      - targets: ['host.docker.internal:9323']
  - job_name: 'cadvisor'
    static_configs:
      - targets: ['cadvisor:8080']
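The `docker` job scrapes the Docker daemon's own metrics endpoint on port 9323. The daemon only serves that endpoint when it is started with a `metrics-addr` setting in `/etc/docker/daemon.json`, bound to an address the Prometheus container can reach. On Linux, `host.docker.internal` also typically does not resolve inside a container unless it is mapped explicitly. A minimal sketch of that mapping, assuming Docker Engine 20.10 or newer, where the special `host-gateway` value is available:

```yaml
# Sketch: extra_hosts added to the prometheus service in docker-compose.yml.
services:
  prometheus:
    image: prom/prometheus:latest
    extra_hosts:
      # Maps host.docker.internal to the host's gateway IP (Docker Engine 20.10+).
      - "host.docker.internal:host-gateway"
```

Docker Desktop on macOS and Windows resolves `host.docker.internal` on its own, so the mapping is usually only needed on Linux hosts.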
cAdvisor
YAML
services:
  cadvisor:
    image: gcr.io/cadvisor/cadvisor:latest
    ports:
      - "8080:8080"
    volumes:
      - /:/rootfs:ro
      - /var/run:/var/run:ro
      - /sys:/sys:ro
      - /var/lib/docker:/var/lib/docker:ro
    command:
      - '-docker_only'
Grafana deployment
YAML
services:
  grafana:
    image: grafana/grafana:latest
    ports:
      - "3000:3000"
    volumes:
      - grafana-data:/var/lib/grafana
    environment:
      # Default credentials; change this before exposing Grafana beyond local testing.
      - GF_SECURITY_ADMIN_PASSWORD=admin
volumes:
  grafana-data:
Data source configuration
Bash
# Add the Prometheus data source
# Grafana UI: Configuration → Data Sources → Prometheus
# URL: http://prometheus:9090
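The same data source can also be declared in a file, so it does not have to be clicked through the UI after every fresh deployment. A minimal sketch using Grafana's provisioning format; the host path `./grafana/provisioning/datasources/prometheus.yml` is a hypothetical location chosen for illustration:

```yaml
# Hypothetical file: ./grafana/provisioning/datasources/prometheus.yml
apiVersion: 1
datasources:
  - name: Prometheus
    type: prometheus
    access: proxy                  # Grafana's backend issues the queries
    url: http://prometheus:9090    # same URL as entered in the UI
    isDefault: true
```

For Grafana to pick it up, the directory would need to be mounted into the container, for example by adding `- ./grafana/provisioning:/etc/grafana/provisioning` to the `grafana` service's `volumes`.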
Alert rules
prometheus-rules.yml
YAML
groups:
  - name: container-alerts
    rules:
      - alert: HighCPUUsage
        # rate() yields CPU cores in use, so 0.8 means 0.8 of one core per container.
        expr: sum by (name) (rate(container_cpu_usage_seconds_total{name!=""}[5m])) > 0.8
        for: 5m
        labels:
          severity: warning
        annotations:
          summary: "Container {{ $labels.name }} high CPU usage"
      - alert: HighMemoryUsage
        # The "> 0" filter skips containers without a memory limit (reported as 0).
        expr: container_memory_usage_bytes{name!=""} / (container_spec_memory_limit_bytes{name!=""} > 0) > 0.9
        for: 5m
        labels:
          severity: critical
        annotations:
          summary: "Container {{ $labels.name }} high memory usage"
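Prometheus only evaluates a rules file that its main configuration references, and it only forwards firing alerts when an Alertmanager target is configured. A sketch of the additions to `prometheus.yml`, assuming the rules file is mounted at `/etc/prometheus/prometheus-rules.yml` and that an Alertmanager container is reachable under the assumed service name `alertmanager`:

```yaml
# Sketch: additions to prometheus.yml so the rules are loaded and alerts are forwarded.
rule_files:
  - /etc/prometheus/prometheus-rules.yml   # assumed mount path inside the container

alerting:
  alertmanagers:
    - static_configs:
        - targets: ['alertmanager:9093']   # assumed service name; 9093 is Alertmanager's default port
```

The matching mount would be `- ./prometheus-rules.yml:/etc/prometheus/prometheus-rules.yml` under the `prometheus` service's `volumes`. Running `promtool check rules prometheus-rules.yml` validates the file before Prometheus is restarted.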
Notification channels
YAML
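# alertmanager.yml
route:
  receiver: 'slack-notifications'
receivers:
  - name: 'slack-notifications'
    slack_configs:
      # api_url (or global.slack_api_url) is required; the value below is a placeholder.
      - api_url: '<your Slack incoming webhook URL>'
        channel: '#alerts'
        send_resolved: true
        text: "{{ .CommonAnnotations.summary }}"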
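None of the Compose files in this section define a service that runs this configuration. A minimal sketch of an Alertmanager service, following the same pattern as the Prometheus service; the service name `alertmanager` is an assumption:

```yaml
# Sketch: Alertmanager service for docker-compose.yml.
services:
  alertmanager:
    image: prom/alertmanager:latest
    ports:
      - "9093:9093"   # Alertmanager's default web/API port
    volumes:
      - ./alertmanager.yml:/etc/alertmanager/alertmanager.yml
    command:
      - '--config.file=/etc/alertmanager/alertmanager.yml'
```

When it shares a Compose network with Prometheus, the container is reachable at `alertmanager:9093`.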
Common metrics
| What to watch | Metric | Alert threshold |
|---|---|---|
| CPU usage | container_cpu_usage_seconds_total | > 80% |
| Memory usage | container_memory_usage_bytes | > 90% |
| Disk I/O | container_fs_reads_bytes_total | Abnormal growth |
| Network I/O | container_network_receive_bytes_total | Abnormal growth |
| Container restarts | container_restart_count (not part of cAdvisor's documented metric set) | > 3 per hour |
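cAdvisor does not appear to export a restart counter under the name in the table's last row. One possible workaround is to count how often a container's start timestamp changes. This is a hedged sketch: it assumes a restarted container keeps the same `name` label, and the group and alert names are hypothetical:

```yaml
# Sketch: restart alert derived from cAdvisor's container_start_time_seconds.
groups:
  - name: container-restarts
    rules:
      - alert: ContainerRestarting
        # changes() counts how many times the start timestamp changed in the last hour.
        expr: changes(container_start_time_seconds{name!=""}[1h]) > 3
        labels:
          severity: warning
        annotations:
          summary: "Container {{ $labels.name }} restarted more than 3 times in an hour"
```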
Dashboards
Bash
# Import prebuilt dashboards
# Grafana → Import → Dashboard ID: 193 (Docker)
# Grafana → Import → Dashboard ID: 893 (cAdvisor)
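Key takeaways
- Prometheus scrapes and stores container metrics; cAdvisor exposes the Docker container metrics
- Grafana handles visualization and alerting and supports multiple data sources
- Alert rules define the thresholds (CPU, memory, restart count)
- Alertmanager sends notifications to Slack, email, and other channels
- Production environments should run the full monitoring stack so that problems surface early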