监控与告警
在生产环境中,MyBatis 的 SQL 执行性能直接影响系统的稳定性和用户体验。通过集成 Prometheus、Grafana 等监控工具,实时追踪 SQL 执行指标并设置告警阈值,可以在问题恶化前及时发现和处理。
监控指标体系
核心监控指标
| 指标类别 | 具体指标 | 含义 | 告警阈值建议 |
|---|---|---|---|
| 执行耗时 | P50/P90/P99 延迟 | 衡量 SQL 执行的响应时间分布 | P99 > 1s 告警 |
| 执行频率 | QPS/TPS | 每秒查询/事务执行次数 | 突增 200% 告警 |
| 错误率 | 失败次数/总次数 | SQL 执行失败比例 | 错误率 > 1% 告警 |
| 慢查询数 | 超过阈值的 SQL 数 | 执行时间超阈值的查询数量 | 慢查询 > 10/min 告警 |
| 连接池状态 | 活跃连接数/等待数 | 数据库连接池使用率 | 使用率 > 80% 告警 |
| 结果集大小 | 平均返回行数 | 单次查询返回的数据量 | 结果集 > 10000 行告警 |
指标命名规范
Prometheus 指标命名遵循 系统_组件_操作_单位 的规范:
Java
mybatis_executor_query_duration_seconds # SQL 查询耗时(秒)
mybatis_executor_update_duration_seconds # SQL 更新耗时(秒)
mybatis_executor_slow_query_total # 慢查询累计次数
mybatis_connection_pool_active_connections # 活跃连接数
mybatis_connection_pool_idle_connections # 空闲连接数
mybatis_cache_hit_total # 缓存命中累计次数
MyBatis 拦截器集成 Prometheus
核心监控拦截器
通过 MyBatis Interceptor 拦截 SQL 执行,使用 Micrometer 暴露指标:
Java
@Intercepts({
@Signature(type = StatementHandler.class, method = "query",
args = {Statement.class, ResultHandler.class}),
@Signature(type = StatementHandler.class, method = "update",
args = {Statement.class})
})
public class PrometheusMonitoringInterceptor implements Interceptor {
private final MeterRegistry meterRegistry;
private final Duration slowQueryThreshold = Duration.ofMillis(500);
// 指标对象
private final Timer queryTimer;
private final Timer updateTimer;
private final Counter slowQueryCounter;
private final Counter errorCounter;
private final DistributionSummary resultSetCounter;
public PrometheusMonitoringInterceptor(MeterRegistry meterRegistry) {
this.meterRegistry = meterRegistry;
// 查询耗时 Timer
this.queryTimer = Timer.builder("mybatis.executor.query.duration")
.description("MyBatis SQL query duration")
.register(meterRegistry);
// 更新耗时 Timer
this.updateTimer = Timer.builder("mybatis.executor.update.duration")
.description("MyBatis SQL update duration")
.register(meterRegistry);
// 慢查询计数器
this.slowQueryCounter = Counter.builder("mybatis.executor.slow.query.total")
.description("Total number of slow queries")
.register(meterRegistry);
// 错误计数器
this.errorCounter = Counter.builder("mybatis.executor.error.total")
.description("Total number of SQL execution errors")
.register(meterRegistry);
// 结果集大小分布
this.resultSetCounter = DistributionSummary.builder("mybatis.executor.resultset.size")
.description("MyBatis result set size distribution")
.register(meterRegistry);
}
@Override
public Object intercept(Invocation invocation) throws Throwable {
StatementHandler handler = (StatementHandler) invocation.getTarget();
BoundSql boundSql = handler.getBoundSql();
String sql = normalizeSql(boundSql.getSql());
String statementId = extractStatementId(handler);
// 判断操作类型
boolean isQuery = invocation.getMethod().getName().equals("query");
Timer timer = isQuery ? queryTimer : updateTimer;
// 记录执行耗时
return timer.record(() -> {
long startTime = System.currentTimeMillis();
try {
Object result = invocation.proceed();
// 记录结果集大小(仅查询)
if (isQuery && result instanceof List) {
resultSetCounter.record(((List<?>) result).size());
}
// 慢查询检测
long elapsed = System.currentTimeMillis() - startTime;
if (elapsed > slowQueryThreshold.toMillis()) {
slowQueryCounter.increment();
logSlowQuery(statementId, sql, elapsed);
}
return result;
} catch (Throwable e) {
errorCounter.increment();
logError(statementId, sql, e);
throw e;
}
});
}
/**
* 规范化 SQL:去除多余空白,便于聚合
*/
private String normalizeSql(String sql) {
if (sql == null) return "";
return sql.replaceAll("\\s+", " ").trim().toUpperCase();
}
/**
* 从 StatementHandler 中提取 MappedStatement ID
*/
private String extractStatementId(StatementHandler handler) {
try {
MetaObject metaObject = SystemMetaObject.forObject(handler);
if (metaObject.hasGetter("delegate.mappedStatement")) {
MappedStatement ms = (MappedStatement)
metaObject.getValue("delegate.mappedStatement");
return ms.getId();
}
} catch (Exception ignored) {
// 忽略反射失败
}
return "unknown";
}
private void logSlowQuery(String statementId, String sql, long elapsed) {
// 可对接告警系统(钉钉、企业微信、邮件等)
System.err.printf("[SLOW QUERY] %d ms | %s | SQL: %s%n",
elapsed, statementId, sql.substring(0, Math.min(200, sql.length())));
}
private void logError(String statementId, String sql, Throwable e) {
System.err.printf("[SQL ERROR] %s | SQL: %s | Error: %s%n",
statementId, sql.substring(0, Math.min(200, sql.length())), e.getMessage());
}
@Override
public Object plugin(Object target) {
return Plugin.wrap(target, this);
}
@Override
public void setProperties(Properties properties) {
String threshold = properties.getProperty("slowQueryThreshold", "500");
// slowQueryThreshold = Duration.ofMillis(Long.parseLong(threshold));
}
}
Spring Boot 自动配置
YAML
@Configuration
public class MyBatisMonitoringConfig {
@Bean
public PrometheusMonitoringInterceptor prometheusMonitoringInterceptor(
MeterRegistry meterRegistry) {
return new PrometheusMonitoringInterceptor(meterRegistry);
}
@Bean
public ConfigurationCustomizer myBatisConfigurationCustomizer(
PrometheusMonitoringInterceptor interceptor) {
return configuration -> configuration.addInterceptor(interceptor);
}
}
Prometheus 配置
application.yml
YAML
management:
endpoints:
web:
exposure:
include: prometheus,health,metrics
endpoint:
prometheus:
enabled: true
metrics:
tags:
application: ${spring.application.name:mybatis-app}
prometheus.yml
JSON
global:
scrape_interval: 15s # 采集间隔
evaluation_interval: 15s
scrape_configs:
- job_name: 'mybatis-app'
metrics_path: '/actuator/prometheus'
static_configs:
- targets: ['localhost:8080']
labels:
env: 'production'
service: 'order-service'
# 数据库连接池监控(HikariCP 自带指标)
- job_name: 'hikaricp'
metrics_path: '/actuator/prometheus'
static_configs:
- targets: ['localhost:8080']
Grafana 仪表盘配置
关键面板设计
| 面板名称 | 指标来源 | 可视化类型 | 告警规则 |
|---|---|---|---|
| SQL 查询耗时热力图 | mybatis_executor_query_duration_seconds | Heatmap | P99 > 1s |
| 慢查询趋势图 | rate(mybatis_executor_slow_query_total[5m]) | Time series | > 10/min |
| SQL 错误率 | rate(mybatis_executor_error_total[5m]) | Stat + Time series | > 1%/min |
| 连接池使用率 | hikaricp_connections_active / hikaricp_connections_max | Gauge | > 80% |
| Top 10 慢查询 | 按 statement_id 聚合 P99 | Table | 静态 TOP 列表 |
| 缓存命中率 | rate(mybatis_cache_hit_total[5m]) | Stat | < 60% 告警 |
Grafana JSON 示例(SQL 耗时面板)
YAML
{
"title": "SQL Query Latency P99",
"type": "timeseries",
"targets": [
{
"expr": "histogram_quantile(0.99, rate(mybatis_executor_query_duration_seconds_bucket[5m]))",
"legendFormat": "P99 Latency"
},
{
"expr": "histogram_quantile(0.50, rate(mybatis_executor_query_duration_seconds_bucket[5m]))",
"legendFormat": "P50 Latency"
}
],
"thresholds": [
{ "value": 1.0, "color": "red", "line": true }
],
"unit": "s"
}
告警规则配置
Prometheus Alert Rules
YAML
# alert-rules.yml
groups:
- name: mybatis_alerts
rules:
# 慢查询告警
- alert: MybatisSlowQueryHigh
expr: rate(mybatis_executor_slow_query_total[5m]) > 10
for: 2m
labels:
severity: warning
annotations:
summary: "MyBatis 慢查询频率过高"
description: "当前实例 {{ $labels.instance }} 每分钟产生 {{ $value }} 条慢查询"
# SQL 错误率告警
- alert: MybatisSQLErrorRateHigh
expr: |
rate(mybatis_executor_error_total[5m]) /
(rate(mybatis_executor_query_duration_seconds_count[5m]) +
rate(mybatis_executor_update_duration_seconds_count[5m])) > 0.01
for: 5m
labels:
severity: critical
annotations:
summary: "MyBatis SQL 错误率超过阈值"
description: "当前错误率 {{ $value | humanizePercentage }},超过 1% 阈值"
# 连接池耗尽告警
- alert: HikariCpPoolNearlyExhausted
expr: |
hikaricp_connections_active / hikaricp_connections_max > 0.8
for: 3m
labels:
severity: warning
annotations:
summary: "HikariCP 连接池使用率过高"
description: "连接池使用率 {{ $value | humanizePercentage }},建议检查慢查询或增大连接池"
# 缓存命中率过低
- alert: MybatisCacheHitRateLow
expr: |
rate(mybatis_cache_hit_total[5m]) /
(rate(mybatis_cache_hit_total[5m]) + rate(mybatis_cache_miss_total[5m])) < 0.6
for: 10m
labels:
severity: info
annotations:
summary: "MyBatis 缓存命中率过低"
description: "当前缓存命中率 {{ $value | humanizePercentage }},低于 60%"
告警通知渠道
YAML
# alertmanager.yml
route:
receiver: 'default'
group_by: ['alertname', 'env']
group_wait: 30s
group_interval: 5m
repeat_interval: 4h
receivers:
- name: 'default'
webhook_configs:
# 企业微信机器人
- url: 'https://qyapi.weixin.qq.com/cgi-bin/webhook/send?key=xxx'
send_resolved: true
- name: 'critical'
webhook_configs:
# 钉钉告警
- url: 'https://oapi.dingtalk.com/robot/send?access_token=xxx'
send_resolved: true
pagerduty_configs:
- service_key: 'xxx'
连接池监控
HikariCP 指标
HikariCP 默认暴露以下 Prometheus 指标:
| 指标名 | 含义 | 健康范围 |
|---|---|---|
hikaricp_connections_active | 正在使用的连接数 | < max_pool_size * 0.8 |
hikaricp_connections_idle | 空闲连接数 | > min_idle |
hikaricp_connections_max | 连接池最大连接数 | 配置值 |
hikaricp_connections_pending | 等待获取连接的线程数 | = 0 |
hikaricp_connections_timeout_total | 连接获取超时累计数 | = 0 |
hikaricp_connections_creation_time_seconds | 连接创建耗时 | < 1s |
连接池配置建议
text
spring:
datasource:
hikari:
maximum-pool-size: 20 # 根据 CPU 核心数调整
minimum-idle: 5 # 保持最小空闲连接
connection-timeout: 30000 # 30s 获取连接超时
idle-timeout: 600000 # 10min 空闲回收
max-lifetime: 1800000 # 30min 连接最大生命周期
leak-detection-threshold: 60000 # 60s 连接泄漏检测
注意事项
- 监控开销:拦截器记录指标会增加 1%-3% 的 SQL 执行开销,生产环境建议仅开启关键指标
- SQL 脱敏:Prometheus 标签中不要包含 SQL 完整文本和参数值,避免泄露敏感信息
- 指标聚合:使用
rate()和histogram_quantile()函数进行聚合,避免直接使用 Counter 原始值- 告警降噪:设置合理的
for持续时间,避免瞬时波动触发告警造成告警疲劳- Grafana 权限:生产环境仪表盘应设置只读权限,防止误操作修改面板
要点总结
- 核心监控指标包括执行耗时(P50/P90/P99)、慢查询数、错误率、连接池使用率、结果集大小
- 通过 MyBatis Interceptor 拦截 SQL 执行,使用 Micrometer Timer/Counter 记录指标
- Prometheus 通过
/actuator/prometheus端点采集指标,配置 15s 采集间隔 - Grafana 仪表盘可视化 SQL 耗时热力图、慢查询趋势、连接池使用率等关键指标
- 告警规则设置慢查询 > 10/min、错误率 > 1%、连接池使用率 > 80% 等阈值
- SQL 标签中禁止包含完整 SQL 文本和参数值,防止敏感数据泄露;监控拦截器本身有 1%-3% 性能开销
存放路径:D:\git2\jwdev\articles\MYBATIS\专家\生产环境最佳实践\监控与告警.md
📝 发现内容有误?点击此处直接编辑