全部学科
NodeJS全栈
nodejs
Python全栈
python
小程序首页
📅 2026-05-20 8 分钟 ✍️ juanwangdev

监控与告警

在生产环境中,MyBatis 的 SQL 执行性能直接影响系统的稳定性和用户体验。通过集成 Prometheus、Grafana 等监控工具,实时追踪 SQL 执行指标并设置告警阈值,可以在问题恶化前及时发现和处理。

监控指标体系

核心监控指标

指标类别具体指标含义告警阈值建议
执行耗时P50/P90/P99 延迟衡量 SQL 执行的响应时间分布P99 > 1s 告警
执行频率QPS/TPS每秒查询/事务执行次数突增 200% 告警
错误率失败次数/总次数SQL 执行失败比例错误率 > 1% 告警
慢查询数超过阈值的 SQL 数执行时间超阈值的查询数量慢查询 > 10/min 告警
连接池状态活跃连接数/等待数数据库连接池使用率使用率 > 80% 告警
结果集大小平均返回行数单次查询返回的数据量结果集 > 10000 行告警

指标命名规范

Prometheus 指标命名遵循 系统_组件_操作_单位 的规范:

Java
mybatis_executor_query_duration_seconds     # SQL 查询耗时(秒)
mybatis_executor_update_duration_seconds    # SQL 更新耗时(秒)
mybatis_executor_slow_query_total           # 慢查询累计次数
mybatis_connection_pool_active_connections  # 活跃连接数
mybatis_connection_pool_idle_connections    # 空闲连接数
mybatis_cache_hit_total                     # 缓存命中累计次数

MyBatis 拦截器集成 Prometheus

核心监控拦截器

通过 MyBatis Interceptor 拦截 SQL 执行,使用 Micrometer 暴露指标:

Java
@Intercepts({
    @Signature(type = StatementHandler.class, method = "query",
        args = {Statement.class, ResultHandler.class}),
    @Signature(type = StatementHandler.class, method = "update",
        args = {Statement.class})
})
public class PrometheusMonitoringInterceptor implements Interceptor {

    private final MeterRegistry meterRegistry;
    private final Duration slowQueryThreshold = Duration.ofMillis(500);

    // 指标对象
    private final Timer queryTimer;
    private final Timer updateTimer;
    private final Counter slowQueryCounter;
    private final Counter errorCounter;
    private final DistributionSummary resultSetCounter;

    public PrometheusMonitoringInterceptor(MeterRegistry meterRegistry) {
        this.meterRegistry = meterRegistry;

        // 查询耗时 Timer
        this.queryTimer = Timer.builder("mybatis.executor.query.duration")
            .description("MyBatis SQL query duration")
            .register(meterRegistry);

        // 更新耗时 Timer
        this.updateTimer = Timer.builder("mybatis.executor.update.duration")
            .description("MyBatis SQL update duration")
            .register(meterRegistry);

        // 慢查询计数器
        this.slowQueryCounter = Counter.builder("mybatis.executor.slow.query.total")
            .description("Total number of slow queries")
            .register(meterRegistry);

        // 错误计数器
        this.errorCounter = Counter.builder("mybatis.executor.error.total")
            .description("Total number of SQL execution errors")
            .register(meterRegistry);

        // 结果集大小分布
        this.resultSetCounter = DistributionSummary.builder("mybatis.executor.resultset.size")
            .description("MyBatis result set size distribution")
            .register(meterRegistry);
    }

    @Override
    public Object intercept(Invocation invocation) throws Throwable {
        StatementHandler handler = (StatementHandler) invocation.getTarget();
        BoundSql boundSql = handler.getBoundSql();
        String sql = normalizeSql(boundSql.getSql());
        String statementId = extractStatementId(handler);

        // 判断操作类型
        boolean isQuery = invocation.getMethod().getName().equals("query");
        Timer timer = isQuery ? queryTimer : updateTimer;

        // 记录执行耗时
        return timer.record(() -> {
            long startTime = System.currentTimeMillis();
            try {
                Object result = invocation.proceed();

                // 记录结果集大小(仅查询)
                if (isQuery && result instanceof List) {
                    resultSetCounter.record(((List<?>) result).size());
                }

                // 慢查询检测
                long elapsed = System.currentTimeMillis() - startTime;
                if (elapsed > slowQueryThreshold.toMillis()) {
                    slowQueryCounter.increment();
                    logSlowQuery(statementId, sql, elapsed);
                }

                return result;
            } catch (Throwable e) {
                errorCounter.increment();
                logError(statementId, sql, e);
                throw e;
            }
        });
    }

    /**
     * 规范化 SQL:去除多余空白,便于聚合
     */
    private String normalizeSql(String sql) {
        if (sql == null) return "";
        return sql.replaceAll("\\s+", " ").trim().toUpperCase();
    }

    /**
     * 从 StatementHandler 中提取 MappedStatement ID
     */
    private String extractStatementId(StatementHandler handler) {
        try {
            MetaObject metaObject = SystemMetaObject.forObject(handler);
            if (metaObject.hasGetter("delegate.mappedStatement")) {
                MappedStatement ms = (MappedStatement)
                    metaObject.getValue("delegate.mappedStatement");
                return ms.getId();
            }
        } catch (Exception ignored) {
            // 忽略反射失败
        }
        return "unknown";
    }

    private void logSlowQuery(String statementId, String sql, long elapsed) {
        // 可对接告警系统(钉钉、企业微信、邮件等)
        System.err.printf("[SLOW QUERY] %d ms | %s | SQL: %s%n",
            elapsed, statementId, sql.substring(0, Math.min(200, sql.length())));
    }

    private void logError(String statementId, String sql, Throwable e) {
        System.err.printf("[SQL ERROR] %s | SQL: %s | Error: %s%n",
            statementId, sql.substring(0, Math.min(200, sql.length())), e.getMessage());
    }

    @Override
    public Object plugin(Object target) {
        return Plugin.wrap(target, this);
    }

    @Override
    public void setProperties(Properties properties) {
        String threshold = properties.getProperty("slowQueryThreshold", "500");
        // slowQueryThreshold = Duration.ofMillis(Long.parseLong(threshold));
    }
}

Spring Boot 自动配置

YAML
@Configuration
public class MyBatisMonitoringConfig {

    @Bean
    public PrometheusMonitoringInterceptor prometheusMonitoringInterceptor(
            MeterRegistry meterRegistry) {
        return new PrometheusMonitoringInterceptor(meterRegistry);
    }

    @Bean
    public ConfigurationCustomizer myBatisConfigurationCustomizer(
            PrometheusMonitoringInterceptor interceptor) {
        return configuration -> configuration.addInterceptor(interceptor);
    }
}

Prometheus 配置

application.yml

YAML
management:
  endpoints:
    web:
      exposure:
        include: prometheus,health,metrics
  endpoint:
    prometheus:
      enabled: true
  metrics:
    tags:
      application: ${spring.application.name:mybatis-app}

prometheus.yml

JSON
global:
  scrape_interval: 15s    # 采集间隔
  evaluation_interval: 15s

scrape_configs:
  - job_name: 'mybatis-app'
    metrics_path: '/actuator/prometheus'
    static_configs:
      - targets: ['localhost:8080']
        labels:
          env: 'production'
          service: 'order-service'

  # 数据库连接池监控(HikariCP 自带指标)
  - job_name: 'hikaricp'
    metrics_path: '/actuator/prometheus'
    static_configs:
      - targets: ['localhost:8080']

Grafana 仪表盘配置

关键面板设计

面板名称指标来源可视化类型告警规则
SQL 查询耗时热力图mybatis_executor_query_duration_secondsHeatmapP99 > 1s
慢查询趋势图rate(mybatis_executor_slow_query_total[5m])Time series> 10/min
SQL 错误率rate(mybatis_executor_error_total[5m])Stat + Time series> 1%/min
连接池使用率hikaricp_connections_active / hikaricp_connections_maxGauge> 80%
Top 10 慢查询statement_id 聚合 P99Table静态 TOP 列表
缓存命中率rate(mybatis_cache_hit_total[5m])Stat< 60% 告警

Grafana JSON 示例(SQL 耗时面板)

YAML
{
  "title": "SQL Query Latency P99",
  "type": "timeseries",
  "targets": [
    {
      "expr": "histogram_quantile(0.99, rate(mybatis_executor_query_duration_seconds_bucket[5m]))",
      "legendFormat": "P99 Latency"
    },
    {
      "expr": "histogram_quantile(0.50, rate(mybatis_executor_query_duration_seconds_bucket[5m]))",
      "legendFormat": "P50 Latency"
    }
  ],
  "thresholds": [
    { "value": 1.0, "color": "red", "line": true }
  ],
  "unit": "s"
}

告警规则配置

Prometheus Alert Rules

YAML
# alert-rules.yml
groups:
  - name: mybatis_alerts
    rules:
      # 慢查询告警
      - alert: MybatisSlowQueryHigh
        expr: rate(mybatis_executor_slow_query_total[5m]) > 10
        for: 2m
        labels:
          severity: warning
        annotations:
          summary: "MyBatis 慢查询频率过高"
          description: "当前实例 {{ $labels.instance }} 每分钟产生 {{ $value }} 条慢查询"

      # SQL 错误率告警
      - alert: MybatisSQLErrorRateHigh
        expr: |
          rate(mybatis_executor_error_total[5m]) /
          (rate(mybatis_executor_query_duration_seconds_count[5m]) +
           rate(mybatis_executor_update_duration_seconds_count[5m])) > 0.01          
        for: 5m
        labels:
          severity: critical
        annotations:
          summary: "MyBatis SQL 错误率超过阈值"
          description: "当前错误率 {{ $value | humanizePercentage }},超过 1% 阈值"

      # 连接池耗尽告警
      - alert: HikariCpPoolNearlyExhausted
        expr: |
                    hikaricp_connections_active / hikaricp_connections_max > 0.8
        for: 3m
        labels:
          severity: warning
        annotations:
          summary: "HikariCP 连接池使用率过高"
          description: "连接池使用率 {{ $value | humanizePercentage }},建议检查慢查询或增大连接池"

      # 缓存命中率过低
      - alert: MybatisCacheHitRateLow
        expr: |
          rate(mybatis_cache_hit_total[5m]) /
          (rate(mybatis_cache_hit_total[5m]) + rate(mybatis_cache_miss_total[5m])) < 0.6          
        for: 10m
        labels:
          severity: info
        annotations:
          summary: "MyBatis 缓存命中率过低"
          description: "当前缓存命中率 {{ $value | humanizePercentage }},低于 60%"

告警通知渠道

YAML
# alertmanager.yml
route:
  receiver: 'default'
  group_by: ['alertname', 'env']
  group_wait: 30s
  group_interval: 5m
  repeat_interval: 4h

receivers:
  - name: 'default'
    webhook_configs:
      # 企业微信机器人
      - url: 'https://qyapi.weixin.qq.com/cgi-bin/webhook/send?key=xxx'
        send_resolved: true

  - name: 'critical'
    webhook_configs:
      # 钉钉告警
      - url: 'https://oapi.dingtalk.com/robot/send?access_token=xxx'
        send_resolved: true
    pagerduty_configs:
      - service_key: 'xxx'

连接池监控

HikariCP 指标

HikariCP 默认暴露以下 Prometheus 指标:

指标名含义健康范围
hikaricp_connections_active正在使用的连接数< max_pool_size * 0.8
hikaricp_connections_idle空闲连接数> min_idle
hikaricp_connections_max连接池最大连接数配置值
hikaricp_connections_pending等待获取连接的线程数= 0
hikaricp_connections_timeout_total连接获取超时累计数= 0
hikaricp_connections_creation_time_seconds连接创建耗时< 1s

连接池配置建议

text
spring:
  datasource:
    hikari:
      maximum-pool-size: 20        # 根据 CPU 核心数调整
      minimum-idle: 5              # 保持最小空闲连接
      connection-timeout: 30000    # 30s 获取连接超时
      idle-timeout: 600000         # 10min 空闲回收
      max-lifetime: 1800000        # 30min 连接最大生命周期
      leak-detection-threshold: 60000  # 60s 连接泄漏检测

注意事项

  1. 监控开销:拦截器记录指标会增加 1%-3% 的 SQL 执行开销,生产环境建议仅开启关键指标
  2. SQL 脱敏:Prometheus 标签中不要包含 SQL 完整文本和参数值,避免泄露敏感信息
  3. 指标聚合:使用 rate()histogram_quantile() 函数进行聚合,避免直接使用 Counter 原始值
  4. 告警降噪:设置合理的 for 持续时间,避免瞬时波动触发告警造成告警疲劳
  5. Grafana 权限:生产环境仪表盘应设置只读权限,防止误操作修改面板

要点总结

  • 核心监控指标包括执行耗时(P50/P90/P99)、慢查询数、错误率、连接池使用率、结果集大小
  • 通过 MyBatis Interceptor 拦截 SQL 执行,使用 Micrometer Timer/Counter 记录指标
  • Prometheus 通过 /actuator/prometheus 端点采集指标,配置 15s 采集间隔
  • Grafana 仪表盘可视化 SQL 耗时热力图、慢查询趋势、连接池使用率等关键指标
  • 告警规则设置慢查询 > 10/min、错误率 > 1%、连接池使用率 > 80% 等阈值
  • SQL 标签中禁止包含完整 SQL 文本和参数值,防止敏感数据泄露;监控拦截器本身有 1%-3% 性能开销

存放路径:D:\git2\jwdev\articles\MYBATIS\专家\生产环境最佳实践\监控与告警.md

📝 发现内容有误?点击此处直接编辑

← 上一篇 热部署与配置刷新
下一篇 → 线程安全与并发
想查看更多题目和详细解析?
小程序提供完整的题库、模拟考试和详细解析
马上就来

长按或扫描二维码,立即体验

扫码体验小程序
马上就来
使用微信扫描二维码
立即体验完整题库