
Overview


Canal performance monitoring with Prometheus

            

Official documentation: https://github.com/alibaba/canal/wiki/Prometheus-QuickStart

          

              

******************

Setup

             

Create MySQL

docker run -it -d --net fixed3 --ip 192.168.57.2 --privileged=true \
--name mysql -e MYSQL_ROOT_PASSWORD=123456 mysql


# Create the user and grant privileges
mysql> create user canal identified with mysql_native_password by "123456";
Query OK, 0 rows affected (0.01 sec)

mysql> GRANT SELECT, REPLICATION SLAVE, REPLICATION CLIENT ON *.* TO 'canal'@'%';
Query OK, 0 rows affected (0.00 sec)

mysql> flush privileges;
Query OK, 0 rows affected (0.00 sec)

            

Create the canal server

docker run -it -d --net fixed3 --ip 192.168.57.3 \
-p 11111:11111 --name canal-server \
-e canal.instance.master.address=192.168.57.2:3306 \
-e canal.instance.dbUsername=canal \
-e canal.instance.dbPassword=123456 canal/canal-server

              

Create Prometheus: scrapes the metrics

docker run -it -d --net fixed3 --ip 192.168.57.4 \
-v /usr/canal/prom/prometheus.yml:/etc/prometheus/prometheus.yml \
--name prom prom/prometheus


*****************
prometheus.yml: configuration file

# my global config
global:
  scrape_interval:     15s # Set the scrape interval to every 15 seconds. Default is every 1 minute.
  evaluation_interval: 15s # Evaluate rules every 15 seconds. The default is every 1 minute.
  # scrape_timeout is set to the global default (10s).

# Alertmanager configuration
alerting:
  alertmanagers:
  - static_configs:
    - targets:
      # - alertmanager:9093

# Load rules once and periodically evaluate them according to the global 'evaluation_interval'.
rule_files:
  # - "first_rules.yml"
  # - "second_rules.yml"

# A scrape configuration containing exactly one endpoint to scrape:
# Here it's Prometheus itself.
scrape_configs:
  # The job name is added as a label `job=<job_name>` to any timeseries scraped from this config.
  - job_name: 'prometheus'

    # metrics_path defaults to '/metrics'
    # scheme defaults to 'http'.

    static_configs:
    - targets: ['192.168.57.4:9090']

  # Add a scrape job for the canal server metrics
  - job_name: 'canal'
    static_configs:
    - targets: ['192.168.57.3:11112']

               

Create Grafana: visualizes the data

docker run -it -d --net fixed3 --ip 192.168.57.5 --name grafana grafana/grafana

                

                       

******************

Visualization

Add a data source

Data source configuration: Prometheus

                  

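Import the dashboard template: canal/conf/metrics/Canal_instances_tmpl.json

Click Import

Paste the JSON into the "import via panel json" box, then click Load

Select the Prometheus data source

Dashboard view

Note: no canal client has been created, so the client panels show "No data".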

              

                  

******************

Metric reference

             

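Raw metrics

canal_instance: basic instance information
canal_instance_transactions: number of transactions received by the instance
canal_instance_subscriptions: number of subscriptions accepted by the instance

# parser metrics
canal_instance_parser_mode: instance parse mode (whether parallel parsing is enabled)
canal_instance_publish_blocking_time: time the dump thread spends blocked while submitting to the asynchronous parse queue (parallel parse mode only)
canal_instance_received_binlog_bytes: bytes of binlog received by the instance

# sink metrics
canal_instance_sink_blocking_time: time the sink thread spends blocked while putting data into the store

# store metrics
canal_instance_store: basic information about the instance store
canal_instance_store_produce_seq: sequence number of the events received by the instance store
canal_instance_store_produce_mem: total memory occupied by all events received by the instance store
canal_instance_store_consume_seq: sequence number of the events successfully consumed from the instance store
canal_instance_store_consume_mem: total memory occupied by all events successfully consumed from the instance store

# client metrics
canal_instance_client_packets: count of requests made by instance clients
canal_instance_client_bytes: bytes of the packets sent to instance clients
canal_instance_client_empty_batches: count of empty results returned to instance clients by the get interface
canal_instance_client_request_error: count of failed instance client requests
canal_instance_client_request_latency: response latency of instance client requests


# put, get, ack metrics
canal_instance_put_rows: table rows completed by store put operations
canal_instance_get_rows: table rows returned by client get requests
canal_instance_ack_rows: table rows released by client ack operations


# delay metrics
canal_instance_traffic_delay: delay between the canal server and the MySQL master
canal_instance_put_delay: delay of the events in store put operations
canal_instance_get_delay: delay of the events returned by client get requests
canal_instance_ack_delay: delay of the events released by client ack operations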
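
Prometheus scrapes these metrics in its text exposition format, one `name{labels} value` sample per line. The sketch below shows one way such output could be read outside Prometheus. The function name `parse_metrics` is a hypothetical helper, not part of canal or any Prometheus client library, and the parser is deliberately minimal: it ignores timestamps and does not decode escape sequences inside label values.

```python
import re

# One sample line: metric name, optional {labels}, then the value.
_SAMPLE = re.compile(r'^([a-zA-Z_:][a-zA-Z0-9_:]*)(?:\{(.*)\})?\s+(\S+)')
# One label pair inside the braces: key="value".
_LABEL = re.compile(r'([a-zA-Z_][a-zA-Z0-9_]*)="((?:[^"\\]|\\.)*)"')


def parse_metrics(text):
    """Parse Prometheus text-format samples into {(name, labels): value}.

    labels is a sorted tuple of (key, value) pairs, so it can be used as
    part of a dictionary key. Comment lines (# HELP, # TYPE) and blank
    lines are skipped.
    """
    samples = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        match = _SAMPLE.match(line)
        if match is None:
            continue  # not a sample line this sketch understands
        name, raw_labels, value = match.groups()
        labels = tuple(sorted(_LABEL.findall(raw_labels or "")))
        samples[(name, labels)] = float(value)
    return samples
```

Passing in the body returned by the canal job's target in prometheus.yml (192.168.57.3:11112, path /metrics by default) would yield a lookup keyed by metric name and labels, for example `("canal_instance_put_rows", (("destination", "example"),))`.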

               

******************

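Dashboard metrics

              

basics: basic metrics

destination: name of the monitored instance
parallel parser: event parser parse mode
batch mode: the instance's store mode, ITEMSIZE or MEMSIZE
buffer size: number of stored items, or storage size, depending on batch mode

# Capacity is num events
batch mode: ITEMSIZE
buffer size: num

# Capacity is num * buffer.memunit (default 1 KB)
batch mode: MEMSIZE
buffer size: num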
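
The two cases above differ only in how buffer size is interpreted. As a small worked sketch, the function below turns a (batch mode, buffer size) pair into a capacity. `store_capacity` and its parameter names are hypothetical, and the 1024-byte default simply mirrors the "default 1 KB" stated above.

```python
def store_capacity(batch_mode, buffer_size, mem_unit_bytes=1024):
    """Return (amount, unit) describing the store capacity.

    ITEMSIZE: buffer_size is a number of events.
    MEMSIZE:  buffer_size is multiplied by the memory unit to get bytes.
    """
    if batch_mode == "ITEMSIZE":
        return buffer_size, "events"
    if batch_mode == "MEMSIZE":
        return buffer_size * mem_unit_bytes, "bytes"
    raise ValueError(f"unknown batch mode: {batch_mode!r}")
```

For instance, a buffer size of 16384 (an illustrative value) means 16384 events under ITEMSIZE but 16384 * 1024 bytes, that is 16 MiB, under MEMSIZE.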

              

network bandwidth: inbound and outbound network bandwidth of the instance (KB/s)

# Bandwidth used by the dump thread reading the binlog
inbound: rate(canal_instance_received_binlog_bytes{destination="example"}[2m]) / 1024

# Bandwidth used sending data to clients
outbound: rate(canal_instance_client_bytes{destination="example"}[2m]) / 1024
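
To make the arithmetic behind these two expressions concrete, here is a simplified sketch of what `rate(...[2m]) / 1024` computes: the per-second increase of a byte counter over a window, converted to KB/s. It is an approximation only; Prometheus's real `rate()` also handles counter resets and extrapolates to the window boundaries. The function name is hypothetical.

```python
def bandwidth_kb_per_s(first_bytes, last_bytes, window_seconds):
    """Average KB/s between two samples of a monotonically increasing byte counter."""
    if window_seconds <= 0:
        raise ValueError("window_seconds must be positive")
    bytes_per_second = (last_bytes - first_bytes) / window_seconds
    return bytes_per_second / 1024
```

With a 2-minute window, a counter that grows by 1,228,800 bytes corresponds to 10 KB/s.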

              

delay: delay metrics

master: delay between the canal server and the MySQL master, canal_instance_traffic_delay{destination="example"} / 1000

put: store put delay, canal_instance_put_delay{destination="example"} / 1000
get: client get delay, canal_instance_get_delay{destination="example"} / 1000
ack: client ack delay, canal_instance_ack_delay{destination="example"} / 1000

              

blocking: blocking metrics

dump: share of time the dump thread is blocked (parallel parse mode only), clamp_max(rate(canal_instance_publish_blocking_time{destination="example"}[2m]), 1000) / 10
sink: share of time the sink thread is blocked, clamp_max(rate(canal_instance_sink_blocking_time{destination="example"}[2m]), 1000) / 10

When the following patterns appear, the likely cause is:
dump blocking ratio and sink blocking ratio both high: the client is consuming slowly
dump blocking ratio high, sink blocking ratio low: the canal server parser is parsing slowly
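
The `clamp_max(rate(...), 1000) / 10` pattern can be read as a percentage, assuming the blocking-time counters accumulate milliseconds: the rate is then milliseconds blocked per second, capped at 1000, and dividing by 10 gives 0 to 100. The sketch below mirrors that arithmetic and the two diagnostic rules above. Both function names and the 50% `high` threshold are hypothetical choices for illustration, not values from canal.

```python
def blocking_percent(first_ms, last_ms, window_seconds):
    """Percentage of the window a thread spent blocked.

    Assumes the counter is in milliseconds, so the per-second rate is
    capped at 1000 ms (100%) before being scaled to a percentage.
    """
    if window_seconds <= 0:
        raise ValueError("window_seconds must be positive")
    blocked_ms_per_second = (last_ms - first_ms) / window_seconds
    return min(blocked_ms_per_second, 1000.0) / 10.0


def likely_cause(dump_percent, sink_percent, high=50.0):
    """Apply the two rules of thumb; 'high' is an arbitrary illustrative cutoff."""
    if dump_percent >= high and sink_percent >= high:
        return "client is consuming slowly"
    if dump_percent >= high:
        return "canal server parser is slow"
    return "pattern not covered by these two rules"
```

A thread blocked for 60,000 ms out of a 120-second window comes out at 50%.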

               

TPS (table rows): put, get, and ack rates for row records

put: rows per second put into the store, rate(canal_instance_put_rows{destination="example"}[2m])
get: rows per second returned by client get, rate(canal_instance_get_rows{destination="example"}[2m])
ack: rows per second acknowledged by client ack, rate(canal_instance_ack_rows{destination="example"}[2m])

                 

TPS (MySQL Transaction): transactions processed by the canal server

transactions: transactions per second processed by the canal server, rate(canal_instance_transactions{destination="example"}[2m])

                 

client requests: client request metrics

clientack: number of client acks, canal_instance_client_packets{destination="example", packetType="CLIENTACK"}
get: number of client gets, canal_instance_client_packets{destination="example", packetType="GET"}
subscription: number of client subscriptions, canal_instance_subscriptions{destination="example"}

                

client qps: client operations per second

get: client gets per second, rate(canal_instance_client_packets{destination="example",packetType="GET"}[2m])
ack: client acks per second, rate(canal_instance_client_packets{destination="example",packetType="CLIENTACK"}[2m])

                   

empty packets: client empty-packet metrics

empty: empty packets per second, rate(canal_instance_client_empty_batches{destination="example"}[2m])
nonempty: non-empty packets per second, rate(canal_instance_client_packets{destination="example", packetType="GET"}[2m])

                

response time: response time of client requests

                      

                

store remain events: number of events in memory that have not been acked

events: number of un-acked events in memory, canal_instance_store_produce_seq{destination="example"} - canal_instance_store_consume_seq{destination="example"}

                   

store remain mem: memory occupied by un-acked data in memory (KB)

memsize: memory occupied by un-acked data in memory, (canal_instance_store_produce_mem{destination="example"} - canal_instance_store_consume_mem{destination="example"}) / 1024
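
Both store backlog panels are plain differences between a "produce" counter and a "consume" counter. As a minimal sketch, the hypothetical helper below applies the same two subtractions to sampled counter values, with the memory figure converted from bytes to KB as in the expression above.

```python
def store_backlog(produce_seq, consume_seq, produce_mem, consume_mem):
    """Return the un-acked event count and the memory those events occupy in KB."""
    return {
        "events": produce_seq - consume_seq,
        "mem_kb": (produce_mem - consume_mem) / 1024,
    }
```

A backlog that keeps growing over time would be consistent with the slow-client pattern described under the blocking metrics.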

                  

                     
