Overview
Performance monitoring for canal with Prometheus
Official guide: https://github.com/alibaba/canal/wiki/Prometheus-QuickStart
******************
Setup
Create MySQL
# Assumes a user-defined Docker network named fixed3 (covering 192.168.57.x) already exists
docker run -it -d --net fixed3 --ip 192.168.57.2 --privileged=true \
  --name mysql -e MYSQL_ROOT_PASSWORD=123456 mysql
# Create the canal user and grant it the replication privileges
mysql> create user canal identified with mysql_native_password by "123456";
Query OK, 0 rows affected (0.01 sec)
mysql> GRANT SELECT, REPLICATION SLAVE, REPLICATION CLIENT ON *.* TO 'canal'@'%';
Query OK, 0 rows affected (0.00 sec)
mysql> flush privileges;
Query OK, 0 rows affected (0.00 sec)
Create the canal server
docker run -it -d --net fixed3 --ip 192.168.57.3 \
  -p 11111:11111 --name canal-server \
  -e canal.instance.master.address=192.168.57.2:3306 \
  -e canal.instance.dbUsername=canal \
  -e canal.instance.dbPassword=123456 canal/canal-server
Create Prometheus: scrapes the metric data
docker run -it -d --net fixed3 --ip 192.168.57.4 \
  -v /usr/canal/prom/prometheus.yml:/etc/prometheus/prometheus.yml \
  --name prom prom/prometheus
*****************
prometheus.yml: configuration file
# my global config
global:
  scrape_interval: 15s # Set the scrape interval to every 15 seconds. Default is every 1 minute.
  evaluation_interval: 15s # Evaluate rules every 15 seconds. The default is every 1 minute.
  # scrape_timeout is set to the global default (10s).
# Alertmanager configuration
alerting:
  alertmanagers:
    - static_configs:
        - targets:
          # - alertmanager:9093
# Load rules once and periodically evaluate them according to the global 'evaluation_interval'.
rule_files:
  # - "first_rules.yml"
  # - "second_rules.yml"
# A scrape configuration containing exactly one endpoint to scrape:
# Here it's Prometheus itself.
scrape_configs:
  # The job name is added as a label `job=<job_name>` to any timeseries scraped from this config.
  - job_name: 'prometheus'
    # metrics_path defaults to '/metrics'
    # scheme defaults to 'http'.
    static_configs:
      - targets: ['192.168.57.4:9090']
  # Scrape job for the canal server metrics
  - job_name: 'canal'
    static_configs:
      - targets: ['192.168.57.3:11112']
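Before pointing Prometheus at the canal job, it can help to confirm that the target actually serves metrics. The following is a minimal Python sketch, assuming the canal server exposes the Prometheus text format at http://192.168.57.3:11112/metrics (the address comes from the scrape config above; the /metrics path is the Prometheus default). The function names are made up for illustration, and the parser is deliberately simplified: it does not unescape label values or handle timestamps.

```python
import re
import urllib.request

# Assumption: address taken from the 'canal' scrape job; /metrics is the default path.
CANAL_METRICS_URL = "http://192.168.57.3:11112/metrics"

# metric_name, optional {labels}, then the sample value
_SAMPLE_RE = re.compile(r'^([a-zA-Z_:][a-zA-Z0-9_:]*)(?:\{(.*)\})?\s+(\S+)')
_LABEL_RE = re.compile(r'([a-zA-Z_][a-zA-Z0-9_]*)="((?:[^"\\]|\\.)*)"')


def parse_metrics(text):
    """Parse Prometheus text exposition into (name, labels, value) tuples."""
    samples = []
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):  # skip blanks, HELP and TYPE lines
            continue
        match = _SAMPLE_RE.match(line)
        if match is None:
            continue
        name, raw_labels, raw_value = match.groups()
        labels = dict(_LABEL_RE.findall(raw_labels or ""))
        samples.append((name, labels, float(raw_value)))
    return samples


def fetch_canal_metrics(url=CANAL_METRICS_URL, timeout=5):
    """Fetch and parse the metrics page (hypothetical helper, needs network access)."""
    with urllib.request.urlopen(url, timeout=timeout) as resp:
        return parse_metrics(resp.read().decode("utf-8"))
```

Calling fetch_canal_metrics() from a host that can reach the fixed3 network and filtering for names starting with canal_instance gives a quick view of what Prometheus will scrape.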
Create Grafana: visualizes the data
docker run -it -d --net fixed3 --ip 192.168.57.5 --name grafana grafana/grafana
******************
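Displaying the data
Add a data source
Data source type: prometheus
Import the dashboard template: canal/conf/metrics/Canal_instances_tmpl.json
Click import
Paste the JSON into the "import via panel json" box, then click load
Select the prometheus data source
Resulting dashboard
Note: no canal client has been created, so the client panels show "no data"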
******************
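Metric reference
Raw metrics
canal_instance: basic information about the instance
canal_instance_transactions: number of transactions received by the instance
canal_instance_subscriptions: number of subscriptions accepted by the instance
# parser metrics
canal_instance_parser_mode: parsing mode of the instance (whether parallel parsing is enabled)
canal_instance_publish_blocking_time: time the dump thread spends blocked while submitting to the asynchronous parsing queue (parallel mode only)
canal_instance_received_binlog_bytes: binlog bytes received by the instance
# sink metrics
canal_instance_sink_blocking_time: time the sink thread spends blocked while putting data into the store
# store metrics
canal_instance_store: basic information about the instance store
canal_instance_store_produce_seq: sequence number of the events received by the instance store
canal_instance_store_produce_mem: total memory occupied by all events received by the instance store
canal_instance_store_consume_seq: sequence number of the events successfully consumed from the instance store
canal_instance_store_consume_mem: total memory occupied by all events successfully consumed from the instance store
# client metrics
canal_instance_client_packets: count of requests from instance clients
canal_instance_client_bytes: bytes of the packets sent to instance clients
canal_instance_client_empty_batches: count of empty results returned to instance clients by the get interface
canal_instance_client_request_error: count of failed instance client requests
canal_instance_client_request_latency: response latency of instance client requests
# put, get, ack metrics
canal_instance_put_rows: table rows completed by store put operations
canal_instance_get_rows: table rows returned by client get requests
canal_instance_ack_rows: table rows released by client ack operations
# delay metrics
canal_instance_traffic_delay: delay between the canal server and the MySQL master
canal_instance_put_delay: delay of the events handled by store put operations
canal_instance_get_delay: delay of the events returned by client get requests
canal_instance_ack_delay: delay of the events released by client ack operations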
******************
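Dashboard metrics
basics: basic information
destination: name of the monitored instance
parallel parser: parsing mode of the event parser
batch mode: store mode of the instance, ITEMSIZE or MEMSIZE
buffer size: number of stored events, or the storage size
# Capacity is num events
batch mode: ITEMSIZE
buffer size: num
# Capacity is num * buffer.memunit (default 1 KB)
batch mode: MEMSIZE
buffer size: num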
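How buffer size is interpreted depends on batch mode, which the following Python sketch makes explicit. The function name store_capacity is hypothetical, and the 1024-byte default only mirrors the "default 1 KB" memunit stated above.

```python
def store_capacity(batch_mode, buffer_size, memunit_bytes=1024):
    """Return (amount, unit) for the store capacity shown on the basics panel.

    ITEMSIZE: buffer_size is a number of events.
    MEMSIZE:  buffer_size is multiplied by buffer.memunit to get bytes.
    """
    if batch_mode == "ITEMSIZE":
        return buffer_size, "events"
    if batch_mode == "MEMSIZE":
        return buffer_size * memunit_bytes, "bytes"
    raise ValueError(f"unknown batch mode: {batch_mode!r}")
```

For example, with an illustrative buffer size of 16384, ITEMSIZE means room for 16384 events, while MEMSIZE means 16384 * 1024 = 16777216 bytes (16 MB).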
network bandwidth: inbound and outbound network bandwidth of the instance (KB/s)
# Bandwidth used by the dump thread reading the binlog
inbound: rate(canal_instance_received_binlog_bytes{destination="example"}[2m]) / 1024
# Bandwidth used to send data to clients
outbound: rate(canal_instance_client_bytes{destination="example"}[2m]) / 1024
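Both bandwidth expressions turn a byte counter into KB/s: rate() gives the per-second increase over the 2-minute window, and dividing by 1024 converts bytes to KB. The Python sketch below shows that arithmetic on two counter samples. It is a simplification: the real rate() function also extrapolates to the window boundaries and handles resets between every pair of samples, and the function name is made up for illustration.

```python
def simple_rate_kb_per_s(first, last):
    """Approximate rate(counter[window]) / 1024 from two (timestamp_s, bytes) samples."""
    (t0, v0), (t1, v1) = first, last
    if t1 <= t0:
        raise ValueError("samples must be in increasing time order")
    delta = v1 - v0
    if delta < 0:
        # The counter was reset (for example after a restart); count from zero.
        delta = v1
    return delta / (t1 - t0) / 1024


# 1,228,800 bytes received over 120 seconds is 10240 bytes/s, i.e. 10 KB/s.
print(simple_rate_kb_per_s((0, 0), (120, 1228800)))
```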
delay: delay metrics
master: delay between the canal server and the MySQL master, canal_instance_traffic_delay{destination="example"} / 1000
put: delay of store put operations, canal_instance_put_delay{destination="example"} / 1000
get: delay of client get operations, canal_instance_get_delay{destination="example"} / 1000
ack: delay of client ack operations, canal_instance_ack_delay{destination="example"} / 1000
blocking: blocking metrics
dump: share of time the dump thread is blocked, clamp_max(rate(canal_instance_publish_blocking_time{destination="example"}[2m]), 1000) / 10
sink: share of time the sink thread is blocked, clamp_max(rate(canal_instance_sink_blocking_time{destination="example"}[2m]), 1000) / 10
The following patterns point to these likely causes:
dump blocking ratio and sink blocking ratio both high: the client consumes slowly
dump blocking ratio high, sink blocking ratio low: the canal server parser is slow
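The blocking expressions can be read as a percentage: assuming the blocking-time counters are in milliseconds (which the clamp at 1000 suggests), rate() yields blocked milliseconds per second, clamp_max caps that at 1000, and dividing by 10 gives a value from 0 to 100. The Python sketch below mirrors that arithmetic and the two rules of thumb listed above. The function names and the 50% threshold for "high" are assumptions made for illustration, not values from canal.

```python
def blocking_percent(blocked_ms_per_second):
    """Mirror clamp_max(rate(...), 1000) / 10: blocked ms per second as a percentage."""
    return min(blocked_ms_per_second, 1000.0) / 10.0


def diagnose(dump_pct, sink_pct, high=50.0):
    """Map the two blocking ratios to the likely cause (threshold is hypothetical)."""
    dump_high = dump_pct >= high
    sink_high = sink_pct >= high
    if dump_high and sink_high:
        return "client consumes slowly"
    if dump_high and not sink_high:
        return "canal server parser is slow"
    return "no listed pattern matched"
```

For instance, a dump thread blocked 800 ms per second with a sink thread blocked 50 ms per second gives 80% and 5%, which matches the "parser is slow" pattern.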
TPS (table rows): put, get and ack rates for row records
put: rows put into the store per second, rate(canal_instance_put_rows{destination="example"}[2m])
get: rows returned to client get requests per second, rate(canal_instance_get_rows{destination="example"}[2m])
ack: rows acked by the client per second, rate(canal_instance_ack_rows{destination="example"}[2m])
TPS (MySQL Transaction): transactions processed by the canal server
transactions: transactions processed by the canal server per second, rate(canal_instance_transactions{destination="example"}[2m])
client requests: client request metrics
clientack: number of client acks, canal_instance_client_packets{destination="example", packetType="CLIENTACK"}
get: number of client gets, canal_instance_client_packets{destination="example", packetType="GET"}
subscription: number of client subscriptions, canal_instance_subscriptions{destination="example"}
client qps: client operations per second
get: client gets per second, rate(canal_instance_client_packets{destination="example",packetType="GET"}[2m])
ack: client acks per second, rate(canal_instance_client_packets{destination="example",packetType="CLIENTACK"}[2m])
empty packets: client empty-packet metrics
empty: empty packets per second, rate(canal_instance_client_empty_batches{destination="example"}[2m])
nonempty: non-empty packets per second, rate(canal_instance_client_packets{destination="example", packetType="GET"}[2m])
response time: response time of client requests
store remain events: number of un-acked events held in memory
events: number of un-acked events held in memory, canal_instance_store_produce_seq{destination="example"} - canal_instance_store_consume_seq{destination="example"}
store remain mem: memory used by un-acked data held in memory (KB)
memsize: memory used by un-acked data held in memory, (canal_instance_store_produce_mem{destination="example"} - canal_instance_store_consume_mem{destination="example"}) / 1024
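Any of the expressions above can also be evaluated outside Grafana through the Prometheus HTTP API instant-query endpoint (/api/v1/query). The Python sketch below does this for the store remain events expression. The base URL assumes the Prometheus container from the setup section on its default port 9090, and the helper names are hypothetical. It handles only vector results, which is what these expressions return.

```python
import json
import urllib.parse
import urllib.request

# Assumption: the Prometheus container created earlier, default port 9090.
PROMETHEUS_URL = "http://192.168.57.4:9090"

REMAIN_EVENTS = (
    'canal_instance_store_produce_seq{destination="example"}'
    ' - canal_instance_store_consume_seq{destination="example"}'
)


def build_query_url(expr, base=PROMETHEUS_URL):
    """Build the instant-query URL for a PromQL expression."""
    return base + "/api/v1/query?" + urllib.parse.urlencode({"query": expr})


def extract_values(payload):
    """Pull the sample values out of a decoded instant-query response (vector type)."""
    if payload.get("status") != "success":
        raise RuntimeError(f"query failed: {payload}")
    # Each vector element carries value = [timestamp, "value-as-string"].
    return [float(item["value"][1]) for item in payload["data"]["result"]]


def instant_query(expr, base=PROMETHEUS_URL, timeout=5):
    """Run the query against Prometheus (needs network access)."""
    with urllib.request.urlopen(build_query_url(expr, base), timeout=timeout) as resp:
        return extract_values(json.load(resp))
```

Calling instant_query(REMAIN_EVENTS) returns one number per matching instance. An empty list means the expression matched no series, for example because the canal job has not been scraped yet.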