MySQL 主从复制 Group Replication & InnoDB Cluster 总结

89 阅读 0 评论 59 点赞

我是靠谱客的博主朴实树叶，这篇文章主要介绍MySQL 主从复制 Group Replication & InnoDB Cluster 总结，现在分享给大家，希望可以做个参考。

MySQL 部署的方式：

1 master + n slave 传统的异步同步方式
MySQL8 Group Replication + InnoDB Cluster(MySQL Shell) + MySQL Router8 高可用部署（最多9个节点, mysql8）
k8s 中心集群 kubeDB 部署
其实也可以使用 InnoDB Cluster + 传统主从复制，使可读节点突破最多9个节点的限制，同时又可以让 master 支持高可用。

关于读写分离：
如果使用的是 mysql8 X protocol, 只能用 mysql router，否则可以尝试 proxysql.

关于 InnoDB Cluster

InnoDB Cluster 必须使用 MySQL Shell 配置，也通过 MySQL Shell 管理节点: ref link

复制代码

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
1. 安装 MySQL Shell & MySQL Router
2. 通过 docker-compose 启动一组 mysql (至少3台，并且在同一个 docke network 中)
3. 假设 mysql docker 内的 ip 地址为 172.30.0.50，172.30.0.51，docker 内组复制的端口为 33061
4. mysql 端口 mapping: master的端口为 17570:3306, slave为 17572:3306
shell.connect('root@localhost:17570') # mysql shell 连接上master
var cluster = dba.createCluster('myCluster', {'localAddress':'172.30.0.50:33061'});	# 创建一个 cluster
cluster.addInstance('root@127.0.0.1:17572',{'localAddress':'172.30.0.51:33061'});
# slave 加入 cluster
# 登录其中一个mysql并查看
cluster.status();
# or
SELECT * FROM performance_schema.replication_group_members;
# 如果是获取一个已经在运行的cluster
var cluster = dba.getCluster('myCluster');
mysqlrouter --bootstrap mysql_user@127.0.0.1:17570 --directory /tmp/myrouter/
# bootstrap mysql router, 获取 InnoDB Cluster 信息
# 如果 bootstrap 报目录权限错误，根据提示添加 apparmor 配置，然后reload:
sudo /etc/init.d/apparmor restart
mysqlrouter -c /tmp/myrouter/mysqlrouter.conf
# 启动 mysql router， 现在就可以实现mysql 读写分离，高可用了

如果只是简单使用 Group Replicaton：

Follow 这个link: https://www.alibabacloud.com/blog/how-to-setup-mysql-group-replication-on-ubuntu-16-04_594459

关于配置 Mysql Group Replication 时的注意事项：

如果start group_replication; 报错链接权限error :

复制代码

1
2
[ERROR] [MY-010584] [Repl] Slave I/O for channel 'group_replication_recovery': error connecting to master

这是因为 mysql 8默认使用 caching_sha2_password 机密，需要配置公私钥，修改加密方式（或者看下面 GET_MASTER_PUBLIC_KEY=1）：

复制代码

1
2
3
4
5
6
7
8
SET SQL_LOG_BIN=0;
不记录日志
alter USER replication_user@'%' IDENTIFIED WITH sha256_password BY 'PASSWORD';
修改认证方式
GRANT REPLICATION SLAVE ON *.* TO 'replication_user'@'%';
SET SQL_LOG_BIN=1;
记录日志

如果在slave 上 start group_replication 报错：

复制代码

1
2
[ERROR] [MY-011526] [Repl] Plugin group_replication reported: 'This member has more executed transactions than those present in the group.

这是因为 slave 里头有之前的mater数据，需要重置master:

复制代码

1
2
3
4
5
6
7
8
9
10
11
12
SELECT @@global.gtid_executed;
# 复制执行到了哪条语句
STOP GROUP_REPLICATION;
# 停掉 replication
reset master;
# 清空 gtid_executed
CHANGE MASTER TO MASTER_USER='replication_user', MASTER_PASSWORD='PASSWORD' FOR CHANNEL 'group_replication_recovery';
# 重置master, 加上 GET_MASTER_PUBLIC_KEY=1 如果使用 caching_sha2_password RSA 加密策略
SELECT @@global.gtid_executed;
START GROUP_REPLICATION;
# 再次开启复制

Use MariaDB MaxScale for mariadb, mysql router for mysql…

关于 mysql 8 maxscale 的配置：

step1
在 mysql 上创建maxscale账号和相关权限：

复制代码

1
2
3
4
5
6
7
8
CREATE USER 'maxscale'@'maxscalehost' IDENTIFIED WITH mysql_native_password BY 'my-passwd';
GRANT SELECT ON mysql.user TO 'maxscale'@'maxscalehost';
GRANT SELECT ON mysql.db TO 'maxscale'@'maxscalehost';
GRANT SELECT ON mysql.tables_priv TO 'maxscale'@'maxscalehost';
GRANT SHOW DATABASES ON *.* TO 'maxscale'@'maxscalehost';
-- GRANT REPLICATION SLAVE ON *.* TO 'maxscale'@'maxscalehost';
GRANT REPLICATION CLIENT ON *.* TO 'maxscale'@'maxscalehost';

step2
配置你的maxscale 配置文件(my-maxscale.cnf)，然后启动maxscale docker：

复制代码

1
2
docker run -d -p 8989:8989 --name mxs -v $PWD/my-maxscale.cnf:/etc/maxscale.cnf.d/my-maxscale.cnf mariadb/maxscale

除了读写分离， maxscale 还有很多其他插件(router)，可以尝试不同的功能。
注： maxsacle 不支持 mysql 8 的failover 和自动恢复。需要使用 MariaDb 10.0.2 或更高版本：

复制代码

1
2
The version of 'server1' (8.0.18) is not supported. Failover/switchover requires MariaDB 10.0.2 or later.

关于传统异步复制

reset slave 命令和 reset slave all 区别:

注意：reset slave 之前必须先 stop slave，否则报错。

第一、reset slave 命令和 reset slave all 命令会删除所有的 relay log（包括还没有应用完的日志），创建一个新的relay log文件；

step1.清除 master.info,relay-log.info文件(记录)，
step2.删除所有的relay log，包括还没有应用完的日志，创建一个新的relay log文件.
在step1中，也就是清除master.info文件和relay-log.info文件，需要注意的是，新版本的mysql中，这两个文件已经不存在了，而是存储在 mysql 数据库的 slave_master_info 表和 slave_relay_log_info表中.

第二、使用reset slave命令，所有的连接信息仍然保留在内存中，包括主库地址、端口、用户、密码等。这样可以直接运行start slave命令而不必重新输入change master to命令，而运行show slave status也仍和没有运行reset slave一样，有正常的输出。但如果使用reset slave all命令，那么这些内存中的数据也会被清除掉，运行show slave status就输出为空了。

第三、reset slave和reset slave all命令会将系统mysql数据库的slave_master_info表和slave_relay_log_info表中对应的复制记录清除。

复制代码

1
2
reset master:

1、清理所有的binlog文件，创建一个新的文件，起始值从1开始。
2、GTID环境中，reset master会清理掉gtid_executed的值.
当slave 中报错： "Slave has more GTIDs than the master has,using the master’s SERVER_UUID. " 提示我们slave包含太多UUID了，需要指定master的UUID时，我们可以在 slave 上执行 reset mater, 即重置 slave 中的 gtid_executed 值。

ref link: https://cloud.tencent.com/developer/article/1533731

SHOW SLAVE STATUS; 各个参数的含义：

Here are the items

Seconds_Behind_Master : The number of seconds behind the slave is from the master based on the time in the Slave’s OS minus the TIMESTAMP of the latest query processed on the Slave.
Relay_Log_Space : Total number of bytes collected on the Slave from the Master yet to be processed
(Master_Log_File,Read_Master_Log_Pos) : The latest binlog entry from the Master to be recorded in the Slave’s Relay Logs.
(Relay_Master_Log_File,Exec_Master_Log_Pos) : The latest binlog entry from the Master to be executed in the Slave.
Lag can only be measured, or at least noticed, by observing one or more of the following:

Seconds_Behind_Master keeps increasing
Master_Log_File and Relay_Master_Log_File are not the same (binlog events are being collected faster that the SQL thread can process them)
Exec_Master_Log_Pos is not moving even if Slave_SQL_Running: Yes (This could be a long running query or possibly a very large transaction commit)

Relay_Log_Space is increasing steadily
Indicates amount of binlog events collected
This could mean slow collection of binlog events over the network
I sometimes see lag increasing because of taking a large mysqldump and loading it hours later (even a day later). When starting up replication for the first time, the apparent numerical lag in Seconds would disappear after several hours (or even a day) out of nowhere.

Please note that there is only one SQL thread for replication. All queries are processed sequentially. How does this affect replication lag ???

Suppose the Master executed 300 inserts, updates, and deletes in the same second and each query took 1 second to run. Here is what is actually happening:

All queries are serialized when written into the Master’s binary logs
The Slave reads the binlog events from the Master ONE AT A TIME !!!
The Slave executes the binlog events it recorded ONE AT A TIME !!!
Seconds_Behind_Master would actually increase by 300 seconds (5 minutes)
Keep in mind that the result of transaction could be huge and cannot be applied until all the binlog events making up the one transaction is completely written into the relay logs on the Master (This can usually occur with ROW based replication). This also contributes to increased lag.

Even with the beefiest scaled-up Slave, it will only process one SQL transaction at a time. A beefy Master would make sense if you eventually failover to such server and promote it to a Master.

In conclusion, please keep your eye on these things as the Slave eventually catches up.

link: https://dba.stackexchange.com/questions/160207/mysql-replication-lagging-behind-with-log-slave-updates

1、从库只开启log-bin功能，不添加log-slave-updates参数，从库从主库复制的数据不会写入log-bin日志文件里。

2、直接向从库写入数据时，是会写入log-bin日志的。

3、开启log-slave-updates参数后，从库从主库复制的数据会写入log-bin日志文件里。这也是该参数的功能

注意：从节点配置文件加入log-slave-updates=1，同步异常慢

4、使用 MIXED bin_log format.

关于主从复制时，slave 报错回复的方法：
首先得知道这两个参数的含义：

gtid_executed: it contains a representation of the set of all transaction logged in the binary log
gtid_purged: it contains a representation of the set of all transactions deleted from the binary log

恢复的流程：

take a backup from the master and store the value of gtid_executed
restore the backup on the slave and set gtid_purged with the value of gtid_executed from the master

从master 备份数据：

查看 master 的 GTID:

复制代码

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
master > show global variables like 'gtid_executed';
+---------------+-------------------------------------------+
| Variable_name | Value
|
+---------------+-------------------------------------------+
| gtid_executed | 9a511b7b-7059-11e2-9a24-08002762b8af:1-13 |
+---------------+-------------------------------------------+
master > show global variables like 'gtid_purged';
+---------------+------------------------------------------+
| Variable_name | Value
|
+---------------+------------------------------------------+
| gtid_purged
| 9a511b7b-7059-11e2-9a24-08002762b8af:1-2 |
+---------------+------------------------------------------+

dump master 数据库：

复制代码

1
2
3
4
mysqldump --all-databases --single-transaction --triggers --routines --host=127.0.0.1 --port=18675 --user=msandbox --password=msandbox > dump.sql
// 指定一个数据库:
mysqldump --databases xxx --single-transaction --triggers --routines --events --host=localhost --port=17560 --user=root --password=root > ./dump.sql

在slave数据库恢复数据：

复制代码

slave1 > show global variables like 'gtid_executed';
+---------------+-------+
| Variable_name | Value |
+---------------+-------+
| gtid_executed |
|
+---------------+-------+
slave1 > show global variables like 'gtid_purged';
+---------------+-------+
| Variable_name | Value |
+---------------+-------+
| gtid_purged
|
|
+---------------+-------+
slave1 > source dump.sql;
[...]
slave1 > show global variables like 'gtid_executed';
+---------------+-------------------------------------------+
| Variable_name | Value
|
+---------------+-------------------------------------------+
| gtid_executed | 9a511b7b-7059-11e2-9a24-08002762b8af:1-13 |
+---------------+-------------------------------------------+
slave1 > show global variables like 'gtid_purged';
+---------------+-------------------------------------------+
| Variable_name | Value
|
+---------------+-------------------------------------------+
| gtid_purged
| 9a511b7b-7059-11e2-9a24-08002762b8af:1-13 |
+---------------+-------------------------------------------+

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
slave1 > show global variables like 'gtid_executed';
+---------------+-------+
| Variable_name | Value |
+---------------+-------+
| gtid_executed |
|
+---------------+-------+
slave1 > show global variables like 'gtid_purged';
+---------------+-------+
| Variable_name | Value |
+---------------+-------+
| gtid_purged
|
|
+---------------+-------+
slave1 > source dump.sql;
[...]
slave1 > show global variables like 'gtid_executed';
+---------------+-------------------------------------------+
| Variable_name | Value
|
+---------------+-------------------------------------------+
| gtid_executed | 9a511b7b-7059-11e2-9a24-08002762b8af:1-13 |
+---------------+-------------------------------------------+
slave1 > show global variables like 'gtid_purged';
+---------------+-------------------------------------------+
| Variable_name | Value
|
+---------------+-------------------------------------------+
| gtid_purged
| 9a511b7b-7059-11e2-9a24-08002762b8af:1-13 |
+---------------+-------------------------------------------+

link: https://www.percona.com/blog/2013/02/08/how-to-createrestore-a-slave-using-gtid-replication-in-mysql-5-6/

总结:
当出现错误：

Got fatal error 1236 from master when reading data from binary log: 'Cannot replicate because the master purged required binary logs. Replicate the missing transactions from elsewhere, or provision a new slave from backup. Consider increasing the master's binary log expiration period. To find the missing transactions, see the master's error log or the manual for GTID_SUBTRACT.'

复制代码

1
2
3
4
5
6
7
# 在master上执行:
show global variables like 'gtid_executed';
# 然后将值设置到 slave的变量
gtid_purged 例如:
set global gtid_purged="dkic82kc-1029-11ea-bf21-0a58ac14830c:1-313";
注意，这样设置好后，master中以前的数据不会自动同步到slave中。所以最好通过 mysqldump 恢复之前的数据再设置。

在开启了 GTID 功能的 MySQL 数据库中, 不论是否使用了 GTID 的方式做了主从同步, 导出导入时都需要特别注意数据库中的 GTID 信息.

导出:

复制代码

1
2
3
 mysqldump -uroot -p userdb > userdb.sql
Warning: A partial dump from a server that has GTIDs will by default include the GTIDs of all transactions, even those that changed suppressed parts of the database. If you don't want to restore GTIDs, pass --set-gtid-purged=OFF. To make a complete dump, pass --all-databases --triggers --routines --events

mysql提示: 当前数据库实例中开启了 GTID 功能, 在开启有 GTID 功能的数据库实例中, 导出其中任何一个库, 如果没有显示地指定 --set-gtid-purged 参数, 都会提示这一行信息. 意思是默认情况下, 导出的库中含有 GTID 信息, 如果不想导出包含有 GTID 信息的数据库, 需要显示地添加 --set-gtid-purged=OFF参数. 于是乎, dump 变成了如下样子

复制代码

1
2
mysqldump -uroot -p --set-gtid-purged=OFF userdb > userdb.sql

使用以上这条命令 dump 出来的库是不包含 GTID 信息的

导入:
导入的时候也分两种, 一种是导入带有 GTID 的信息的库, 一种是导入不带有 GTID 信息的库

不带 GTID 信息:
不带有 GTID 信息的 dump 文件, 不管目标数据库实例是否开启了 GTID 功能, 且不管数据库实例是否已有其他 GTID 信息, 都可以顺利导入

带有 GTID 信息:
带有 GTID 信息的 dump 文件, 要求目标数据库实例必须开启 GTID 功能, 且当前数据库中无其他 GTID 信息. 如果目标数据库中已经记录了一条或一条以上的 GTID 信息, 那么在导入数据库时会报出类似如下的错误:

复制代码

1
2
3
4
mysql -uroot -pmypasswd < dump.sql
Password:xxxxxxxx
ERROR 1840 (HY000) at line 24: @@GLOBAL.GTID_PURGED can only be set when @@GLOBAL.GTID_EXECUTED is empty.

two option:

重新 dump 数据库, 使用 --set-gtid-purged=OFF 参数禁止导出 GTID 信息,再 load 进目标数据库;
在目标数据库中执行mysql> reset slave all; mysql> reset master; 清空所有 GTID 信息之后再导入了

写得太好了，忍不住 copy 过来了： https://docs.lvrui.io/2016/10/28/%E5%BC%80%E5%90%AFGTID%E7%9A%84%E6%83%85%E5%86%B5%E4%B8%8B%E5%AF%BC%E5%87%BA%E5%AF%BC%E5%85%A5%E5%BA%93%E7%9A%84%E6%B3%A8%E6%84%8F%E4%BA%8B%E9%A1%B9/

MySQL 通常需要设置的参数：

复制代码

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
[mysqld]
skip-host-cache
skip-name-resolve
datadir = /var/lib/mysql/
general_log = OFF
event_scheduler=ON
bind-address = 0.0.0.0
port
= xxx1
mysqlx_port = xxx2
sql_mode = "STRICT_TRANS_TABLES,NO_ENGINE_SUBSTITUTION"
innodb_buffer_pool_size = 4G
innodb_log_file_size = 200M
innodb_flush_log_at_trx_commit = 2
sync_binlog = 50
log-error = error.log
log-bin = mysql-bin.log
gtid_mode=on
enforce_gtid_consistency=on
server_id = 2
binlog_format = MIXED
max_binlog_size = 500M

注意1： mysql 使用 x devapi，默认端口为 33060，可通过 mysqlx_port 参数修改

复制代码

1
2
mysqlx_port = 33061

注意2： mysql x devapi 默认一个session表示一个连接，如果使用 connection pool，即 mysqlx::Client，频繁新建链接和断开连接会对性能有很大影响。建议设置一个 session 超时的时间，保持这个session直到超时，timeout之后再从 connection pool 中新建一个session（连接）。

注意3： MySQL 如果运行在 docker 中，本地应用程序使用 localhost 和 127.0.0.1有区别。
可以使用如下命令让 MySQL container外的应用程序通过 127.0.0.1连接 container内部的MySQL服务：

复制代码

1
2
3
4
GRANT ALL PRIVILEGES on *.* to root@'%' WITH GRANT OPTION;
ALTER user 'root'@'%' IDENTIFIED WITH mysql_native_password BY 'my-passwd';
FLUSH PRIVILEGES;

注意4
如果存在多个数据库，可以修改 slave_parallel_workers， slave 开启多个线程应用relay log

查看复制过程中relog event时间戳

复制代码