HA Heartbeat and Ping

I'm 贪玩皮皮虾, a blogger on 靠谱客. This post covers heartbeat and ping in HA; I'm sharing it here in the hope that it serves as a useful reference.

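My understanding:
1. Heartbeat and ping in HA
Heartbeat: checks whether the peer node is still alive.
ping: checks whether a node's other network interfaces are working. If the pinged IP is unreachable, the interface used to reach it is considered faulty.
Heartbeat + ping: together they detect problems on both the heartbeat link and the other interfaces.
ping group: the set of addresses that ipfail probes. These should be stable, always-on addresses such as the gateway, and it is best to configure several so that a failure of one pinged device is not mistaken for a local network fault.
When a node cannot reach any of the IPs in the ping group:
==> if one side can reach them and the other cannot, HA fails over;
==> if neither side can reach them, HA does not fail over.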
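The rule above can be sketched as a comparison of how many ping nodes each side can currently see. This is only an illustration of the decision as described here, not ipfail's actual code; `should_fail_over` and its parameters are hypothetical names.

```python
def should_fail_over(local_reachable, peer_reachable):
    """Return True if this node should hand its resources to the peer.

    local_reachable / peer_reachable: number of ping nodes each side
    can currently reach (hypothetical inputs for this sketch).
    """
    # Fail over only when the peer sees strictly more ping nodes.
    # Equal counts, including both sides seeing none, mean no failover.
    return peer_reachable > local_reachable


if __name__ == "__main__":
    for local, peer in [(0, 1), (1, 0), (0, 0), (1, 1)]:
        print(local, peer, should_fail_over(local, peer))
```

When both sides lose the ping group at once, the fault is probably in the pinged device rather than in either node, which is why the sketch does nothing in that case.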
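2. Problems you may hit in a test environment
a. When a virtual network carries the heartbeat, disconnecting the heartbeat NIC on a node does not trigger a switchover.
Workaround: try bridging the heartbeat network to a physical NIC on the host.
Cause: probably the virtual network itself; packets between the nodes may not be sent or received reliably.
Likewise, with a virtual network a node can sometimes go down without a switchover taking place.
It is best to test HA with real NICs. Virtual NICs cause enough odd problems to make it hard to trust that the setup really is highly available.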
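3. How a split-brain starts and recovers
a. The heartbeat cable is disconnected and split-brain begins.
b. The cable is plugged back in; through HA status detection, each node sees that its peer is alive again.
c. The resource manager releases the resources and stops the applications.
d. The floating IP is brought down.
e. The heartbeat service restarts.
f. The HA nodes check each other's status and find that both are healthy.
g. Resources are acquired, the VIP is brought up, and the applications start.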
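One visible symptom of the split-brain in step a is that both nodes bring up the same floating IP. A minimal sketch for spotting that, assuming you have already collected each node's addresses (for example from `ip addr`); `duplicated_vips` and the `holdings` mapping are hypothetical names used only for illustration.

```python
from collections import defaultdict


def duplicated_vips(holdings):
    """Return {ip: [nodes]} for every address held by more than one node.

    holdings maps a node name to the set of floating IPs it has up.
    """
    owners = defaultdict(list)
    for node, ips in holdings.items():
        for ip in ips:
            owners[ip].append(node)
    # An address that is up on two nodes at once indicates split-brain.
    return {ip: sorted(nodes) for ip, nodes in owners.items() if len(nodes) > 1}


if __name__ == "__main__":
    during_split = {
        "ha1": {"2.2.2.100", "2.2.2.101"},
        "ha2": {"2.2.2.100", "2.2.2.101"},
    }
    print(duplicated_vips(during_split))
```

After step g the same check should return an empty result, since each VIP is again held by exactly one node.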
=============================================================
4. Testing dual heartbeat links
HA1                         HA2
eth0 <==heartbeat==>        eth0   <vmware bridge 1>
eth1 <==heartbeat==>        eth1   <vmware bridge 2>
eth2 <==internal network==> eth2   <hostonly>
# tail -2 /etc/ha.d/haresources
ha1.example.com IPaddr::2.2.2.100/24/eth2:0 httpd
ha2.example.com IPaddr::2.2.2.101/24/eth2:1 vsftpd
[root@ha1 ~]# ip addr | grep eth2
4: eth2: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc pfifo_fast qlen 1000
inet 2.2.2.10/24 brd 2.2.2.255 scope global eth2
inet 2.2.2.100/24 brd 2.2.2.255 scope global secondary eth2:0
[root@ha2 ~]# ip addr | grep eth2
4: eth2: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc pfifo_fast qlen 1000
inet 2.2.2.20/24 brd 2.2.2.255 scope global eth2
inet 2.2.2.101/24 brd 2.2.2.255 scope global secondary eth2:0
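To make the haresources entries above easier to read, here is a small sketch that splits one line into its parts. It handles only the `IPaddr::address/prefix/interface` form used in this post; `ResourceGroup` and `parse_haresources_line` are hypothetical helper names, not part of Heartbeat.

```python
from typing import List, NamedTuple, Optional


class ResourceGroup(NamedTuple):
    node: str                 # preferred owner of the group
    ip: Optional[str]         # floating IP (VIP)
    prefix: Optional[int]     # netmask length
    interface: Optional[str]  # interface alias the VIP is bound to
    services: List[str]       # init scripts started after the VIP


def parse_haresources_line(line):
    """Parse a line such as 'node IPaddr::2.2.2.100/24/eth2:0 httpd'."""
    node, *resources = line.split()
    ip = prefix = interface = None
    services = []
    for res in resources:
        if res.startswith("IPaddr::"):
            # Only the three-part address/prefix/interface form is handled.
            ip, prefix, interface = res[len("IPaddr::"):].split("/")
            prefix = int(prefix)
        else:
            services.append(res)
    return ResourceGroup(node, ip, prefix, interface, services)


if __name__ == "__main__":
    print(parse_haresources_line("ha1.example.com IPaddr::2.2.2.100/24/eth2:0 httpd"))
```

Read this way, ha1.example.com is the preferred owner of 2.2.2.100 and httpd, and ha2.example.com of 2.2.2.101 and vsftpd.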
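This setup simply adds a second heartbeat link to the original one. My expectation is that losing a single heartbeat link will not cause a split-brain.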
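Test cases:
a. With both nodes healthy, unplug the eth1 cable on node ha2.
Result:
The log on ha2 reports that ha1's eth1 NIC is dead, but access to the applications is unaffected. The log also shows that no contention for the VIP took place.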
heartbeat[7696]: 2011/07/26_17:42:48 info: Link ha2.example.com:eth1 dead.
ipfail[7723]: 2011/07/26_17:42:48 info: Link Status update: Link ha2.example.com/eth1 now has status dead
ipfail[7723]: 2011/07/26_17:42:51 info: Asking other side for ping node count.
ipfail[7723]: 2011/07/26_17:42:51 info: Checking remote count of ping nodes.
ipfail[7723]: 2011/07/26_17:42:53 info: Ping node count is balanced.
ipfail[7723]: 2011/07/26_17:42:54 info: No giveup timer to abort.
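The expectation being tested here, that a peer counts as lost only once every heartbeat link to it is dead, can be sketched by tracking the `Link <node>:<iface> dead.` messages shown above. The function names are hypothetical, the initial link table is supplied by hand, and the matching `up` message is an assumption, since this excerpt shows only `dead`.

```python
import re

# Matches heartbeat's own link messages, e.g. "Link ha2.example.com:eth1 dead."
LINK_RE = re.compile(r"Link (\S+):(\S+) (up|dead)\.")


def link_states(log_lines, initial):
    """Apply link status changes found in log_lines to a copy of `initial`."""
    states = dict(initial)
    for line in log_lines:
        match = LINK_RE.search(line)
        if match:
            node, iface, status = match.groups()
            states[(node, iface)] = status
    return states


def peer_isolated(states, node):
    """True only when every known heartbeat link to `node` is dead."""
    links = [status for (name, _), status in states.items() if name == node]
    return bool(links) and all(status == "dead" for status in links)


if __name__ == "__main__":
    log = ["heartbeat[7696]: 2011/07/26_17:42:48 info: Link ha2.example.com:eth1 dead."]
    start = {("ha2.example.com", "eth0"): "up", ("ha2.example.com", "eth1"): "up"}
    print(peer_isolated(link_states(log, start), "ha2.example.com"))
```

With eth0 still up the peer is not considered isolated, which matches the absence of any VIP takeover in this test.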
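b. With the link from (a) still down, also unplug eth2. This NIC sits on the internal network, so disconnecting it should trigger a switchover immediately.
Result: the applications switched over successfully.
Log details: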
ipfail[6570]: 2011/07/26_17:49:47 info: Telling other node that we have more visible ping nodes.
heartbeat[6543]: 2011/07/26_17:49:55 info: ha1.example.com wants to go standby [all]
heartbeat[6543]: 2011/07/26_17:50:00 info: standby: acquire [all] resources from ha1.example.com
heartbeat[7021]: 2011/07/26_17:50:00 info: acquire all HA resources (standby).
ResourceManager[7034]: 2011/07/26_17:50:00 info: Acquiring resource group: ha1.example.com IPaddr::2.2.2.100/24/eth2:0 httpd
IPaddr[7061]: 2011/07/26_17:50:00 INFO: Resource is stopped
ResourceManager[7034]: 2011/07/26_17:50:00 info: Running /etc/ha.d/resource.d/IPaddr 2.2.2.100/24/eth2:0 start
IPaddr[7159]: 2011/07/26_17:50:00 INFO: Using calculated netmask for 2.2.2.100: 255.255.255.0
IPaddr[7159]: 2011/07/26_17:50:01 INFO: eval ifconfig eth2:1 2.2.2.100 netmask 255.255.255.0 broadcast 2.2.2.255
IPaddr[7130]: 2011/07/26_17:50:01 INFO: Success
ResourceManager[7034]: 2011/07/26_17:50:01 info: Running /etc/init.d/httpd start
ResourceManager[7300]: 2011/07/26_17:50:01 info: Acquiring resource group: ha2.example.com IPaddr::2.2.2.101/24/eth2:1 vsftpd
IPaddr[7328]: 2011/07/26_17:50:01 INFO: Running OK
heartbeat[7021]: 2011/07/26_17:50:02 info: all HA resource acquisition completed (standby).
heartbeat[6543]: 2011/07/26_17:50:02 info: Standby resource acquisition done [all].
heartbeat[6543]: 2011/07/26_17:50:03 info: remote resource transition completed.
[root@ha2 ~]# ip addr | grep eth2
4: eth2: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc pfifo_fast qlen 1000
inet 2.2.2.20/24 brd 2.2.2.255 scope global eth2
inet 2.2.2.101/24 brd 2.2.2.255 scope global secondary eth2:0
inet 2.2.2.100/24 brd 2.2.2.255 scope global secondary eth2:1
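To confirm a takeover like the one above without reading the output by eye, the `inet` lines can be pulled out of `ip addr` with a few lines of Python. This is a rough sketch that assumes the output format shown in this post; `inet_addresses` is a hypothetical helper name.

```python
SAMPLE = """\
4: eth2: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc pfifo_fast qlen 1000
inet 2.2.2.20/24 brd 2.2.2.255 scope global eth2
inet 2.2.2.101/24 brd 2.2.2.255 scope global secondary eth2:0
inet 2.2.2.100/24 brd 2.2.2.255 scope global secondary eth2:1
"""


def inet_addresses(ip_addr_output):
    """Map each interface label to the IPv4 address on its `inet` line."""
    addresses = {}
    for line in ip_addr_output.splitlines():
        fields = line.split()
        if fields and fields[0] == "inet":
            # fields[1] looks like "2.2.2.100/24"; the label is the last field.
            addresses[fields[-1]] = fields[1].split("/")[0]
    return addresses


if __name__ == "__main__":
    held = inet_addresses(SAMPLE)
    # Both VIPs from haresources should now be present on ha2.
    print({"2.2.2.100", "2.2.2.101"} <= set(held.values()))
```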
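c. In the state left by (b), two NICs are already disconnected.
==> If eth2 on ha1 is reconnected and auto_failback is enabled in ha.cf, the application moves back to ha1 at that point.
==> If both heartbeat links between ha1 and ha2 go down, split-brain begins.
==> If the heartbeat links between ha1 and ha2 are lost and then restored, both nodes restart heartbeat.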
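The effect of auto_failback in the first case can be summarised in a tiny sketch. It models only the behaviour described here (resources return to their preferred node when the option is on and stay where they are otherwise); `owner_after_recovery` is a hypothetical name, not a Heartbeat function.

```python
def owner_after_recovery(preferred, current, auto_failback):
    """Return the node that runs a resource group once `preferred` is healthy again."""
    # With auto_failback on, the group goes back to its preferred node;
    # with it off, the group stays on whichever node holds it now.
    return preferred if auto_failback else current


if __name__ == "__main__":
    print(owner_after_recovery("ha1.example.com", "ha2.example.com", True))
    print(owner_after_recovery("ha1.example.com", "ha2.example.com", False))
```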
========================================================