Copyright notice: feel free to repost, but please credit the author and the source. https://blog.csdn.net/handsomehuo/article/details/90607455
While configuring pacemaker+corosync recently, I tried to change the bindnetaddr address in corosync.conf (I wanted to switch it to 10.0.0.0). After applying the change I checked with:
corosync-cfgtool -s
and found that the change had not taken effect: the heartbeat was still running over the original network (192.168.122.0).
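The output looked roughly like the following (a reconstruction based on the logs and the later output in this post; the exact node ID and address on your system will differ):

```
Printing ring status.
Local node ID 1
RING ID 0
        id      = 192.168.122.60
        status  = ring 0 active with no faults
```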
The relevant logs showed no obvious errors:
[root@node1 ~]# tailf -n 100 /var/log/cluster/corosync.log
...
[2289] node2 corosyncnotice [TOTEM ] A new membership (192.168.122.60:316) was formed. Members joined: 1
[2289] node2 corosyncnotice [QUORUM] Members[3]: 1 3 2
[2289] node2 corosyncnotice [MAIN ] Completed service synchronization, ready to provide service.
[2289] node2 corosyncnotice [MAIN ] Node was shut down by a signal
[2289] node2 corosyncnotice [SERV ] Unloading all Corosync service engines.
[2289] node2 corosyncinfo [QB ] withdrawing server sockets
[2289] node2 corosyncnotice [SERV ] Service engine unloaded: corosync vote quorum service v1.0
[2289] node2 corosyncinfo [QB ] withdrawing server sockets
[2289] node2 corosyncnotice [SERV ] Service engine unloaded: corosync configuration map access
[2289] node2 corosyncinfo [QB ] withdrawing server sockets
[2289] node2 corosyncnotice [SERV ] Service engine unloaded: corosync configuration service
[2289] node2 corosyncinfo [QB ] withdrawing server sockets
[2289] node2 corosyncnotice [SERV ] Service engine unloaded: corosync cluster closed process group service v1.01
[2289] node2 corosyncinfo [QB ] withdrawing server sockets
[2289] node2 corosyncnotice [SERV ] Service engine unloaded: corosync cluster quorum service v0.1
[2289] node2 corosyncnotice [SERV ] Service engine unloaded: corosync profile loading service
[2289] node2 corosyncnotice [MAIN ] Corosync Cluster Engine exiting normally
[18475] node2 corosyncnotice [MAIN ] Corosync Cluster Engine ('2.4.3'): started and ready to provide service.
[18475] node2 corosyncinfo [MAIN ] Corosync built-in features: dbus systemd xmlconf qdevices qnetd snmp libcgroup pie relro bindnow
[18475] node2 corosyncwarning [MAIN ] interface section bindnetaddr is used together with nodelist. Nodelist one is going to be used.
[18475] node2 corosyncwarning [MAIN ] Please migrate config file to nodelist.
[18475] node2 corosyncnotice [TOTEM ] Initializing transport (UDP/IP Unicast).
[18475] node2 corosyncnotice [TOTEM ] Initializing transmit/receive security (NSS) crypto: none hash: none
[18475] node2 corosyncnotice [TOTEM ] The network interface [192.168.122.117] is now up.
[18475] node2 corosyncnotice [SERV ] Service engine loaded: corosync configuration map access [0]
[18475] node2 corosyncinfo [QB ] server name: cmap
[18475] node2 corosyncnotice [SERV ] Service engine loaded: corosync configuration service [1]
[18475] node2 corosyncinfo [QB ] server name: cfg
[18475] node2 corosyncnotice [SERV ] Service engine loaded: corosync cluster closed process group service v1.01 [2]
[18475] node2 corosyncinfo [QB ] server name: cpg
[18475] node2 corosyncnotice [SERV ] Service engine loaded: corosync profile loading service [4]
[18475] node2 corosyncnotice [QUORUM] Using quorum provider corosync_votequorum
[18475] node2 corosyncnotice [SERV ] Service engine loaded: corosync vote quorum service v1.0 [5]
[18475] node2 corosyncinfo [QB ] server name: votequorum
[18475] node2 corosyncnotice [SERV ] Service engine loaded: corosync cluster quorum service v0.1 [3]
[18475] node2 corosyncinfo [QB ] server name: quorum
[18475] node2 corosyncnotice [TOTEM ] adding new UDPU member {192.168.122.60}
[18475] node2 corosyncnotice [TOTEM ] adding new UDPU member {192.168.122.117}
[18475] node2 corosyncnotice [TOTEM ] adding new UDPU member {192.168.122.114}
[18475] node2 corosyncnotice [TOTEM ] A new membership (192.168.122.117:320) was formed. Members joined: 2
[18475] node2 corosyncnotice [TOTEM ] A new membership (192.168.122.60:324) was formed. Members joined: 1
[18475] node2 corosyncnotice [QUORUM] This node is within the primary component and will provide service.
[18475] node2 corosyncnotice [QUORUM] Members[2]: 1 2
[18475] node2 corosyncnotice [MAIN ] Completed service synchronization, ready to provide service.
[18475] node2 corosyncnotice [TOTEM ] A new membership (192.168.122.60:328) was formed. Members joined: 3
[18475] node2 corosyncwarning [CPG ] downlist left_list: 0 received in state 0
[18475] node2 corosyncnotice [QUORUM] Members[3]: 1 3 2
[18475] node2 corosyncnotice [MAIN ] Completed service synchronization, ready to provide service.
...
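No errors, but the warning printed right after the restart ("interface section bindnetaddr is used together with nodelist. Nodelist one is going to be used.") is actually the key clue: with unicast (udpu) transport and a nodelist present, corosync takes each node's address from ring0_addr and ignores interface.bindnetaddr entirely. Schematically (a sketch of the relevant parts, not my full config file):

```
totem {
    transport: udpu              # unicast transport, driven by the nodelist
    interface {
        ringnumber: 0
        bindnetaddr: 10.0.0.0    # ignored whenever a nodelist is defined
    }
}
nodelist {
    node {
        ring0_addr: 192.168.122.60   # this is what corosync actually binds to
        nodeid: 1
    }
}
```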
Comparing against the example configuration file turned up no spelling mistakes either:
[root@node1 ~]# egrep -v '(#|^$)' /etc/corosync/corosync.conf.example
totem {
    version: 2
    crypto_cipher: none
    crypto_hash: none
    interface {
        ringnumber: 0
        bindnetaddr: 192.168.1.0
        mcastaddr: 239.255.1.1
        mcastport: 5405
        ttl: 1
    }
}
logging {
    fileline: off
    to_stderr: no
    to_logfile: yes
    logfile: /var/log/cluster/corosync.log
    to_syslog: yes
    debug: off
    timestamp: on
    logger_subsys {
        subsys: QUORUM
        debug: off
    }
}
quorum {
}
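As an aside, note that bindnetaddr is a *network* address, not a host address: corosync compares it against the networks of the local interfaces to decide which NIC to bind. A minimal illustration of that matching (my own sketch, not corosync's actual code; the /24 prefix is an assumption about a typical lab netmask):

```python
import ipaddress

def matches_bindnetaddr(host_ip: str, bindnetaddr: str, prefix: int = 24) -> bool:
    """Return True if host_ip lies in the network named by bindnetaddr.

    Illustrative only: corosync derives the network from the interface's
    actual netmask; /24 here is an assumption for a typical lab setup.
    """
    network = ipaddress.ip_network(f"{bindnetaddr}/{prefix}")
    return ipaddress.ip_address(host_ip) in network

# node2's address from the logs, matched against old and new bindnetaddr
print(matches_bindnetaddr("192.168.122.117", "192.168.122.0"))  # True
print(matches_bindnetaddr("192.168.122.117", "10.0.0.0"))       # False
```

So a node whose NIC still sits on 192.168.122.0/24 can never match bindnetaddr 10.0.0.0 in the first place; the addresses have to exist on the hosts before the config change can work.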
I checked the network and the hosts file (the node IPs had been added on every node) and found nothing wrong there either. At a loss, I tried changing the IP addresses in the nodelist section as well:
[root@node2 corosync]# vim corosync.conf
...
nodelist {
    node {
        ring0_addr: 10.0.0.10
        nodeid: 1
    }
    node {
        ring0_addr: 10.0.0.20
        nodeid: 2
    }
    node {
        ring0_addr: 10.0.0.30
        nodeid: 3
    }
}
...
Running pcs cluster sync then failed:
[root@node1 corosync]# pcs cluster sync
10.0.0.10: {"notauthorized":"true"}
Unable to authenticate to 10.0.0.10 - (HTTP error: 401), try running 'pcs cluster auth'
Error: Unable to set corosync config: Unable to authenticate to 10.0.0.10 - (HTTP error: 401), try running 'pcs cluster auth'
That revealed the cause: the newly added IP addresses had never been authenticated with pcs. After running pcs cluster auth and checking again, the problem was solved:
[root@node1 corosync]# pcs cluster auth 10.0.0.10 10.0.0.20 10.0.0.30
Username: hacluster
Password:
10.0.0.30: Authorized
10.0.0.20: Authorized
10.0.0.10: Authorized
[root@node1 corosync]# pcs cluster sync
10.0.0.10: Succeeded
10.0.0.20: Succeeded
10.0.0.30: Succeeded
[root@node1 corosync]# pcs cluster start --all
10.0.0.10: Starting Cluster (corosync)...
10.0.0.20: Starting Cluster (corosync)...
10.0.0.30: Starting Cluster (corosync)...
10.0.0.30: Starting Cluster (pacemaker)...
10.0.0.10: Starting Cluster (pacemaker)...
10.0.0.20: Starting Cluster (pacemaker)...
[root@node1 corosync]# corosync-cfgtool -s
Printing ring status.
Local node ID 1
RING ID 0
id = 10.0.0.10
status = ring 0 active with no faults
To summarize:
1. If you are adding an extra set of heartbeat IP addresses (a redundant ring), add the following to the config file:
rrp_mode: active
In that case there is no need to modify the nodelist section or run pcs auth again; the existing IP addresses can still update and discover the nodes. Only when both NICs go down does the fence mechanism kick in and reboot the node.
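For the first case (adding a redundant heartbeat ring), with the multicast-style config shown earlier that would look something like this (a sketch; the 10.0.0.0 second network and its mcastaddr are assumptions, not values from my cluster):

```
totem {
    version: 2
    rrp_mode: active
    interface {
        ringnumber: 0
        bindnetaddr: 192.168.1.0
        mcastaddr: 239.255.1.1
        mcastport: 5405
    }
    interface {
        ringnumber: 1
        bindnetaddr: 10.0.0.0
        mcastaddr: 239.255.2.1
        mcastport: 5405
    }
}
```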
2. If you are replacing the heartbeat IP addresses, the new addresses must be re-authenticated with pcs auth. This can leave pacemaker seeing double the node count (e.g., it had discovered 3 nodes before, and now 3 more appear, even though they are just second IPs on the same machines). I have not dug deeper into this, so my advice is to settle your IP addressing plan up front.