Copyright notice: feel free to repost, but please credit the author and the source. https://blog.csdn.net/handsomehuo/article/details/90607455
While configuring pacemaker+corosync recently, I tried to change the bindnetaddr address in corosync.conf (I wanted to switch it to 10.0.0.0). After applying the change I checked with:
corosync-cfgtool -s
and found that the change had not taken effect: the heartbeat was still running over the original network (192.168.122.0).
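The output looked roughly like the following (a reconstruction based on the logs and the later output in this post; the exact node ID and address on your system will differ):

```
Printing ring status.
Local node ID 1
RING ID 0
        id      = 192.168.122.60
        status  = ring 0 active with no faults
```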
The relevant logs showed no obvious errors:
[root@node1 ~]# tailf -n 100 /var/log/cluster/corosync.log
...
[2289] node2 corosyncnotice [TOTEM ] A new membership (192.168.122.60:316) was formed. Members joined: 1
[2289] node2 corosyncnotice [QUORUM] Members[3]: 1 3 2
[2289] node2 corosyncnotice [MAIN ] Completed service synchronization, ready to provide service.
[2289] node2 corosyncnotice [MAIN ] Node was shut down by a signal
[2289] node2 corosyncnotice [SERV ] Unloading all Corosync service engines.
[2289] node2 corosyncinfo [QB ] withdrawing server sockets
[2289] node2 corosyncnotice [SERV ] Service engine unloaded: corosync vote quorum service v1.0
[2289] node2 corosyncinfo [QB ] withdrawing server sockets
[2289] node2 corosyncnotice [SERV ] Service engine unloaded: corosync configuration map access
[2289] node2 corosyncinfo [QB ] withdrawing server sockets
[2289] node2 corosyncnotice [SERV ] Service engine unloaded: corosync configuration service
[2289] node2 corosyncinfo [QB ] withdrawing server sockets
[2289] node2 corosyncnotice [SERV ] Service engine unloaded: corosync cluster closed process group service v1.01
[2289] node2 corosyncinfo [QB ] withdrawing server sockets
[2289] node2 corosyncnotice [SERV ] Service engine unloaded: corosync cluster quorum service v0.1
[2289] node2 corosyncnotice [SERV ] Service engine unloaded: corosync profile loading service
[2289] node2 corosyncnotice [MAIN ] Corosync Cluster Engine exiting normally
[18475] node2 corosyncnotice [MAIN ] Corosync Cluster Engine ('2.4.3'): started and ready to provide service.
[18475] node2 corosyncinfo [MAIN ] Corosync built-in features: dbus systemd xmlconf qdevices qnetd snmp libcgroup pie relro bindnow
[18475] node2 corosyncwarning [MAIN ] interface section bindnetaddr is used together with nodelist. Nodelist one is going to be used.
[18475] node2 corosyncwarning [MAIN ] Please migrate config file to nodelist.
[18475] node2 corosyncnotice [TOTEM ] Initializing transport (UDP/IP Unicast).
[18475] node2 corosyncnotice [TOTEM ] Initializing transmit/receive security (NSS) crypto: none hash: none
[18475] node2 corosyncnotice [TOTEM ] The network interface [192.168.122.117] is now up.
[18475] node2 corosyncnotice [SERV ] Service engine loaded: corosync configuration map access [0]
[18475] node2 corosyncinfo [QB ] server name: cmap
[18475] node2 corosyncnotice [SERV ] Service engine loaded: corosync configuration service [1]
[18475] node2 corosyncinfo [QB ] server name: cfg
[18475] node2 corosyncnotice [SERV ] Service engine loaded: corosync cluster closed process group service v1.01 [2]
[18475] node2 corosyncinfo [QB ] server name: cpg
[18475] node2 corosyncnotice [SERV ] Service engine loaded: corosync profile loading service [4]
[18475] node2 corosyncnotice [QUORUM] Using quorum provider corosync_votequorum
[18475] node2 corosyncnotice [SERV ] Service engine loaded: corosync vote quorum service v1.0 [5]
[18475] node2 corosyncinfo [QB ] server name: votequorum
[18475] node2 corosyncnotice [SERV ] Service engine loaded: corosync cluster quorum service v0.1 [3]
[18475] node2 corosyncinfo [QB ] server name: quorum
[18475] node2 corosyncnotice [TOTEM ] adding new UDPU member {192.168.122.60}
[18475] node2 corosyncnotice [TOTEM ] adding new UDPU member {192.168.122.117}
[18475] node2 corosyncnotice [TOTEM ] adding new UDPU member {192.168.122.114}
[18475] node2 corosyncnotice [TOTEM ] A new membership (192.168.122.117:320) was formed. Members joined: 2
[18475] node2 corosyncnotice [TOTEM ] A new membership (192.168.122.60:324) was formed. Members joined: 1
[18475] node2 corosyncnotice [QUORUM] This node is within the primary component and will provide service.
[18475] node2 corosyncnotice [QUORUM] Members[2]: 1 2
[18475] node2 corosyncnotice [MAIN ] Completed service synchronization, ready to provide service.
[18475] node2 corosyncnotice [TOTEM ] A new membership (192.168.122.60:328) was formed. Members joined: 3
[18475] node2 corosyncwarning [CPG ] downlist left_list: 0 received in state 0
[18475] node2 corosyncnotice [QUORUM] Members[3]: 1 3 2
[18475] node2 corosyncnotice [MAIN ] Completed service synchronization, ready to provide service.
...
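No errors, but the warning printed right after the restart ("interface section bindnetaddr is used together with nodelist. Nodelist one is going to be used.") is actually the key clue: with unicast (udpu) transport and a nodelist present, corosync takes each node's address from ring0_addr and ignores interface.bindnetaddr entirely. Schematically (a sketch of the relevant parts, not my full config file):

```
totem {
    transport: udpu              # unicast transport, driven by the nodelist
    interface {
        ringnumber: 0
        bindnetaddr: 10.0.0.0    # ignored whenever a nodelist is defined
    }
}
nodelist {
    node {
        ring0_addr: 192.168.122.60   # this is what corosync actually binds to
        nodeid: 1
    }
}
```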
Comparing against the example configuration file turned up no spelling mistakes either:
[root@node1 ~]# egrep -v '(#|^$)' /etc/corosync/corosync.conf.example
totem {
    version: 2
    crypto_cipher: none
    crypto_hash: none
    interface {
        ringnumber: 0
        bindnetaddr: 192.168.1.0
        mcastaddr: 239.255.1.1
        mcastport: 5405
        ttl: 1
    }
}
logging {
    fileline: off
    to_stderr: no
    to_logfile: yes
    logfile: /var/log/cluster/corosync.log
    to_syslog: yes
    debug: off
    timestamp: on
    logger_subsys {
        subsys: QUORUM
        debug: off
    }
}
quorum {
}
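As an aside, note that bindnetaddr is a *network* address, not a host address: corosync compares it against the networks of the local interfaces to decide which NIC to bind. A minimal illustration of that matching (my own sketch, not corosync's actual code; the /24 prefix is an assumption about a typical lab netmask):

```python
import ipaddress

def matches_bindnetaddr(host_ip: str, bindnetaddr: str, prefix: int = 24) -> bool:
    """Return True if host_ip lies in the network named by bindnetaddr.

    Illustrative only: corosync derives the network from the interface's
    actual netmask; /24 here is an assumption for a typical lab setup.
    """
    network = ipaddress.ip_network(f"{bindnetaddr}/{prefix}")
    return ipaddress.ip_address(host_ip) in network

# node2's address from the logs, matched against old and new bindnetaddr
print(matches_bindnetaddr("192.168.122.117", "192.168.122.0"))  # True
print(matches_bindnetaddr("192.168.122.117", "10.0.0.0"))       # False
```

So a node whose NIC still sits on 192.168.122.0/24 can never match bindnetaddr 10.0.0.0 in the first place; the addresses have to exist on the hosts before the config change can work.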
I checked the network and the hosts file (the node IPs had been added on every node) and found nothing wrong there either. At a loss, I tried changing the IP addresses in the nodelist section as well:
[root@node2 corosync]# vim corosync.conf
...
nodelist {
    node {
        ring0_addr: 10.0.0.10
        nodeid: 1
    }
    node {
        ring0_addr: 10.0.0.20
        nodeid: 2
    }
    node {
        ring0_addr: 10.0.0.30
        nodeid: 3
    }
}
...
Running pcs cluster sync then failed:
[root@node1 corosync]# pcs cluster sync
10.0.0.10: {"notauthorized":"true"}
Unable to authenticate to 10.0.0.10 - (HTTP error: 401), try running 'pcs cluster auth'
Error: Unable to set corosync config: Unable to authenticate to 10.0.0.10 - (HTTP error: 401), try running 'pcs cluster auth'
That revealed the cause: the newly added IP addresses had never been authenticated with pcs. After running pcs cluster auth and checking again, the problem was solved:
[root@node1 corosync]# pcs cluster auth 10.0.0.10 10.0.0.20 10.0.0.30
Username: hacluster
Password:
10.0.0.30: Authorized
10.0.0.20: Authorized
10.0.0.10: Authorized
[root@node1 corosync]# pcs cluster sync
10.0.0.10: Succeeded
10.0.0.20: Succeeded
10.0.0.30: Succeeded
[root@node1 corosync]# pcs cluster start --all
10.0.0.10: Starting Cluster (corosync)...
10.0.0.20: Starting Cluster (corosync)...
10.0.0.30: Starting Cluster (corosync)...
10.0.0.30: Starting Cluster (pacemaker)...
10.0.0.10: Starting Cluster (pacemaker)...
10.0.0.20: Starting Cluster (pacemaker)...
[root@node1 corosync]# corosync-cfgtool -s
Printing ring status.
Local node ID 1
RING ID 0
id = 10.0.0.10
status = ring 0 active with no faults
To summarize:
1. If you are adding an extra set of heartbeat IP addresses (a redundant ring), add the following to the config file:
rrp_mode: active
In that case there is no need to modify the nodelist section or run pcs auth again; the existing IP addresses can still update and discover the nodes. Only when both NICs go down does the fence mechanism kick in and reboot the node.
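For the first case (adding a redundant heartbeat ring), with the multicast-style config shown earlier that would look something like this (a sketch; the 10.0.0.0 second network and its mcastaddr are assumptions, not values from my cluster):

```
totem {
    version: 2
    rrp_mode: active
    interface {
        ringnumber: 0
        bindnetaddr: 192.168.1.0
        mcastaddr: 239.255.1.1
        mcastport: 5405
    }
    interface {
        ringnumber: 1
        bindnetaddr: 10.0.0.0
        mcastaddr: 239.255.2.1
        mcastport: 5405
    }
}
```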
2. If you are replacing the heartbeat IP addresses, the new addresses must be re-authenticated with pcs auth. This can leave pacemaker seeing double the node count (e.g., it had discovered 3 nodes before, and now 3 more appear, even though they are just second IPs on the same machines). I have not dug deeper into this, so my advice is to settle your IP addressing plan up front.