1 Problem description
Checking the cluster status shows that one replica (slave) node is abnormal: it is flagged fail.
127.0.0.1:6383> cluster nodes
83780e418e90de6067a196d88a54fd2cfe719f86 192.168.109.134:6384@16384 slave 6a50a9c515acf3490e5a5256250d857d0812bc6a 0 1615173819865 17 connected
00dd345835790608dc062dd4742d853138f06b97 192.168.109.133:6384@16384 master - 0 1615173819000 9 connected 5462-10923
0e8fa73711c300f2da6f92df49b215d270021b14 192.168.109.134:6383@16383 slave 00dd345835790608dc062dd4742d853138f06b97 0 1615173818862 9 connected
f86464011d9f8ec605857255c0b67cff1e794c19 :0@0 slave,fail,noaddr 2cb35944b4492748a8c739fab63a0e90a56e414a 1614958275942 1614958275942 8 disconnected
6a50a9c515acf3490e5a5256250d857d0812bc6a 192.168.109.132:6384@16384 master - 0 1615173818000 17 connected 10924-16383
2cb35944b4492748a8c739fab63a0e90a56e414a 192.168.109.133:6383@16383 myself,master - 0 1615173819000 8 connected 0-5461
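On a larger cluster the failed entry is easy to miss by eye. The following is a minimal sketch, not part of the original procedure, that filters `cluster nodes` output for nodes whose flags include `fail` or `fail?`. The function name `list_failed_nodes` is made up for this illustration; it only assumes the standard field order (ID, address, flags, ...).

```shell
#!/usr/bin/env bash
# Hypothetical helper: reads `cluster nodes` output on stdin and prints
# "<node-id> <flags>" for every node flagged fail or fail?.
list_failed_nodes() {
  awk '{
    # Field 3 is the comma-separated flag list, e.g. "slave,fail,noaddr".
    n = split($3, flags, ",")
    for (i = 1; i <= n; i++) {
      if (flags[i] == "fail" || flags[i] == "fail?") {
        print $1, $3
        break
      }
    }
  }'
}

# Usage (needs a reachable node, so it is left commented out):
# /usr/local/bin/redis-cli -p 6383 -a "<password>" cluster nodes | list_failed_nodes
```

Run against the output above, this should print only the `f86464...` entry, which is the ID needed for the removal step.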
Checking the node status on the problem node itself shows that it has left the cluster and that its node ID has changed:
127.0.0.1:6383> cluster nodes
0cbf44ef3f9c3a8a473bcd303644388782e5ee78 192.168.109.132:6383@16383 myself,master - 0 0 0 connected 0-5461
/* If the ID had not changed, simply restarting the replica would be enough to fix the problem. */
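2 Solution
2.1 Remove the failed replica from the cluster
# Run cluster forget <failed replica ID> on every healthy node in the cluster. Each node only bans a forgotten ID for 60 seconds, so cover all nodes within that window; otherwise gossip from a node that still knows the ID re-adds the entry. Example:
echo 'cluster forget f86464011d9f8ec605857255c0b67cff1e794c19' | /usr/local/bin/redis-cli -p 6384 -a "<password>"
echo 'cluster nodes' | /usr/local/bin/redis-cli -p 6384 -a "<password>"
echo 'cluster forget f86464011d9f8ec605857255c0b67cff1e794c19' | /usr/local/bin/redis-cli -p 6383 -a "<password>"
echo 'cluster nodes' | /usr/local/bin/redis-cli -p 6383 -a "<password>"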
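Because the command has to reach every healthy node, a loop is less error-prone than logging in to each host. The sketch below is one possible way to do it, not the original author's method. The function name `forget_everywhere` and the `DRY_RUN` and `REDIS_CLI` variables are invented for this example, and it assumes your redis-cli version reads the password from the `REDISCLI_AUTH` environment variable (otherwise add `-a` yourself).

```shell
#!/usr/bin/env bash
# Hypothetical helper: send CLUSTER FORGET <node-id> to each host:port given.
# DRY_RUN=1 prints the commands instead of executing them.
# REDIS_CLI overrides the redis-cli binary path (defaults to "redis-cli").
forget_everywhere() {
  local node_id=$1
  shift
  local endpoint host port
  for endpoint in "$@"; do
    host=${endpoint%:*}    # text before the last colon
    port=${endpoint##*:}   # text after the last colon
    if [ "${DRY_RUN:-0}" = "1" ]; then
      echo "${REDIS_CLI:-redis-cli} -h $host -p $port cluster forget $node_id"
    else
      "${REDIS_CLI:-redis-cli}" -h "$host" -p "$port" cluster forget "$node_id"
    fi
  done
}

# Usage with the five healthy nodes from the listing above (commented out):
# export REDISCLI_AUTH='<password>'
# REDIS_CLI=/usr/local/bin/redis-cli forget_everywhere \
#   f86464011d9f8ec605857255c0b67cff1e794c19 \
#   192.168.109.132:6384 192.168.109.133:6383 192.168.109.133:6384 \
#   192.168.109.134:6383 192.168.109.134:6384
```

Running it once with `DRY_RUN=1` first shows exactly which commands would be sent.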
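2.2 Re-add the node to the cluster
2.2.1 Handshake
Run the cluster meet command on any node in the cluster to add the new node. The handshake state propagates through the cluster in gossip messages, so the other nodes discover the new node automatically and start their own handshake with it.
echo 'cluster meet 192.168.109.132 6383' | /usr/local/bin/redis-cli -p 6384 -a "<password>"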
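cluster meet returns OK as soon as the request is accepted, before the handshake has finished. As a rough sketch (the function name `addr_joined` is made up here), the check below reads `cluster nodes` output and succeeds only once the given ip:port is listed without the `handshake` flag, so it can be polled before moving on to the next step.

```shell
#!/usr/bin/env bash
# Hypothetical helper: reads `cluster nodes` output on stdin.
# Exit status 0 if <ip:port> is listed and is no longer in handshake, else 1.
addr_joined() {
  local addr=$1
  awk -v addr="$addr" '
    # Field 2 looks like ip:port@cport; keep only the part before "@".
    { split($2, a, "@") }
    a[1] == addr && index("," $3 ",", ",handshake,") == 0 { found = 1 }
    END { exit (found ? 0 : 1) }
  '
}

# Usage (commented out because it needs a live cluster):
# until /usr/local/bin/redis-cli -p 6384 -a "<password>" cluster nodes \
#         | addr_joined 192.168.109.132:6383; do
#   sleep 1
# done
```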
2.2.2 Configure the master-replica relationship
# Run cluster replicate <master node ID> on the replica:
[root@centos7-mod ~]# echo 'cluster replicate 2cb35944b4492748a8c739fab63a0e90a56e414a' | /usr/local/bin/redis-cli -p 6383 -a "<password>"
Warning: Using a password with '-a' option on the command line interface may not be safe.
OK
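cluster replicate takes the master's node ID, not its address, so the ID has to be copied out of `cluster nodes` first. A small sketch of that lookup follows; `node_id_by_addr` is a name invented for this example and it relies only on the standard field order of the output.

```shell
#!/usr/bin/env bash
# Hypothetical helper: reads `cluster nodes` output on stdin and prints the
# node ID of the first entry whose address is <ip:port>.
node_id_by_addr() {
  local addr=$1
  awk -v addr="$addr" '{
    split($2, a, "@")            # strip the @cluster-bus-port suffix
    if (a[1] == addr) { print $1; exit }
  }'
}

# Usage (commented out because it needs a live cluster):
# master_id=$(/usr/local/bin/redis-cli -p 6384 -a "<password>" cluster nodes \
#               | node_id_by_addr 192.168.109.133:6383)
# /usr/local/bin/redis-cli -p 6383 -a "<password>" cluster replicate "$master_id"
```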
2.2.3 Check the cluster status
[root@centos7-mod ~]# echo 'cluster nodes' | /usr/local/bin/redis-cli -p 6384 -a "<password>"
Warning: Using a password with '-a' option on the command line interface may not be safe.
38287a7e715c358b5537a369646e9698a7583459 192.168.109.132:6383@16383 slave 2cb35944b4492748a8c739fab63a0e90a56e414a 0 1615233239757 8 connected
2cb35944b4492748a8c739fab63a0e90a56e414a 192.168.109.133:6383@16383 master - 0 1615233239000 8 connected 0-5461
0e8fa73711c300f2da6f92df49b215d270021b14 192.168.109.134:6383@16383 slave 00dd345835790608dc062dd4742d853138f06b97 0 1615233241763 9 connected
83780e418e90de6067a196d88a54fd2cfe719f86 192.168.109.134:6384@16384 slave 6a50a9c515acf3490e5a5256250d857d0812bc6a 0 1615233240760 17 connected
00dd345835790608dc062dd4742d853138f06b97 192.168.109.133:6384@16384 master - 0 1615233241000 9 connected 5462-10923
6a50a9c515acf3490e5a5256250d857d0812bc6a 192.168.109.132:6384@16384 myself,master - 0 1615233242000 17 connected 10924-16383
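The same verification can be scripted. The sketch below is an illustration rather than a complete health check, and `check_replica` is a made-up name. It succeeds only if the node at the given address is listed as a slave of the expected master, carries no fail flag, and has link state connected.

```shell
#!/usr/bin/env bash
# Hypothetical helper: reads `cluster nodes` output on stdin.
# Exit status 0 if <ip:port> is a healthy, connected replica of <master-id>.
check_replica() {
  local addr=$1 master_id=$2
  awk -v addr="$addr" -v mid="$master_id" '
    # Fields: 1=id 2=addr 3=flags 4=master-id 5=ping 6=pong 7=epoch 8=link-state
    { split($2, a, "@") }
    a[1] == addr &&
    index("," $3 ",", ",slave,") > 0 &&
    $3 !~ /fail/ &&
    $4 == mid &&
    $8 == "connected" { ok = 1 }
    END { exit (ok ? 0 : 1) }
  '
}

# Usage (commented out because it needs a live cluster):
# /usr/local/bin/redis-cli -p 6384 -a "<password>" cluster nodes \
#   | check_replica 192.168.109.132:6383 2cb35944b4492748a8c739fab63a0e90a56e414a \
#   && echo "replica is healthy"
```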
-- This article is mainly based on https://blog.csdn.net/wojiuguowei/article/details/83511023