SSDB Cluster + Keepalived Setup in Practice, Part 4: Keepalived Configuration
Environment
OS: CentOS Linux release 7.6.1810 (Core)
ssdb: 1.9.7
keepalived: keepalived-2.0.16
IPs:
Master: 10.80.2.121
Slave: 10.80.2.85
VIP: 10.80.2.156
Installing keepalived
1. Install dependencies
The required packages may differ depending on the operating system.
[root@prodssdb-001 data]# yum install openssl-devel -y
2. Download the keepalived package
Download it from the keepalived official site.
Create a directory under /opt:
[root@prodssdb-001 opt]# mkdir -p /opt/keepalive
[root@prodssdb-001 opt]# cd /opt/keepalive/
Download the package:
[root@prodssdb-001 keepalive]# wget https://www.keepalived.org/software/keepalived-2.0.16.tar.gz
3. Build and install keepalived
Extract the archive:
[root@prodssdb-001 keepalive]# tar zxvf keepalived-2.0.16.tar.gz
Configure, build, and install:
[root@prodssdb-001 keepalive]# cd keepalived-2.0.16
[root@prodssdb-001 keepalived-2.0.16]# ./configure --prefix=/
[root@prodssdb-001 keepalived-2.0.16]# make
[root@prodssdb-001 keepalived-2.0.16]# make install
When finished, the build directory contains the following files:
[root@prodssdb-001 keepalived-2.0.16]# ls
aclocal.m4 bin ChangeLog config.status CONTRIBUTORS doc install-sh keepalived.spec.in Makefile.am README TODO
ar-lib bin_install compile configure COPYING genhash keepalived lib Makefile.in README.md
AUTHOR build_setup config.log configure.ac depcomp INSTALL keepalived.spec Makefile missing snap
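After `make install` it is worth confirming where the files actually landed. A minimal check, assuming `--prefix=/` places the binary under /sbin and the sample configuration under /etc/keepalived (these paths are an assumption; verify against your configure output):

```shell
# Check the expected install locations (paths assume --prefix=/)
for p in /sbin/keepalived /etc/keepalived/keepalived.conf; do
    if [ -e "$p" ]; then
        echo "present: $p"
    else
        echo "missing: $p"
    fi
done
```

If the binary turns out to be elsewhere, `find / -name keepalived -type f` will locate it.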
Configuring keepalived
Edit the keepalived configuration on the master and slave machines respectively.
1. Configure keepalived on the master
Master configuration:
[root@prodssdb-001 keepalived]# cat keepalived.conf
! Configuration File for keepalived
global_defs {
    router_id HA_SSDB        # router identifier; should be unique within the LAN
    script_user root         # user that runs the tracked script
}
vrrp_script check_fence {    # define a periodically executed script
    script "/etc/keepalived/check_db.sh"
    interval 6               # execution interval, in seconds
    weight 20                # priority offset: added to the priority while the script succeeds
}
vrrp_instance VI_1 {         # define a virtual router
    state MASTER             # initial state of this node in the virtual router; only one node may be MASTER, the rest should be BACKUP
    nopreempt                # non-preemptive mode
    interface eth0           # physical interface this virtual router is bound to
    virtual_router_id 99     # unique ID of the virtual router; must be identical on master and backup; range 0-255
    priority 100             # this node's priority in the virtual router; higher wins; the master must be higher than the backups; range 1-254
    advert_int 1             # advertisement interval; advertisements carry the priority, heartbeat, etc.
    authentication {         # authentication settings
        auth_type PASS       # authentication type; PASS means simple password authentication
        auth_pass <password> # shared secret (redacted in the original); must match on both nodes
    }
    virtual_ipaddress {
        10.80.2.156/24       # virtual IP (VIP) addresses; more than one allowed, one per line
    }
    track_script {
        check_fence
    }
}
This configuration defines a virtual router, VI_1, sets its initial state to MASTER, and registers the periodic check script check_db.sh, a fencing script I wrote myself to prevent split-brain.
2. Configure keepalived on the slave
[root@ecloud02-carchat-prod-ssdb02 keepalived]# cat /etc/keepalived/keepalived.conf
! Configuration File for keepalived
global_defs {
    router_id HA_SSDB
    script_user root
}
vrrp_script check_fence {
    script "/etc/keepalived/check_db.sh"
    interval 6
    weight 20
}
vrrp_instance VI_1 {
    state BACKUP
    nopreempt
    interface eth0
    virtual_router_id 99
    priority 90
    advert_int 1
    authentication {
        auth_type PASS
        auth_pass <password>   # must match the master
    }
    virtual_ipaddress {
        10.80.2.156/24
    }
    track_script {
        check_fence
    }
}
Note that I configured the master priority at 100 and the slave priority at 90, while the check script's weight is 20.
So if the master goes down, the VIP floats to the slave:
90 (slave priority) + 20 (weight) > 100 (master priority)
And when the master is healthy again, the VIP floats back to it:
90 (slave priority) + 20 (weight) < 100 (master priority) + 20 (weight)
(Since the fence script below stops keepalived on the failed node, keepalived must be started again on the recovered master before this failback can happen.)
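The arithmetic above can be sketched directly. This is only an illustration of keepalived's effective-priority rule (with a positive weight, the configured priority is raised by `weight` while the tracked script succeeds), not part of the cluster setup:

```shell
master_prio=100; backup_prio=90; weight=20

# Both nodes healthy: both check scripts succeed, both receive the +20 bonus
master_eff=$((master_prio + weight))   # 120
backup_eff=$((backup_prio + weight))   # 110
echo "healthy: master=$master_eff backup=$backup_eff"

# Master's check fails: it loses the bonus and the backup now outranks it
master_eff=$master_prio
echo "master check failed: master=$master_eff backup=$backup_eff"
[ "$backup_eff" -gt "$master_eff" ] && echo "VIP moves to the slave"
```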
3. Configure the split-brain fence script
[root@ecloud02-carchat-prod-ssdb02 keepalived]# cat check_db.sh
#!/bin/bash
FENCE_LOG=/root/fence_log
GATEWAY=10.80.2.1
VIP=10.80.2.156
INTKEY='eth0'
NOW=`date +'%Y-%m-%d_%H:%M:%S'`
kill_resource() {
    _VIP=$1
    _INTKEY=$2
    # /sbin/ifconfig $_VIP:$_INTKEY down
    # Remove the VIP; the /24 must match the prefix used in virtual_ipaddress
    /usr/sbin/ip addr del $_VIP/24 dev $_INTKEY
    # Kill every ssdb-server process
    ps aux|grep ssdb-server|grep -v grep|awk '{print $2}'|xargs kill -9
    # Stop keepalived so this node stops sending VRRP advertisements
    /usr/bin/systemctl stop keepalived
    ps aux|grep "keepalived -D"|grep -v grep|awk '{print $2}'|xargs kill -9
}
# Keep only the most recent 43200 lines of the fence log
TMP_LOG=`tail -n43200 $FENCE_LOG`
echo "$TMP_LOG" > $FENCE_LOG
# Non-empty only when the gateway cannot be reached within 5 seconds
PING_RESULT=`ping -w 5 $GATEWAY 2>&1 | grep -E "100% packet loss|unreachable"`
SSDB_PROC=`ps aux | grep -v grep | grep ssdb-server|wc -l`
echo "$NOW" >> $FENCE_LOG
echo "ping result is: $PING_RESULT" >> $FENCE_LOG
echo "ssdb proc number is: $SSDB_PROC" >> $FENCE_LOG
if [ -n "$PING_RESULT" -o "$SSDB_PROC" -eq 0 ]; then
    echo "I have no network connection or ssdb is down! I have to kill the VIP and ssdb" >> $FENCE_LOG
    kill_resource $VIP $INTKEY
    exit 1
else
    exit 0
fi
The script checks the machine's network connectivity and the state of the ssdb service. If the machine cannot ping the gateway within 5 seconds, or no ssdb-server process is found, it enters kill mode: it removes the VIP from this machine and kills the local ssdb processes, preventing split-brain.
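The decision logic of check_db.sh can be exercised in isolation. Below is a minimal sketch with the ping result and the ssdb process count injected as parameters, so it runs without a live gateway or ssdb; the function name and its stub inputs are hypothetical, for illustration only:

```shell
# Same condition as in check_db.sh, but with the inputs passed in
check_state() {
    ping_failure="$1"   # non-empty when the gateway could not be reached
    ssdb_procs="$2"     # count of running ssdb-server processes
    if [ -n "$ping_failure" ] || [ "$ssdb_procs" -eq 0 ]; then
        echo "fence"    # would remove the VIP and kill ssdb-server
        return 1
    fi
    echo "healthy"
}

check_state "" 1                           # healthy
check_state "100% packet loss" 1 || true   # network down -> fence
check_state "" 0 || true                   # ssdb down -> fence
```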
Verification
Start keepalived on both the master and the slave:
[root@ecloud02-carchat-prod-ssdb01 keepalived-2.0.16]# systemctl start keepalived
1. Network state
Check the IP addresses on the master:
[root@ecloud02-carchat-prod-ssdb01 keepalived-2.0.16]# ip addr
1: lo: <LOOPBACK,UP,LOWER_UP> mtu 65536 qdisc noqueue state UNKNOWN group default qlen 1000
link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
inet 127.0.0.1/8 scope host lo
valid_lft forever preferred_lft forever
inet6 ::1/128 scope host
valid_lft forever preferred_lft forever
2: eth0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc pfifo_fast state UP group default qlen 1000
link/ether fa:16:3e:eb:c2:06 brd ff:ff:ff:ff:ff:ff
inet 10.80.2.121/24 brd 10.80.2.255 scope global noprefixroute eth0
valid_lft forever preferred_lft forever
inet 10.80.2.156/24 scope global secondary eth0
valid_lft forever preferred_lft forever
inet6 fe80::f816:3eff:feeb:c206/64 scope link
valid_lft forever preferred_lft forever
Check the IP addresses on the slave:
[root@ecloud02-carchat-prod-ssdb02 ~]# ip addr
1: lo: <LOOPBACK,UP,LOWER_UP> mtu 65536 qdisc noqueue state UNKNOWN group default qlen 1000
link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
inet 127.0.0.1/8 scope host lo
valid_lft forever preferred_lft forever
inet6 ::1/128 scope host
valid_lft forever preferred_lft forever
2: eth0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc pfifo_fast state UP group default qlen 1000
link/ether fa:16:3e:4e:78:6b brd ff:ff:ff:ff:ff:ff
inet 10.80.2.85/24 brd 10.80.2.255 scope global noprefixroute eth0
valid_lft forever preferred_lft forever
inet6 fe80::f816:3eff:fe4e:786b/64 scope link
valid_lft forever preferred_lft forever
You can see the VIP is bound on the master.
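Whether a node currently holds the VIP can be determined by grepping its address list; on a live node the input would come from `ip -4 addr show dev eth0`. A small sketch using the addresses captured above (the helper function is hypothetical):

```shell
VIP=10.80.2.156

has_vip() {
    # On a live node this would be: ip -4 addr show dev eth0 | grep -qw "$VIP"
    echo "$1" | grep -qw "$VIP"
}

master_addrs="10.80.2.121 10.80.2.156"   # addresses seen on the master above
backup_addrs="10.80.2.85"                # addresses seen on the slave above

has_vip "$master_addrs" && echo "VIP is on the master"
has_vip "$backup_addrs" || echo "VIP is not on the slave"
```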
2. Simulate an ssdb crash on the master
Manually kill all ssdb processes on the master node to simulate a crash:
ps aux|grep ssdb-server|grep -v grep|awk '{print $2}'|xargs kill -9
You will see that both ssdb and keepalived on the master shut down, and the VIP floats to the slave.
3. Simulate a network outage on the master
Manually take down the master's network so it can no longer reach the gateway.
You will again see ssdb and keepalived shut down on the master, and the VIP float to the slave.