On server1:
1. Install the software
yum install -y pacemaker corosync
2. Edit the configuration file
cd /etc/corosync/
cp corosync.conf.example corosync.conf
compatibility: whitetank
totem {
        version: 2
        secauth: off
        threads: 0
        interface {
                ringnumber: 0
                bindnetaddr: 172.25.28.0        # network address of the segment
                mcastaddr: 226.94.1.128         # if many hosts/clusters share the LAN, change this so they do not collide
                mcastport: 5405
                ttl: 1
        }
}
logging {
        fileline: off
        to_stderr: no
        to_logfile: yes
        to_syslog: yes
        logfile: /var/log/cluster/corosync.log
        debug: off
        timestamp: on
        logger_subsys {
                subsys: AMF
                debug: off
        }
}
amf {
        mode: disabled
}
service {
        name: pacemaker         # start pacemaker together with corosync
        ver: 0                  # 0: corosync starts pacemaker itself; 1: pacemaker must be started separately
}
3. Start the service
/etc/init.d/corosync start
Repeat the same steps on server4.
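A minimal sketch of pushing the same configuration to server4 from server1 and starting corosync there (assumes root ssh access between the nodes):
scp /etc/corosync/corosync.conf server4:/etc/corosync/corosync.conf
ssh server4 '/etc/init.d/corosync start'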
4. Verify
[root@server1 ~]# crm_verify -LV
error: unpack_resources: Resource start-up disabled since no STONITH resources have been defined
error: unpack_resources: Either configure some or disable STONITH with the stonith-enabled option
error: unpack_resources: NOTE: Clusters with shared data need STONITH to ensure data integrity
Errors found during check: config not valid
This is caused by STONITH being enabled.
STONITH is short for Shoot-The-Other-Node-In-The-Head; it protects shared data from corruption when a node misbehaves or when two nodes access the data at the same time.
At this point crm_mon already shows server1 and server4 as online.
5. Use the crm interactive shell on server1 and server4 so that cluster verification passes without errors.
Install the pssh-2.3.1-2.1.x86_64.rpm and crmsh-1.2.6-0.rc2.2.1.x86_64.rpm packages.
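A minimal install sketch, assuming both rpm files are in the current directory (yum pulls their dependencies from the configured repositories):
yum localinstall -y pssh-2.3.1-2.1.x86_64.rpm crmsh-1.2.6-0.rc2.2.1.x86_64.rpm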
[root@server4 ~]# crm
crm(live)# configure
crm(live)configure# show
node server1
node server4
property $id="cib-bootstrap-options" \
dc-version="1.1.10-14.el6-368c726" \
cluster-infrastructure="classic openais (with plugin)" \
expected-quorum-votes="2"
crm(live)configure# property stonith-enabled=
stonith-enabled (boolean, [true]):    # boolean type, default true
Failed nodes are STONITH'd
crm(live)configure# property stonith-enabled=false    # disable STONITH
crm(live)configure# commit    # every configuration change must be committed
When there is no fencing device, disable the STONITH component.
Note that with stonith-enabled="false", resources such as the Distributed Lock Manager (DLM) and all services that depend on DLM (e.g. cLVM2, GFS2 and OCFS2) will refuse to start.
Verification now passes without errors:
[root@server1 ~]# crm_verify -LV
(Official documentation: http://www.linux-ha.org/wiki/Main_Page)
6. Add resources (this only needs to be done on one node)
Resource 1: vip
crm(live)configure# show
node server1
node server4
property $id="cib-bootstrap-options" \
dc-version="1.1.10-14.el6-368c726" \
cluster-infrastructure="classic openais (with plugin)" \
expected-quorum-votes="2" \
stonith-enabled="false"
crm(live)configure# primitive vip ocf:heartbeat:IPaddr2 params ip=172.25.28.100 cidr_netmask=24 op monitor interval=1min    # tab completion can fill in all of these
Enable monitoring at a one-minute interval (30s also works). The VIP must not already be in use on the network, and cidr_netmask must be a valid prefix for the subnet (24 here; an IPv4 prefix cannot exceed 32).
crm(live)configure# commit
At this point one of the two nodes holds the VIP.
Two-node clusters are a special case: if one node fails and only one remains, crm will not hand the resources to the survivor, because without quorum it treats that node as a standalone machine. Tell it to ignore the quorum check so that, when only one node is left, resources move to that node:
crm(live)configure# property no-quorum-policy=ignore
crm(live)configure# commit
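To confirm which node currently holds the VIP, a quick check (a sketch; the interface name eth0 is an assumption, substitute whichever NIC carries the 172.25.28.0/24 network):
ip addr show eth0 | grep 172.25.28.100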
Resource 2: haproxy
Configure haproxy on both server1 and server4 and start it to confirm it works.
In the primitive definition (sketched below), the first "haproxy" is the resource name and can be anything.
The second "haproxy" refers to the standard init script; you do not write it yourself, it is /etc/init.d/haproxy, but the script itself must be known-good, so test it first.
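The notes do not spell the command out; a minimal sketch of defining the haproxy resource in the same crm configure shell (the 1-minute monitor interval mirrors the vip resource and is an assumption):
crm(live)configure# primitive haproxy lsb:haproxy op monitor interval=1min
crm(live)configure# commit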
7. Fix resources being split across nodes
After adding the two resources, they can end up running on different nodes:
Online: [ server1 server4 ]
vip (ocf::heartbeat:IPaddr2): Started server4
haproxy (lsb:haproxy): Started server1
To keep the resources from being split across different nodes, bind them into a group:
crm(live)configure# group hagroup vip haproxy
Online: [ server1 server4 ]
Resource Group: hagroup
vip (ocf::heartbeat:IPaddr2): Started server4
haproxy (lsb:haproxy): Started server4
8. Resource migration
To switch the group from server4 to server1, first put server4 into standby, then bring the node back online.
[root@server4 ~]# crm node standby
Node server4: standby
Online: [ server1 ]
Resource Group: hagroup
vip (ocf::heartbeat:IPaddr2): Started server1
haproxy (lsb:haproxy): Started server1
[root@server4 ~]# crm node online
Online: [ server1 server4 ]
Resource Group: hagroup
vip (ocf::heartbeat:IPaddr2): Started server1
haproxy (lsb:haproxy): Started server1
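As an alternative to standby/online, crmsh can also move a resource directly from the shell; a sketch (verify the exact subcommand names with crm resource help on your crmsh version):
crm resource migrate hagroup server1    # move the group to server1 by adding a location constraint
crm resource unmigrate hagroup          # remove that constraint afterwards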
Fence: automatic power-off of failed nodes
1. Deploy fence
On server1 and server4:
install fence-virt.x86_64
On the client (the physical host, foundation28), install the fence-virtd packages (HA-LB yum repository):
fence-virtd-multicast-0.3.2-5.el7.x86_64
fence-virtd-libvirt-0.3.2-5.el7.x86_64
fence-virtd-0.3.2-5.el7.x86_64
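A minimal sketch of the installs, assuming the HA-LB yum repository is configured on both sides:
yum install -y fence-virt                                               # on server1 and server4
yum install -y fence-virtd fence-virtd-libvirt fence-virtd-multicast   # on the physical host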
mkdir /etc/cluster
dd if=/dev/urandom of=/etc/cluster/fence_xvm.key bs=128 count=1    # generate a 128-byte random key
fence_virtd -c    # interactive configuration mode
During the interactive configuration, change the listener interface from virbr0 to br0.
systemctl start fence_virtd.service
(Also enable the service at boot, otherwise it will not come up after the next reboot and things will break.)
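For example (a sketch; the host runs systemd, as the systemctl start above shows):
systemctl enable fence_virtd.service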
netstat -anulp | grep fence_virtd    # check the port number and confirm the service is listening
On server1 and server4:
mkdir /etc/cluster
Copy the random key from the host to server1 and server4:
scp fence_xvm.key root@server1:/etc/cluster/
scp fence_xvm.key root@server4:/etc/cluster/
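A quick sanity check that all three copies of the key are identical (a sketch, run from the physical host):
md5sum /etc/cluster/fence_xvm.key
ssh server1 md5sum /etc/cluster/fence_xvm.key
ssh server4 md5sum /etc/cluster/fence_xvm.key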
2. Add fence_xvm to STONITH
[root@server1 ~]# stonith_admin -I
fence_xvm    # provided by the fence-virt.x86_64 package installed on the VMs; this is the agent we use
fence_virt
fence_pcmk
fence_legacy
4 devices found
[root@server1 cluster]# stonith_admin -M -a fence_xvm    # giving the agent as an absolute path here causes an error
-M, --metadata    Check the device's metadata
-a, --agent=value The agent (eg. fence_xvm) to instantiate when calling with --register
# Pay attention to the mapping between cluster node names and virtual machine (domain) names here.
crm commands can also be entered non-interactively straight from the terminal; tab completion is not available that way, but it is convenient for scripting.
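The notes never show the command that creates the vmfence resource seen in the status output further down. A non-interactive sketch, assuming server1 and server4 correspond to the libvirt domains vm1 and vm4 listed by virsh below (adjust pcmk_host_map to the real node:domain mapping):
# pcmk_host_map maps cluster node name to libvirt domain name; the vm1/vm4 pairing is an assumption
crm configure primitive vmfence stonith:fence_xvm params pcmk_host_map="server1:vm1;server4:vm4" op monitor interval=1min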
3. Test fencing itself
fence_node server4    # this command is not available here; we use the fence_xvm agent instead
fence_xvm -H vm4    # this powers vm4 off directly so that it reboots
-H <domain> Virtual Machine (domain name) to fence
[root@foundation28 cluster]# virsh list    # note the node-to-domain mapping
Id Name State
----------------------------------------------------
2 vm2 running
3 vm3 running
5 vm1 running
6 vm4 running
If vm4 powers off and then comes back up, fencing is working.
(Excerpt from the metadata printed by stonith_admin -M -a fence_xvm, showing that the default action is reboot:
<parameter name="action">
        <getopt mixed="-o"/>
        <content type="string" default="reboot"/>
</parameter>)
After the fenced virtual machine reboots, start the service with /etc/init.d/corosync start; pacemaker is started along with it automatically.
4. Enable STONITH in crm to complete the fence setup
crm(live)configure# property stonith-enabled=true
crm(live)configure# commit
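With STONITH enabled again and a fencing resource defined, the verification from step 4 should now come back clean:
crm_verify -LV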
5. Test fencing
echo c > /proc/sysrq-trigger    # crash the kernel on vm4
Online: [ server1 ]
OFFLINE: [ server4 ]
Resource Group: hagroup
vip (ocf::heartbeat:IPaddr2): Started server1
haproxy (lsb:haproxy): Started server1
vmfence (stonith:fence_xvm): Started server1    # server1 immediately takes over the resources
At this point vm4 starts to reboot.
/etc/init.d/corosync start    # after the reboot be sure to start the service, otherwise the node will not come back online in the monitor
Online: [ server1 server4 ]
Resource Group: hagroup
vip (ocf::heartbeat:IPaddr2): Started server1
haproxy (lsb:haproxy): Started server1
vmfence (stonith:fence_xvm): Started server4