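Setting up a highly available Mesos cluster required fairly large changes to the original project, and along the way I also fixed some bugs left over from before.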
Introduction
Machine environment
[all]
192.168.50.4
192.168.50.5
192.168.50.6
192.168.50.7
[master]
192.168.50.4
192.168.50.5
192.168.50.6
[slave]
192.168.50.4
192.168.50.5
192.168.50.6
192.168.50.7
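ZooKeeper, mesos-master and Marathon are deployed on the master hosts; mesos-slave and Docker are deployed on the slave hosts.
For the full setup, see https://github.com/ncuwaln/mesos-learn
Below are the problems I ran into during the setup.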
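Problems
- ZooKeeper myid
Problem: each ZooKeeper node needs its own myid file, created dynamically with the right id in it, so the question is how to use Ansible to extract the current host's id from zoo.cfg.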
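# Create the myid file
- name: Make id file
  file: path={{remote_dir}}/zookeeper/data/myid state=touch

# Get this host's IP; it is used to look up the id in zoo.cfg
# Note 1: eth1 is the interface on my machines; yours may be named differently
# Note 2: in cut, -d sets the delimiter and -f selects which field (after splitting) to keep
- name: get ip
  shell: ip addr|grep eth1|grep inet|awk '{print $2}'| cut -d / -f 1
  register: local_ip

# Look up the id by IP
# Note: the trick is again cut; grep -F with the surrounding "=" and ":" matches the exact
# address, so 192.168.50.4 does not also hit 192.168.50.40
- name: get id
  shell: 'grep -F "={{ local_ip.stdout }}:" {{remote_dir}}/zookeeper/conf/zoo.cfg | cut -d = -f 1 | cut -d . -f 2'
  register: myid

# For debugging only; can be commented out
- name: echo
  debug: msg={{myid}}

# Write the id into myid
- name: write id
  lineinfile: path={{remote_dir}}/zookeeper/data/myid line={{myid['stdout']}}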
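The id lookup done by the "get id" task can be tried outside Ansible. The sketch below assumes zoo.cfg uses the standard `server.<id>=<host>:<peerPort>:<electionPort>` lines; `zk_myid_for_ip` is a hypothetical helper name, not part of ZooKeeper or Ansible, and the sample zoo.cfg is made up to mirror the three masters above.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print the ZooKeeper server id whose address matches the given IP.
# Assumes "server.<id>=<ip>:<peerPort>:<electionPort>" lines in zoo.cfg.
zk_myid_for_ip() {
  local ip="$1" cfg="$2"
  # -F matches the IP literally (dots are not regex wildcards); the "=" and ":" around it
  # keep 192.168.50.4 from also matching 192.168.50.40.
  grep '^server\.' "$cfg" | grep -F "=${ip}:" | cut -d = -f 1 | cut -d . -f 2
}

# Demo with a throwaway zoo.cfg (illustrative contents only).
cfg="$(mktemp)"
cat > "$cfg" <<'EOF'
tickTime=2000
clientPort=2181
server.1=192.168.50.4:2888:3888
server.2=192.168.50.5:2888:3888
server.3=192.168.50.6:2888:3888
EOF

zk_myid_for_ip 192.168.50.5 "$cfg"   # prints 2
rm -f "$cfg"
```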
- Mesos-master: Shutdown failed on fd=xx: Transport endpoint is not connected [107]
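Problem: mesos-master reports "Shutdown failed on fd=xx: Transport endpoint is not connected [107]".
Solution: enable Mesos's --advertise_ip option (mesos.sh below passes it to both mesos-master and mesos-agent).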
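To fill in --advertise_ip, each node needs the address the other nodes can actually reach. The sketch below assumes, as in the Ansible task above, that this address lives on eth1; it parses the one-line-per-address output of `ip -4 -o addr show <iface>`, where the fourth field is `<addr>/<prefix>`. The helper name `iface_ipv4_from_ip_output` is hypothetical.

```shell
#!/usr/bin/env bash
# Hypothetical helper: read `ip -4 -o addr show <iface>` output on stdin and print the first
# IPv4 address. With -o, each line looks like "<idx>: <iface>    inet <addr>/<prefix> ...".
iface_ipv4_from_ip_output() {
  awk '$3 == "inet" { split($4, parts, "/"); print parts[1]; exit }'
}

# Typical use on a node (eth1 is an assumption; adjust to your interface):
#   advertise_ip="$(ip -4 -o addr show eth1 | iface_ipv4_from_ip_output)"
# and then pass "$advertise_ip" to mesos.sh as --advertise_ip.

printf '3: eth1    inet 192.168.50.4/24 brd 192.168.50.255 scope global eth1\n' \
  | iface_ipv4_from_ip_output   # prints 192.168.50.4
```

Parsing text read from stdin, rather than calling `ip` inside the function, keeps the helper easy to try on a machine without an eth1 interface.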
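- Only the Marathon instance on the leader node is reachable
Problem: only the leader's Marathon service answers on port 8080; port 8080 on every other machine returns 503.
Solution: start Marathon with the --hostname option; with it set, the non-leader instances can forward requests to the leader.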
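A quick way to see this problem, and to confirm the fix, is to ask every master for an HTTP status. The sketch assumes curl is installed and that Marathon serves its REST API on port 8080, as above; `/v2/info` is Marathon's info endpoint, and the function names are hypothetical.

```shell
#!/usr/bin/env bash
# Hypothetical helpers for checking which Marathon instances answer.
# Assumptions: curl is installed; Marathon listens on port 8080 by default.

# Print the HTTP status code for http://<host>:<port>/v2/info.
# curl's -w '%{http_code}' prints 000 when no HTTP response arrives (e.g. connection refused).
marathon_status() {
  local host="$1" port="${2:-8080}"
  curl -s -o /dev/null --max-time 3 -w '%{http_code}' "http://${host}:${port}/v2/info" || true
}

# Print one "host: status" line per master passed as an argument.
check_masters() {
  local host
  for host in "$@"; do
    echo "${host}: $(marathon_status "$host")"
  done
}
```

Usage: `check_masters 192.168.50.4 192.168.50.5 192.168.50.6`. Without --hostname the non-leaders report 503; after the fix none of them should.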
Startup scripts for Mesos and Marathon
To make starting and stopping Mesos and Marathon easier, I wrote startup scripts for both, modeled on ZooKeeper's startup script.
mesos.sh
#!/usr/bin/env bash
# Start/stop mesos-master and mesos-agent; the script sits in the same directory as the Mesos binaries
MESOSBINDIR="$( cd "$( dirname "$0" )" && pwd )"
MASTER_WORK_DIR="/data/mesos/master"
MASTER_LOG_DIR="/data/mesos/master/log"
SLAVE_WORK_DIR="/data/mesos/slave"
SLAVE_LOG_DIR="/data/mesos/slave/log"
USAGE="Usage:
  $0 start_master|restart_master --hostname <hostname> --advertise_ip <advertise_ip> --quorum <quorum> --zk <zk_url>
  $0 start_slave|restart_slave --hostname <hostname> --advertise_ip <advertise_ip> --master <master>
  $0 stop_master|stop_slave"
hostname=""
advertise_ip=""
quorum=""
zk=""
master=""
case "$1" in
  start_master )
    # Parse the "--option value" pairs that follow the command
    while [[ -n "$2" ]]; do
      case "$2" in
        --hostname ) hostname=$3; shift 2;;
        --advertise_ip ) advertise_ip=$3; shift 2;;
        --quorum ) quorum=$3; shift 2;;
        --zk ) zk=$3; shift 2;;
        * ) break;;
      esac
    done
    if [ "$advertise_ip" = "" -o "$hostname" = "" -o "$quorum" = "" -o "$zk" = "" ]; then
      echo "error options"
      echo "$USAGE"
      exit 1
    fi
    echo -n "Starting mesos-master ..."
    nohup "${MESOSBINDIR}/mesos-master" "--hostname=$hostname" "--advertise_ip=$advertise_ip" \
      "--quorum=$quorum" "--work_dir=$MASTER_WORK_DIR" "--zk=$zk" "--log_dir=$MASTER_LOG_DIR" &
    echo "started"
    ;;
  stop_master )
    pid=$(ps -ef | grep mesos-master | grep -v grep | awk '{print $2}')
    if [ "$pid" = "" ]; then
      echo "No mesos master server started"
      exit 0
    fi
    kill -9 $pid
    echo "Mesos master server stopped"
    ;;
  restart_master )
    shift
    "$0" stop_master "$@"
    sleep 5
    "$0" start_master "$@"
    ;;
  start_slave )
    while [[ -n "$2" ]]; do
      case "$2" in
        --hostname ) hostname=$3; shift 2;;
        --advertise_ip ) advertise_ip=$3; shift 2;;
        --master ) master=$3; shift 2;;
        * ) break;;
      esac
    done
    if [ "$advertise_ip" = "" -o "$hostname" = "" -o "$master" = "" ]; then
      echo "error options"
      echo "$USAGE"
      exit 1
    fi
    echo -n "Starting mesos slave server ..."
    # Logs go to SLAVE_LOG_DIR (the original passed SLAVE_WORK_DIR here by mistake)
    nohup "${MESOSBINDIR}/mesos-agent" "--hostname=$hostname" "--advertise_ip=$advertise_ip" \
      "--work_dir=$SLAVE_WORK_DIR" "--master=$master" "--log_dir=$SLAVE_LOG_DIR" &
    echo "started"
    ;;
  stop_slave )
    pid=$(ps -ef | grep mesos-agent | grep -v grep | awk '{print $2}')
    if [ "$pid" = "" ]; then
      echo "No mesos slave server started"
      exit 0
    fi
    kill -9 $pid
    echo "Mesos slave server stopped"
    ;;
  restart_slave )
    shift
    "$0" stop_slave "$@"
    sleep 5
    "$0" start_slave "$@"
    ;;
  * )
    echo "$USAGE"
    ;;
esac
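The start_master and start_slave commands need a --quorum and a ZooKeeper URL that match the three masters. The sketch below derives both from the master list; it assumes ZooKeeper listens on its default client port 2181 on every master and that Mesos uses the znode path /mesos, both of which are illustrative choices. The function names are hypothetical. The quorum is a strict majority of the masters, floor(n/2) + 1.

```shell
#!/usr/bin/env bash
# Hypothetical helpers for building mesos.sh arguments from the master IPs.
# Assumptions: ZooKeeper client port 2181 on each master; Mesos znode path /mesos.

# Strict majority of the masters: floor(n/2) + 1.
mesos_quorum() {
  echo $(( $# / 2 + 1 ))
}

# Build zk://ip1:2181,ip2:2181,.../mesos from the given IPs.
mesos_zk_url() {
  local hosts="" ip
  for ip in "$@"; do
    # Add a comma separator only once hosts is non-empty
    hosts="${hosts:+${hosts},}${ip}:2181"
  done
  echo "zk://${hosts}/mesos"
}

masters=(192.168.50.4 192.168.50.5 192.168.50.6)
echo "quorum: $(mesos_quorum "${masters[@]}")"   # prints quorum: 2
echo "zk: $(mesos_zk_url "${masters[@]}")"
```

For example, on 192.168.50.4 this would give `./mesos.sh start_master --hostname 192.168.50.4 --advertise_ip 192.168.50.4 --quorum 2 --zk zk://192.168.50.4:2181,192.168.50.5:2181,192.168.50.6:2181/mesos`; the agents' --master accepts the same zk:// URL, so they follow whichever master is leading.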
marathon.sh
#!/usr/bin/env bash
# Start/stop Marathon; the script sits in the same directory as the marathon launcher
MARATHONBINDIR="$( cd "$( dirname "$0" )" && pwd )"
USAGE="Usage:
  $0 start|restart --master <master> --zk <zk_url> --hostname <hostname> [--libmesos_path <path>]
  $0 stop"
master=""
zk=""
libmesos_path=""
hostname=""
case "$1" in
  start )
    while [[ -n "$2" ]]; do
      case "$2" in
        --master ) master=$3; shift 2;;
        --zk ) zk=$3; shift 2;;
        --libmesos_path ) libmesos_path=$3; shift 2;;
        --hostname ) hostname=$3; shift 2;;
        * ) break;;
      esac
    done
    if [ "$master" = "" -o "$zk" = "" -o "$hostname" = "" ]; then
      echo "error options"
      echo "$USAGE"
      exit 1
    fi
    echo -n "Starting marathon ..."
    # Point Marathon at libmesos only when a path was given
    if [ -n "$libmesos_path" ]; then
      export MESOS_NATIVE_JAVA_LIBRARY="$libmesos_path"
    fi
    nohup "${MARATHONBINDIR}/marathon" --master "$master" --zk "$zk" --hostname "$hostname" &
    echo "started"
    ;;
  stop )
    # Exclude this script: its own command line also contains "marathon"
    pid=$(ps -ef | grep marathon | grep -v grep | grep -v "$(basename "$0")" | awk '{print $2}')
    if [ "$pid" = "" ]; then
      echo "No marathon server started"
      exit 0
    fi
    kill -9 $pid
    echo "Marathon server stopped"
    ;;
  restart )
    shift
    "$0" stop "$@"
    sleep 5
    "$0" start "$@"
    ;;
  * )
    echo "$USAGE"
    ;;
esac
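The scripts still have a small bug: they do not check whether the process is already running before starting it. I will probably fix that in the next commit. The next article covers deploying applications in this HA environment.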
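One possible shape for the missing pre-start check, as a sketch only: record the pid of each started process in a pid file and refuse to start again while that pid is alive. The pid-file approach and the names `is_running` and `start_once` are hypothetical; mesos.sh and marathon.sh do not use pid files today. `kill -0` only tests whether a signal could be delivered, so it reports false for a live process owned by another user.

```shell
#!/usr/bin/env bash
# Hypothetical pre-start guard based on a pid file.

# Return 0 if the pid stored in the given file belongs to a running process.
is_running() {
  local pidfile="$1" pid
  [ -f "$pidfile" ] || return 1
  pid="$(cat "$pidfile")"
  [ -n "$pid" ] && kill -0 "$pid" 2>/dev/null
}

# Start the remaining arguments as a background command unless it is already running.
start_once() {
  local pidfile="$1"; shift
  if is_running "$pidfile"; then
    echo "already running (pid $(cat "$pidfile"))"
    return 1
  fi
  nohup "$@" > /dev/null 2>&1 &
  echo $! > "$pidfile"
  echo "started (pid $!)"
}
```

A start branch could then call something like `start_once /tmp/mesos-master.pid "${MESOSBINDIR}/mesos-master" --hostname="$hostname" ...` (the pid-file path is illustrative), and the stop branch could kill the recorded pid instead of searching `ps -ef` output.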