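Sometimes a program running in the background dies unexpectedly. We can look for the process with grep: if nothing is found, the process has died, and you may need to restart it or take some other action.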
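grepStr="python3 test.py"   # pattern used to find the process in the ps output
sleepTime=3                 # seconds between checks
while :; do
    # List matching processes, excluding the grep command itself.
    programs=$(ps -ef | grep "$grepStr" | grep -v "grep")
    echo "$programs"
    if [ -n "$programs" ]; then
        echo "is running now"
    else
        echo "stopped"
    fi
    sleep "$sleepTime"
done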
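Matching on ps output can also pick up unrelated processes whose command line happens to contain the pattern (an editor with test.py open, for example). When the watchdog starts the program itself, one alternative is to remember the PID from `$!` and probe it with `kill -0`, which sends no signal and only reports whether the process can be signaled. The sketch below assumes that setup; `start_service`, `is_alive`, and `servicePid` are hypothetical names, not part of the script above.

```shell
#!/usr/bin/env bash
# Sketch: track the child by PID instead of grepping the ps output.
# start_service, is_alive and servicePid are illustrative names (assumptions).

servicePid=""

# Start the given command in the background and remember its PID.
# nohup execs the command, so $! is the PID of the command itself.
start_service() {
    nohup "$@" >/dev/null 2>&1 &
    servicePid=$!
}

# Succeeds while the remembered PID still refers to a live process.
# kill -0 delivers no signal; it only checks that signaling would work.
is_alive() {
    [ -n "$servicePid" ] && kill -0 "$servicePid" 2>/dev/null
}
```

With these helpers, the check in the loop could become `if is_alive; then echo "is running now"; else echo "stopped"; fi`. This does not help for a process started elsewhere, where the grep approach (or a PID file) is still needed.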
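Another scenario is running an RPC service where, for some reason, the process does not die but hangs: requests get no response. We can keep sending requests, and if no response comes back, restart the program.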
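grepStr="python3 test.py" # pattern passed to grep to find the process and its PID
sleepTime=30 # pause between checks, in seconds
failedCount=0 # failed requests counted since the last successful one
restartThreshold=5 # restart after this many failures
# Before use, adjust the curl request and the restart command (nohup ...) in the loop below to match your service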
while :; do
    programs=$(ps -ef | grep "$grepStr" | grep -v "grep")
    echo "$programs"
    if [ -n "$programs" ]; then
        echo "is running now"
        # --max-time bounds the whole request, so a hung server cannot block the loop;
        # --connect-timeout alone only limits the connection phase.
        ifResult=$(curl -s --connect-timeout 30 --max-time 30 -d '{"jsonrpc": "2.0", "method": "base_getBlockCount", "params": [], "id": 1}' -H "Content-Type: application/json" -X POST 192.168.31.181:9379 | grep "result")
        echo "ifResult is: $ifResult"
        if [ -n "$ifResult" ]; then
            failedCount=0
            echo "failedCount is: $failedCount"
        else
            failedCount=$((failedCount + 1))
            echo "failedCount++ is: $failedCount"
            if [ "$failedCount" -ge "$restartThreshold" ]; then
                echo "$failedCount >= $restartThreshold"
                pid=$(ps -ef | grep "$grepStr" | grep -v "grep" | awk '{print $2}')
                echo "pid: $pid"
                # $pid is left unquoted on purpose: it may hold several PIDs.
                kill -s 9 $pid
                echo "killed the pid $pid, ready to restart."
                nohup python3 test.py >/dev/null 2>&1 &
                failedCount=0
                echo "restart issued, failedCount is: $failedCount"
            fi
        fi
    else
        echo "$grepStr is not running now, exit."
        break
    fi
    sleep "$sleepTime"
done
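The counting rule in the loop above (reset on success, increment on failure, restart at the threshold) can be pulled into a small function so it can be exercised without a live RPC node. This is a minimal sketch, not the original author's code: `update_failures` is a hypothetical name, and it takes the exit status of the health probe as its only argument.

```shell
#!/usr/bin/env bash
# Sketch of the failure counter on its own; update_failures is an
# illustrative name (assumption), the two variables come from the script above.

failedCount=0        # failures since the last successful probe
restartThreshold=5   # restart once this many failures have accumulated

# update_failures STATUS
#   STATUS is the exit code of a health probe (0 means healthy).
#   Resets the counter on success and increments it on failure.
#   Returns 0 only when the threshold has been reached (restart is due).
update_failures() {
    if [ "$1" -eq 0 ]; then
        failedCount=0
        return 1
    fi
    failedCount=$((failedCount + 1))
    [ "$failedCount" -ge "$restartThreshold" ]
}

# Possible use inside the loop, where probe_rpc stands in for the
# curl | grep check (placeholder name, not defined here):
#   probe_rpc; status=$?
#   if update_failures "$status"; then
#       ...kill and restart the service, then set failedCount=0...
#   fi
```

Keeping the counter separate from the curl call makes it easy to swap in a different probe later without touching the restart rule.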