【shell】监控程序是否在挂掉

有时候程序在后台跑着跑着就挂掉了,我们可以通过grep去查找该进程,如果找不到那就说明挂掉了,也许你需要重启一下或者做一些其他的操作

grepStr="python3 test.py"
sleepTime=3

while :; do
  programs=$(ps -ef | grep "$grepStr" | grep -v "grep")
  echo "$programs"
  if [ "$programs" ]; then
    echo "is running now"
  else
    echo "stopped"
  fi
  sleep "$sleepTime"s
done

有一个场景是运行一个rpc服务,但由于某些问题rpc进程并没有挂掉,但会卡死,即发起请求没有响应了,我们可以不断发送请求,如果不返回就重启一下程序

grepStr="python3 test.py"  # grep命令过滤字符串,找出进程pid
sleepTime=30  # 暂停时间,单位秒
failedCount=0  # 累计失败次数
restartThreshold=5  # 累计失败多少次后重启
# 使用前请修改12行和26行命令

while :; do
  programs=$(ps -ef | grep "$grepStr" | grep -v "grep")
  echo "$programs"
  if [ "$programs" ]; then
    echo "is running now"
    ifResult=$(curl --connect-timeout 30 -d '{"jsonrpc": "2.0", "method": "base_getBlockCount", "params": [], "id": 1}' -H "Content-Type: application/json" -X POST 192.168.31.181:9379 | grep "result")
    echo "ifResult is:$ifResult"
    if [ "$ifResult" ]; then
      failedCount=0
      echo "failedCounts is: $failedCount"
    else
      ((failedCount++))
      echo "failedCounts++ is: $failedCount"
      if [ "$failedCount" -ge $restartThreshold ]; then
        echo "$failedCount >= $restartThreshold"
        pid=$(ps -aux | grep "$grepStr" | grep -v "grep" | awk '{print $2}')
        echo "pid: $pid"
        killCmd=$(kill -s 9 $pid)
        echo "killed the pid $pid, ready to restart."
        res=$(nohup python3 test.py >/dev/null 2>&1 &)
        failedCount=0
        echo "restart succeed, failedCount is: $failedCount"
      fi
    fi
  else
    echo "$grepStr is not running now,exit."
    break
  fi
  sleep "$sleepTime"s
done

猜你喜欢

转载自blog.csdn.net/qq_39147299/article/details/126470966