Installing the Nagios client (nagios-plugins and NRPE) on CentOS

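I have written this up in some detail, partly as notes for myself. Many guides online are too brief and leave steps out, so I put together the parts I actually need.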

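Overview: to monitor a client machine, Nagios needs an add-on called "NRPE" running on that machine. With it, the Nagios server can read the client's local information, such as CPU, memory, and disk usage.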

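1. First, add a nagios user to own and manage the Nagios plugins
#useradd nagios (setting a password is up to you, but never use a weak one, or the account becomes an easy target for attacks)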

2. Install nagios-plugins
https://nagios-plugins.org/downloads/ (download the version you want from this page)

#tar zxf nagios-plugins-2.1.4.tar.gz
#cd nagios-plugins-2.1.4/
#./configure --prefix=/usr/local/nagios --with-nagios-user=nagios --with-nagios-group=nagios
#make
#make install (after a successful install, /usr/local/nagios contains three new directories: include, libexec, and share)
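
The directory check mentioned in the last step can be scripted. The following is only a minimal sketch: it assumes the /usr/local/nagios prefix passed to ./configure, and the function name check_plugin_dirs is made up for illustration.

```shell
#!/bin/sh
# Report whether the directories that `make install` should create exist.
# check_plugin_dirs is a hypothetical helper, not part of nagios-plugins.
check_plugin_dirs() {
    prefix="${1:-/usr/local/nagios}"
    missing=0
    for dir in include libexec share; do
        if [ -d "$prefix/$dir" ]; then
            echo "ok: $prefix/$dir"
        else
            echo "missing: $prefix/$dir"
            missing=1
        fi
    done
    # Non-zero return value if at least one directory is absent.
    return "$missing"
}
```

Calling check_plugin_dirs /usr/local/nagios prints one line per directory and returns non-zero if any of them is missing.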

3. Change the directory ownership
# chown nagios:nagios /usr/local/nagios
# chown -R nagios:nagios /usr/local/nagios/libexec

4. Install nrpe
# wget http://prdownloads.sourceforge.net/sourceforge/nagios/nrpe-2.13.tar.gz
# tar zxvf nrpe-2.13.tar.gz
# cd nrpe-2.13
# ./configure (if the SSL development files are not installed, this step stops with an error; install them with: yum install openssl-devel, then run ./configure again)

# make all

#make install-plugin

# make install-daemon (installs the daemon)
#make install-daemon-config (installs the configuration file nrpe.cfg)
Note: with a 3.X.X release this command fails with: make: *** No rule to make target `install-daemon-config'. Stop.
Use this command instead: # make install-config
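
The choice between the two make targets can be derived from the version string. This is a sketch under assumptions: the helper name nrpe_config_target is hypothetical, and treating every major version from 3 upward like 3.X.X is a guess that goes beyond what the note above confirms.

```shell
#!/bin/sh
# Pick the make target that installs nrpe.cfg, based on the NRPE version.
# nrpe_config_target is a hypothetical helper name.
nrpe_config_target() {
    major="${1%%.*}"    # "2.13" -> "2", "3.2.1" -> "3"
    if [ "$major" -ge 3 ]; then
        # Assumption: releases after 3.X.X keep the install-config target.
        echo install-config
    else
        echo install-daemon-config
    fi
}
```

Inside the unpacked source directory it could be used as: make "$(nrpe_config_target 2.13)"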

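5. Testing:
5.1 Set the Nagios server address
#vim /usr/local/nagios/etc/nrpe.cfg
Append the Nagios server's IP to the allowed_hosts line, separated by a comma (","). After the change the line looks like this:
allowed_hosts=127.0.0.1,192.168.8.208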
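
The same edit can be made without opening an editor. A minimal sketch, assuming the file already contains a non-empty allowed_hosts= line (as in the example above); add_allowed_host is a hypothetical helper name, not an NRPE tool.

```shell
#!/bin/sh
# Append an IP to the allowed_hosts line of an nrpe.cfg-style file,
# unless it is already listed. add_allowed_host is a hypothetical helper.
# Assumes exactly one non-empty allowed_hosts= line exists in the file.
add_allowed_host() {
    cfg="$1"
    ip="$2"
    # Current comma-separated value of allowed_hosts.
    current=$(sed -n 's/^allowed_hosts=//p' "$cfg")
    case ",$current," in
        *",$ip,"*) return 0 ;;    # already present, nothing to do
    esac
    # Write to a temporary file first, then overwrite the original.
    tmp=$(mktemp) || return 1
    sed "s|^allowed_hosts=.*|&,$ip|" "$cfg" > "$tmp" && cat "$tmp" > "$cfg"
    status=$?
    rm -f "$tmp"
    return "$status"
}
```

For example: add_allowed_host /usr/local/nagios/etc/nrpe.cfg 192.168.8.208. Running it twice leaves the line unchanged the second time.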

5.2 Then start nrpe
#cd /usr/local/nagios/bin/
#./nrpe -c /usr/local/nagios/etc/nrpe.cfg -d

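Verify the installation locally:
#netstat -antpul | grep nrpe
tcp 0 0 0.0.0.0:5666 0.0.0.0:* LISTEN 15822/./nrpe (the port is listening, so the daemon started successfully)
#/usr/local/nagios/libexec/check_nrpe -H localhost
NRPE v2.13 (a version number means the installation succeeded)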

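Verify the installation from the Nagios server:
#/usr/local/nagios/libexec/check_nrpe -H 10.148.16.91 (this should also print the version number)
If it reports: Connection refused or timed out (the client cannot be reached on the NRPE port; open port 5666 on the client)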
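
Before touching the firewall, it can help to confirm from the Nagios server whether the port is reachable at all. The sketch below is one way to do that; it relies on bash's /dev/tcp redirection and the coreutils timeout command, and port_open is a hypothetical helper name.

```shell
#!/bin/bash
# Return 0 if a TCP connection to HOST:PORT succeeds within 3 seconds.
# Relies on bash's /dev/tcp redirection and on coreutils `timeout`.
# port_open is a hypothetical helper name; the port defaults to 5666.
port_open() {
    local host="$1" port="${2:-5666}"
    timeout 3 bash -c "exec 3<>/dev/tcp/$host/$port" 2>/dev/null
}
```

For example: port_open 10.148.16.91 5666 && echo reachable || echo blocked. If the client runs firewalld (an assumption, the original does not say which firewall is in use), the port can be opened on the client with firewall-cmd --permanent --add-port=5666/tcp followed by firewall-cmd --reload.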

Notes:

1. Selected settings in /usr/local/nagios/etc/nrpe.cfg explained:

command_timeout=60

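command[check_users]=/usr/local/nagios/libexec/check_users -w 5 -c 10 (warning above 5 logged-in users, critical above 10)
command[check_load]=/usr/local/nagios/libexec/check_load -w 8,6,6 -c 12,10,10
(these thresholds are compared with the system's load average, e.g. load average: 1.05, 1.09, 1.06; choose values that suit your CPU. My machine has 16 cores, so I use the values above)
command[check_hda1]=/usr/local/nagios/libexec/check_disk -w 20% -c 10% -p /dev/mapper/centos-home
(here I only monitor the partition that is actually in use on my system; warning when free space drops below 20% of the partition size, critical below 10%; -p specifies the partition)
command[check_zombie_procs]=/usr/local/nagios/libexec/check_procs -w 5 -c 10 -s Z (warning at 5 zombie processes, critical at 10)

command[check_total_procs]=/usr/local/nagios/libexec/check_procs -w 400 -c 500 (warning at 400 total processes, critical at 500)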
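
Since the check_load thresholds depend on the number of cores, they can be generated rather than typed by hand. The sketch below scales the 16-core example (-w 8,6,6 -c 12,10,10) linearly; the per-core ratios are an assumption reverse-engineered from that single example, not a Nagios recommendation, and the function names are hypothetical.

```shell
#!/bin/sh
# Scale check_load thresholds with the number of CPU cores.
# The ratios (in thousandths of a core) are an assumption derived from the
# 16-core example -w 8,6,6 -c 12,10,10; they are not Nagios defaults.
scale() {
    v=$(( $1 * $2 / 1000 ))
    # Integer division truncates; never emit a threshold below 1.
    [ "$v" -ge 1 ] || v=1
    echo "$v"
}

load_thresholds() {
    cores="$1"
    w1=$(scale "$cores" 500); w5=$(scale "$cores" 375)
    c1=$(scale "$cores" 750); c5=$(scale "$cores" 625)
    # The 5-minute and 15-minute thresholds share one value, as in the example.
    echo "-w $w1,$w5,$w5 -c $c1,$c5,$c5"
}
```

For example, load_thresholds "$(nproc)" prints arguments sized for the current machine. On machines with only a few cores the rounding becomes coarse, so the result is better treated as a starting point.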

Custom scripts as monitoring targets:

command[enheng]=/usr/local/nagios/libexec/suibian.sh
command[aha]=python /opt/suibian.py
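
A custom script such as suibian.sh has to follow the Nagios plugin convention: print one status line and signal the state through the exit code (0 = OK, 1 = WARNING, 2 = CRITICAL, 3 = UNKNOWN). The following is a minimal skeleton, not the author's script; check_threshold and its arguments are hypothetical names.

```shell
#!/bin/sh
# Skeleton of a custom check in the style of suibian.sh.
# Nagios plugins report their state through the exit code:
# 0 = OK, 1 = WARNING, 2 = CRITICAL, 3 = UNKNOWN.
# check_threshold and its arguments are hypothetical names.
check_threshold() {
    value="$1"; warn="$2"; crit="$3"
    # Anything that is not a non-negative integer is reported as UNKNOWN.
    case "$value" in
        ''|*[!0-9]*) echo "UNKNOWN - value '$value' is not a number"; return 3 ;;
    esac
    if [ "$value" -ge "$crit" ]; then
        echo "CRITICAL - value is $value"; return 2
    elif [ "$value" -ge "$warn" ]; then
        echo "WARNING - value is $value"; return 1
    fi
    echo "OK - value is $value"; return 0
}
```

A real script would measure something, call check_threshold "$measured" 5 10, and then exit with that return code so NRPE can pass the state back to the Nagios server.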

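Common monitoring targets and thresholds:

Monitored item		Thresholds
Host resources	Host alive: check_ping	-w 3000.0,80% -c 5000.0,100% -p 5 (warning when the round-trip time exceeds 3000 ms or packet loss reaches 80%; critical at 5000 ms or 100% loss; 5 packets are sent)
	Logged-in users: check_users	-w 5 -c 10 (-w is the warning threshold, -c the critical threshold)
	System load: check_load	-w 15,10,5 -c 30,25,20 (warning or critical when the 1-, 5-, and 15-minute load averages exceed the corresponding values)
	Disk usage: check_disk	-w 20% -c 10% -p / (warning when free space on the root partition drops below 20% of its size, critical below 10%; -p specifies the partition, here the root partition)
	Disk I/O (custom script): check_iostat	-w 5 -c 10 (warning when disk iowait exceeds 5%, critical above 10%)
	Zombie processes: check_zombie_procs	-w 5 -c 10 -s Z (warning at 5 zombie processes, critical at 10)
	Total processes: check_total_procs	-w 150 -c 200 (warning at 150 total processes, critical at 200)
	Memory (custom script): check_mem	-w 90% -c 95% (warning when memory usage exceeds 90%, critical above 95%)
	Swap usage: check_swap	-w 20% -c 10% (warning when free swap drops below 20% of its size, critical below 10%)
Application services	Service port: check_tcp	-H localhost2 -p 80 (host and port number)
	Page response time: check_http	-H localhost2 -u http://localhost2/test.jsp -w 5 -c 10 (checks the page; warning when the response takes longer than 5 s, critical above 10 s)
	IP connection count (custom script): check_ips	-w 200 -c 250 (warning above 200 IP connections, critical above 250)
Traffic	Server traffic: check_traffic	-V 2c -C public -H localhost2 -I 2 -w 12,30 -c 15,35 -M -b (SNMP version, community string, host, interface index, warning thresholds, critical thresholds)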
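
The memory check in the table relies on a custom script that is not shown here. As an illustration of what such a script has to compute, the sketch below derives memory usage in percent from /proc/meminfo. It assumes the kernel exposes the MemAvailable field, and mem_used_percent is a hypothetical helper, not the check_mem script referenced in the table.

```shell
#!/bin/sh
# Compute memory usage in percent from a /proc/meminfo-style file.
# Assumes the kernel exposes MemAvailable; if the field is absent the
# result is 100. mem_used_percent is a hypothetical helper, not the
# check_mem script referenced in the table.
mem_used_percent() {
    awk '/^MemTotal:/     { total = $2 }
         /^MemAvailable:/ { avail = $2 }
         END {
             if (total <= 0) exit 1
             printf "%d\n", (total - avail) * 100 / total
         }' "${1:-/proc/meminfo}"
}
```

Called without arguments it reads /proc/meminfo; the printed number can then be compared with the 90 and 95 thresholds to decide between the WARNING and CRITICAL exit codes.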

2. Steps to restart nrpe:
#ps aux | grep nrpe
#kill -9 PID (PID is the nrpe process ID shown by the previous command)
#/usr/local/nagios/bin/nrpe -c /usr/local/nagios/etc/nrpe.cfg -d
