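Without further ado: this post is purely a personal notebook, so it may feel disorganized. I simply record the problems I ran into, one by one, so that I can look them up later and so that they may help others who are still struggling with similar problems.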


System version and information

cat /etc/redhat-release
CentOS release 6.2 (Final)

uname -a
Linux  2.6.32-220.el6.x86_64  x86_64 x86_64 x86_64 GNU/Linux

ifconfig | sed -n 1,2p
eth0      Link encap:Ethernet  HWaddr 40:F2:E9:29:5F:EA
          inet addr:192.168.0.2  Bcast:192.168.69.255  Mask:255.255.255.0

iptables and SELinux are both turned off.

Software versions

LAMP or LNMP: it does not matter, either environment works. Mine is an LNMP environment installed with yum.

nagios-4.0.5.tar.gz
nagios-plugins-1.4.16.tar.gz
nrpe-2.15.tar.gz
pnp4nagios-0.6.19.tar.gz

Preparing to install Nagios

Make sure yum works; a network yum repository is recommended. Install the libraries the system needs:

yum groupinstall "Compatibility libraries" "Base" "Development tools"

Install LAMP and the required packages:

yum -y install http* php* mysql* perl* net-snmp* openssl* glibc rrdtool rrdtool-devel rrdtool-perl rrdtool-php
chkconfig mysqld on
chkconfig httpd on
chkconfig snmpd on
service httpd start
service mysqld start
service snmpd start

If the tests pass, move on to the next step:

ps -ef | grep -v grep | grep -E 'http|mysql|snmp'   # or check each service separately; also test access to the web page
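Before moving on, it can help to confirm that the tools the later steps rely on are actually on the PATH. The following is a minimal sketch, assuming a POSIX shell; the function name missing_cmds is made up for illustration and is not part of any package.

```shell
# missing_cmds: print each named command that is NOT found on the PATH.
# Hypothetical helper for illustration; prints nothing when all are present.
missing_cmds() {
    for cmd in "$@"; do
        command -v "$cmd" >/dev/null 2>&1 || echo "$cmd"
    done
    return 0
}
```

For example, `missing_cmds gcc make php rrdtool` lists whichever of those still need to be installed.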

Installing Nagios

1. Create the nagios user and group

[root@nagios ~]# useradd -s /sbin/nologin nagios
[root@nagios ~]# mkdir /usr/local/nagios
[root@nagios ~]# chown -R nagios:nagios /usr/local/nagios/

2. Compile and install Nagios

[root@nagios tools]# tar zxf nagios-4.0.5.tar.gz
[root@nagios tools]# cd nagios-4.0.5
[root@nagios nagios-4.0.5]# ./configure --prefix=/usr/local/nagios
[root@nagios nagios-4.0.5]# make all && make install && make install-init && make install-commandmode && make install-config && make install-webconf
[root@nagios nagios-4.0.5]# echo $?
0

3. Enable start at boot

chkconfig --add nagios
chkconfig nagios on
chkconfig --list nagios
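The long `&&` chain above stops at the first failing make target, but it does not say which one failed. A small sketch of the same idea that names the failing step; run_steps is a hypothetical helper, not part of Nagios, and each step string is handed to `sh -c`.

```shell
# run_steps: run each argument as a command line, in order, and stop at the
# first one that fails. Returns that command's exit status (0 if all succeed).
# Hypothetical helper; the step strings are passed to "sh -c".
run_steps() {
    for step in "$@"; do
        echo ">>> $step"
        sh -c "$step" || {
            rc=$?
            echo "FAILED ($rc): $step" >&2
            return "$rc"
        }
    done
    return 0
}
```

Inside the nagios-4.0.5 source directory it could be called as `run_steps "make all" "make install" "make install-init" "make install-commandmode" "make install-config" "make install-webconf"`.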

Installing nagios-plugins

[root@nagios tools]# tar zxf nagios-plugins-1.4.16.tar.gz
[root@nagios tools]# cd nagios-plugins-1.4.16
[root@nagios nagios-plugins-1.4.16]# ./configure --prefix=/usr/local/nagios/
[root@nagios nagios-plugins-1.4.16]# make
[root@nagios nagios-plugins-1.4.16]# make install
[root@nagios nagios-plugins-1.4.16]# echo $?
0

Editing the httpd.conf configuration file

cd /etc/httpd/conf
cp -a httpd.conf httpd.conf.bak
vim httpd.conf    # add the following at the very end

####### setting for nagios #######
ScriptAlias /nagios/cgi-bin "/usr/local/nagios/sbin"
<Directory "/usr/local/nagios/sbin">
    AuthType Basic
    Options ExecCGI
    AllowOverride None
    Order allow,deny
    Allow from all
    AuthName "nagios access"
    AuthUserFile /usr/local/nagios/etc/htpasswd
    Require valid-user
</Directory>
Alias /nagios "/usr/local/nagios/share"
<Directory "/usr/local/nagios/share">
    AuthType Basic
    Options ExecCGI
    AllowOverride None
    Order allow,deny
    Allow from all
    AuthName "nagios access"
    AuthUserFile /usr/local/nagios/etc/htpasswd
    Require valid-user
</Directory>

Change
DirectoryIndex index.html index.html.var
to
DirectoryIndex index.php index.html index.html.var

Change
Options Indexes FollowSymLinks
to
Options FollowSymLinks    # prevents the site from listing directories

service httpd restart

Add the Nagios login authentication file. The default user nagiosadmin needs no further changes; any other user name (admin here) requires editing other files as well, which is what the two sed commands do. Back up the files before modifying them; I skip the backup here.

[root@nagios etc]# cd /usr/local/nagios/etc
[root@nagios etc]# sed -i 's@nagiosadmin@nagiosadmin,admin@g' cgi.cfg
[root@nagios etc]# sed -i 's@#default_user_name=guest@default_user_name=admin@g' cgi.cfg
[root@nagios nagios]# htpasswd -c /usr/local/nagios/etc/htpasswd admin
New password: ******
Re-type new password: ******
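The sed substitution on cgi.cfg appends the extra user every time it is run, and the text skips the backup. Below is a sketch of a variant that keeps a copy and can be run more than once; add_cgi_user is a hypothetical helper, and unlike the sed command it only touches lines that begin with authorized_for_.

```shell
# add_cgi_user FILE USER: append USER after nagiosadmin on every
# "authorized_for_*=" line of FILE, unless USER is already in that list.
# The previous version of FILE is kept as FILE.bak. Hypothetical helper.
add_cgi_user() {
    cp "$1" "$1.bak" &&
    awk -v user="$2" '
        /^authorized_for_/ && /nagiosadmin/ {
            split($0, kv, "=")
            # Only append when the user is not already in the comma list.
            if (index("," kv[2] ",", "," user ",") == 0)
                sub(/nagiosadmin/, "nagiosadmin," user)
        }
        { print }
    ' "$1.bak" > "$1"
}
```

With the paths used in this post the call would be `add_cgi_user /usr/local/nagios/etc/cgi.cfg admin`.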

Installing NRPE

[root@nagios tools]# tar zxf nrpe-2.15.tar.gz
[root@nagios tools]# cd nrpe-2.15
[root@nagios nrpe-2.15]# ./configure;make all;make install-plugin;make install-daemon;make install-daemon-config

Start NRPE:

[root@nagios nrpe-2.15]# /usr/local/nagios/bin/nrpe -c /usr/local/nagios/etc/nrpe.cfg -d
[root@nagios nrpe-2.15]# netstat -antl | grep 5666
tcp        0      0 0.0.0.0:5666                0.0.0.0:*                   LISTEN
[root@nagios libexec]# /usr/local/nagios/libexec/check_nrpe -H 127.0.0.1
NRPE v2.15

Stop NRPE:

[root@nagios libexec]# ps -ef | grep -v grep | grep nrpe
[root@nagios libexec]# kill -9 PID    # PID is the process ID shown by the previous command

Checking Nagios

[root@nagios etc]# /usr/local/nagios/bin/nagios -v /usr/local/nagios/etc/nagios.cfg
Total Warnings: 0
Total Errors:   0

Zero warnings and zero errors mean the configuration is OK.
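The two totals can also be checked by a script rather than by eye. This is a minimal sketch that reads the check output on standard input; preflight_ok is a hypothetical name, and it assumes the two summary lines look exactly like the ones shown above.

```shell
# preflight_ok: read "nagios -v" output on stdin and succeed only when both
# "Total Warnings" and "Total Errors" are present and reported as 0.
preflight_ok() {
    awk -F': *' '
        /^Total Warnings:/ { w = $2 + 0; seen_w = 1 }
        /^Total Errors:/   { e = $2 + 0; seen_e = 1 }
        END { exit !(seen_w && seen_e && w == 0 && e == 0) }
    '
}
```

A possible use is `/usr/local/nagios/bin/nagios -v /usr/local/nagios/etc/nagios.cfg | preflight_ok && service nagios restart`, so that a restart only happens after a clean check.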

Starting Nagios

[root@nagios etc]# service nagios start    # use stop or restart in place of start to stop or restart it

Then open http://IP/nagios in a browser.

Installing pnp4nagios

[root@nagios tools]# tar zxf pnp4nagios-0.6.19.tar.gz
[root@nagios tools]# cd pnp4nagios-0.6.19
[root@nagios pnp4nagios-0.6.19]# ./configure
make all
make install
make install-config
make install-init
make install-webconf

Create the default configuration files:

cd /usr/local/pnp4nagios/etc
cp misccommands.cfg-sample misccommands.cfg
cp nagios.cfg-sample nagios.cfg
cp rra.cfg-sample rra.cfg
cd pages
cp web_traffic.cfg-sample web_traffic.cfg
cd ../check_commands/
cp check_all_local_disks.cfg-sample check_all_local_disks.cfg
cp check_nrpe.cfg-sample check_nrpe.cfg
cp check_nwstat.cfg-sample check_nwstat.cfg
cp /usr/local/pnp4nagios/libexec/* /usr/local/nagios/libexec/

vim /usr/local/nagios/etc/nagios.cfg

Check these settings:

enable_environment_macros=1
process_performance_data=1
host_perfdata_command=process-host-perfdata
service_perfdata_command=process-service-perfdata

Note: with Nagios 4.x, the configuration above means no graphs are generated later, and the following error is reported:

PNP4Nagios Version 0.6.19
Please check the documentation for information about the following error.
perfdata directory "/usr/local/pnp4nagios/var/perfdata/localhost" for host "localhost" does not exist.
Read FAQ online
file [line]:
application/models/data.php [148]:
back

For the cause of this error, see the referenced article.
The solution is to use Bulk Mode.

vim /usr/local/nagios/etc/nagios.cfg

Check these settings:

enable_environment_macros=1
process_performance_data=1

Then add the following at the end of the file:

# service performance data
service_perfdata_file=/usr/local/pnp4nagios/var/service-perfdata
service_perfdata_file_template=DATATYPE::SERVICEPERFDATA\tTIMET::$TIMET$\tHOSTNAME::$HOSTNAME$\tSERVICEDESC::$SERVICEDESC$\tSERVICEPERFDATA::$SERVICEPERFDATA$\tSERVICECHECKCOMMAND::$SERVICECHECKCOMMAND$\tHOSTSTATE::$HOSTSTATE$\tHOSTSTATETYPE::$HOSTSTATETYPE$\tSERVICESTATE::$SERVICESTATE$\tSERVICESTATETYPE::$SERVICESTATETYPE$
service_perfdata_file_mode=a
service_perfdata_file_processing_interval=15
service_perfdata_file_processing_command=process-service-perfdata-file

# host performance data starting with Nagios 3.0
host_perfdata_file=/usr/local/pnp4nagios/var/host-perfdata
host_perfdata_file_template=DATATYPE::HOSTPERFDATA\tTIMET::$TIMET$\tHOSTNAME::$HOSTNAME$\tHOSTPERFDATA::$HOSTPERFDATA$\tHOSTCHECKCOMMAND::$HOSTCHECKCOMMAND$\tHOSTSTATE::$HOSTSTATE$\tHOSTSTATETYPE::$HOSTSTATETYPE$
host_perfdata_file_mode=a
host_perfdata_file_processing_interval=15
host_perfdata_file_processing_command=process-host-perfdata-file

Save the file.

vim /usr/local/nagios/etc/objects/commands.cfg

# this block can simply go near the top of the file
define command{
       command_name    check_nrpe
       command_line    $USER1$/check_nrpe -H $HOSTADDRESS$ -c $ARG1$
}

The bulk mode commands are set up as follows; adding them at the end of the file is enough. Remember that this configuration file already contains a default definition for them: find it and comment it out before adding the configuration below, otherwise the Nagios configuration check will report an error.

define command{
       command_name    process-service-perfdata-file
       command_line    /usr/local/pnp4nagios/libexec/process_perfdata.pl --bulk=/usr/local/pnp4nagios/var/service-perfdata
}
define command{
       command_name    process-host-perfdata-file
       command_line    /usr/local/pnp4nagios/libexec/process_perfdata.pl --bulk=/usr/local/pnp4nagios/var/host-perfdata
}

Define two pnp templates, one for hosts and one for services, at the end of templates.cfg:

vim /usr/local/nagios/etc/objects/templates.cfg

define host {
  name       host-pnp
  action_url /pnp4nagios/index.php/graph?host=$HOSTNAME$&srv=_HOST_
  register   0
}
define service {
  name       service-pnp
  action_url /pnp4nagios/index.php/graph?host=$HOSTNAME$&srv=$SERVICEDESC$
  register   0
}

Alternatively, add the action_url line to the existing define host and define service templates (their other parameters are omitted below). This saves a lot of time when enabling pnp for individual hosts.

vim /usr/local/nagios/etc/objects/templates.cfg

define host {
        action_url /pnp4nagios/index.php/graph?host=$HOSTNAME$&srv=_HOST_
}
define service {
        action_url /pnp4nagios/index.php/graph?host=$HOSTNAME$&srv=$SERVICEDESC$
}
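Bulk mode depends on roughly a dozen settings in nagios.cfg, and one missing or commented-out line is easy to overlook. The sketch below lists the settings from the text above that are not active in a given file; missing_bulk_keys is a hypothetical helper and it checks key names only, not their values.

```shell
# missing_bulk_keys FILE: print every bulk-mode setting named in this post
# that is absent (or commented out) in FILE. Prints nothing when all are set.
# Hypothetical helper; it does not validate the values.
missing_bulk_keys() {
    cfg=$1
    for key in process_performance_data \
               service_perfdata_file service_perfdata_file_template \
               service_perfdata_file_mode service_perfdata_file_processing_interval \
               service_perfdata_file_processing_command \
               host_perfdata_file host_perfdata_file_template \
               host_perfdata_file_mode host_perfdata_file_processing_interval \
               host_perfdata_file_processing_command; do
        grep -q "^[[:space:]]*$key=" "$cfg" || echo "$key"
    done
    return 0
}
```

Running `missing_bulk_keys /usr/local/nagios/etc/nagios.cfg` after editing should print nothing if every setting is in place.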

First run the pnp4nagios environment test. Add the following at the end of httpd.conf:

vim /etc/httpd/conf/httpd.conf

Alias /pnp4nagios "/usr/local/pnp4nagios/share"
<Directory "/usr/local/pnp4nagios/share">
    AllowOverride None
    Order allow,deny
    Allow from all
    AuthName "Nagios Access"
    AuthType Basic
    AuthUserFile /usr/local/nagios/etc/htpasswd
    Require valid-user
    <IfModule mod_rewrite.c>
        RewriteEngine On
        Options FollowSymLinks
        RewriteBase /pnp4nagios
        RewriteRule ^(application|modules|system) - [F,L]
        RewriteCond %{REQUEST_FILENAME} !-f
        RewriteCond %{REQUEST_FILENAME} !-d
        RewriteRule .* index.php/$0 [PT,L]
    </IfModule>
</Directory>

service httpd restart

Open the page in a browser.

cd /usr/local/pnp4nagios/share/

mv install.php install.php.bak

Editing the nagios.cfg file

vim /usr/local/nagios/etc/nagios.cfg

cfg_file=/usr/local/nagios/etc/objects/commands.cfg
cfg_file=/usr/local/nagios/etc/objects/contacts.cfg
cfg_file=/usr/local/nagios/etc/objects/timeperiods.cfg
cfg_file=/usr/local/nagios/etc/objects/templates.cfg
cfg_file=/usr/local/nagios/etc/objects/localhost.cfg
cfg_file=/usr/local/nagios/etc/objects/hosts.cfg
cfg_file=/usr/local/nagios/etc/objects/hostgroup.cfg
cfg_file=/usr/local/nagios/etc/objects/services.cfg

or

cfg_file=/usr/local/nagios/etc/objects/commands.cfg
cfg_file=/usr/local/nagios/etc/objects/contacts.cfg
cfg_file=/usr/local/nagios/etc/objects/timeperiods.cfg
cfg_file=/usr/local/nagios/etc/objects/templates.cfg
cfg_file=/usr/local/nagios/etc/objects/localhost.cfg
cfg_dir=/usr/local/nagios/etc/objects/app

Note: this only enables monitoring of Linux hosts. Windows and switch monitoring are not enabled; uncomment the corresponding lines if you need them. Either variant works. The difference is that in the first one all hosts share the same configuration files, while in the second each host has its own file. I demonstrate both below and refer to them as method 1 and method 2.
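Every cfg_file line must point at a file that exists, and several of the files named above are not shipped by default. A short sketch that lists the ones still missing; missing_cfg_files is a hypothetical helper and it ignores commented-out lines and cfg_dir entries.

```shell
# missing_cfg_files FILE: print every cfg_file= path in the given nagios.cfg
# that does not exist on disk. Commented-out lines are ignored.
missing_cfg_files() {
    sed -n 's/^[[:space:]]*cfg_file=//p' "$1" | while IFS= read -r path; do
        [ -f "$path" ] || echo "$path"
    done
    return 0
}
```

`missing_cfg_files /usr/local/nagios/etc/nagios.cfg` then shows which object files still have to be created before the configuration check can pass.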

Adding host configuration: method 1

By default, nagios/etc/objects/ does not contain services.cfg, hosts.cfg or hostgroup.cfg; create them by hand.

vim hosts.cfg

# "use" refers to templates defined in templates.cfg. If the action_url was
# added directly to the existing define host and define service templates,
# host-pnp can be left out here, because linux-server already includes it.
define host{
         use                      linux-server,host-pnp
         host_name                cacti                 ; must be the hostname of the monitored machine
         alias                    cacti-web             ; any alias you like
         address                  192.168.0.3           ; IP address of the host
         contact_groups           admins                ; mail contact group, shown further below
}
define host{
         use                      linux-server,host-pnp
         host_name                nginx
         alias                    nginx-web
         address                  192.168.0.4
         contact_groups           admins
}

Add one block like this for every machine.

vim hostgroup.cfg

define hostgroup{
         hostgroup_name      servers                   ; group name
         alias               servers_group             ; alias
         members             cacti,nginx               ; host names, separated by commas
}

vim services.cfg          # all hosts share one file, which gets messy

#### set cacti host
define service{
         use                      local-service,service-pnp
         host_name                cacti
         service_description      http
         check_command            check_http
         contact_groups           admins
         flap_detection_enabled   0
}
define service{
         use                      local-service,service-pnp
         host_name                cacti
         service_description      SSH_port
         check_command            check_tcp!22
         contact_groups           admins
         flap_detection_enabled   0
}
define service{
         use                      local-service,service-pnp
         host_name                cacti
         service_description      check_/
         check_command            check_nrpe!check_/    ; checked through nrpe, must be defined on the client
         contact_groups           admins
         flap_detection_enabled   0
}

#### set nginx host
define service{
         use                      local-service,service-pnp
         host_name                nginx
         service_description      Check_free_mem
         check_command            check_nrpe!check_free_mem
         contact_groups           admins
         flap_detection_enabled   0
}
define service{
         use                      local-service,service-pnp
         host_name                nginx
         service_description      check_/
         check_command            check_nrpe!check_/    ; checked through nrpe, must be defined on the client
         contact_groups           admins
         flap_detection_enabled   0
}

Add as many service blocks as you need. End of method 1.

Adding host configuration: method 2

cd nagios/etc/objects/
mkdir app
cd app
vim 192.168.0.4.cfg    # all monitored objects of one host go into its own file; no hostgroup is defined here, as it adds little

### define the host
# "use" refers to templates defined in templates.cfg. If the action_url was
# added directly to the existing define host and define service templates,
# host-pnp can be left out here, because linux-server already includes it.
define host{
         use                      linux-server,host-pnp
         host_name                nginx                 ; must be the hostname of the monitored machine
         alias                    nginx-web             ; any alias you like
         address                  192.168.0.4           ; IP address of the host
         contact_groups           admins                ; mail contact group, shown further below
}

### define the services
define service{
         use                      local-service,service-pnp
         host_name                nginx
         service_description      Check_free_mem
         check_command            check_nrpe!check_free_mem
         contact_groups           admins
         flap_detection_enabled   0
}
define service{
         use                      local-service,service-pnp
         host_name                nginx
         service_description      check_/
         check_command            check_nrpe!check_/    ; checked through nrpe, must be defined on the client
         contact_groups           admins
         flap_detection_enabled   0
}
vim 192.168.0.3.cfg

### define the host
# "use" refers to templates defined in templates.cfg. If the action_url was
# added directly to the existing define host and define service templates,
# host-pnp can be left out here, because linux-server already includes it.
define host{
         use                      linux-server,host-pnp
         host_name                cacti                 ; must be the hostname of the monitored machine
         alias                    cacti-web             ; any alias you like
         address                  192.168.0.3           ; IP address of the host
         contact_groups           admins                ; mail contact group, shown further below
}

### define the services
define service{
         use                      local-service,service-pnp
         host_name                cacti
         service_description      Check_free_mem
         check_command            check_nrpe!check_free_mem
         contact_groups           admins
         flap_detection_enabled   0
}

This approach is much more convenient than the first one. End of the two methods for adding hosts.
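Because every per-host file has the same shape, the host block can be produced by a small function instead of being copied by hand. This is a sketch only: gen_host_cfg is a hypothetical helper, and the templates (linux-server, host-pnp) and contact group (admins) are the ones used in this post and must already exist in your configuration.

```shell
# gen_host_cfg NAME ALIAS ADDRESS: print a host definition in the same shape
# as the per-host files above. Hypothetical helper; template and contact
# group names are taken from this post, not from any Nagios default.
gen_host_cfg() {
    cat <<EOF
define host{
        use                      linux-server,host-pnp
        host_name                $1
        alias                    $2
        address                  $3
        contact_groups           admins
}
EOF
}
```

For example, `gen_host_cfg nginx nginx-web 192.168.0.4` prints the host block for the nginx machine, which can be redirected into a file under the app directory and then extended with service definitions.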

Nagios email alert settings

[root@nagios objects]# vim contacts.cfg    # for details on each parameter, search online

define contact{
       contact_name                       nagiosadmin
       use                                generic-contact
       alias                              Nagios Admin
       service_notification_period        24x7
       host_notification_period           24x7
       service_notification_options       w,u,c,r
       host_notification_options          d,u,r
       service_notification_commands      notify-service-by-email
       host_notification_commands         notify-host-by-email
       email                              xxxx@163.com
}
define contactgroup{
       contactgroup_name          admins                 ; the admins group referenced in the host and service definitions
       alias                      Nagios Administrators
       members                    nagiosadmin
}

Checking the configuration files for errors

/usr/local/nagios/bin/nagios -v /usr/local/nagios/etc/nagios.cfg
Total Warnings: 0
Total Errors:   0
Things look okay - No serious problems were detected during the pre-flight check

service nagios restart

End of the server-side configuration.

Client installation and configuration

net-snmp must be installed; if other errors appear, resolve them according to the messages.

yum -y install net-snmp*

1. Create the nagios user and group

[root@nagios ~]# useradd -s /sbin/nologin nagios
[root@nagios ~]# mkdir /usr/local/nagios
[root@nagios ~]# chown -R nagios:nagios /usr/local/nagios/

2. Install nagios-plugins

[root@nagios tools]# tar zxf nagios-plugins-1.4.16.tar.gz
[root@nagios tools]# cd nagios-plugins-1.4.16
[root@nagios nagios-plugins-1.4.16]# ./configure --prefix=/usr/local/nagios/
[root@nagios nagios-plugins-1.4.16]# make
[root@nagios nagios-plugins-1.4.16]# make install
[root@nagios nagios-plugins-1.4.16]# echo $?
0

3. Install NRPE

[root@nagios tools]# tar zxf nrpe-2.15.tar.gz
[root@nagios tools]# cd nrpe-2.15
[root@nagios nrpe-2.15]# ./configure;make all;make install-plugin;make install-daemon;make install-daemon-config

Edit nrpe.cfg:

sed -i 's/allowed_hosts=127.0.0.1/allowed_hosts=127.0.0.1,192.168.0.2/g' /usr/local/nagios/etc/nrpe.cfg
vim /usr/local/nagios/etc/nrpe.cfg

command[check_swap]=/usr/local/nagios/libexec/check_swap -w 20% -c 10%
command[check_data]=/usr/local/nagios/libexec/check_disk -w 20% -c 10% -p /data
command[check_/]=/usr/local/nagios/libexec/check_disk -w 20% -c 10% -p /
command[check_zombie_procs]=/usr/local/nagios/libexec/check_procs -w 5 -c 10 -s Z
command[check_total_procs]=/usr/local/nagios/libexec/check_procs -w 150 -c 200

Save the file, then make NRPE start at boot:

echo "/usr/local/nagios/bin/nrpe -c /usr/local/nagios/etc/nrpe.cfg -d" >> /etc/rc.local

Start NRPE:

[root@nagios nrpe-2.15]# /usr/local/nagios/bin/nrpe -c /usr/local/nagios/etc/nrpe.cfg -d
[root@nagios nrpe-2.15]# netstat -antl | grep 5666
tcp        0      0 0.0.0.0:5666                0.0.0.0:*                   LISTEN

Run the next command on the server and make sure it succeeds. If it does not, check whether the client's firewall and the network allow the connection.

[root@nagios libexec]# /usr/local/nagios/libexec/check_nrpe -H 192.168.0.3
NRPE v2.15

Stop NRPE:

[root@nagios libexec]# ps -ef | grep -v grep | grep nrpe
[root@nagios libexec]# kill -9 PID    # PID is the process ID shown by the previous command
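The name inside `command[...]` on the client is what the server passes after `check_nrpe!` in a service definition, so the two sides have to match exactly. A sketch that lists the names a client's nrpe.cfg actually defines; nrpe_commands is a hypothetical helper.

```shell
# nrpe_commands FILE: print the command names defined in an nrpe.cfg, one
# per line. Commented-out definitions are skipped.
nrpe_commands() {
    sed -n 's/^[[:space:]]*command\[\([^]]*\)]=.*/\1/p' "$1"
}
```

Comparing the output of `nrpe_commands /usr/local/nagios/etc/nrpe.cfg` with the check_command lines on the server is a quick way to spot a name that exists on only one side.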

When pnp produces no graphs, check the log

vim /usr/local/pnp4nagios/etc/process_perfdata.cfg

Change

LOG_LEVEL = 0

to

LOG_LEVEL = 2

more /usr/local/pnp4nagios/var/perfdata.log
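Raising the log level and lowering it again after debugging is a one-line edit each time, so it can be wrapped in a function. A minimal sketch, assuming the LOG_LEVEL line has the form shown above; set_pnp_log_level is a hypothetical helper and it leaves a backup copy next to the file.

```shell
# set_pnp_log_level FILE LEVEL: rewrite the LOG_LEVEL line in FILE and keep
# the previous version as FILE.bak. Hypothetical helper.
set_pnp_log_level() {
    sed -i.bak "s/^LOG_LEVEL[[:space:]]*=.*/LOG_LEVEL = $2/" "$1"
}
```

`set_pnp_log_level /usr/local/pnp4nagios/etc/process_perfdata.cfg 2` turns the detailed log on, and the same call with 0 turns it off again.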

Note: when Nagios monitors processes, no graph is produced even if pnp is configured correctly. For example:

OK 10-20-2014 16:44:45 83d 1h 9m 16s 1/3 PROCS OK: 503 processes 
OK 10-20-2014 16:46:00 83d 1h 7m 58s 1/3 PROCS OK: 0 processes with STATE = Z

PNP4Nagios Version 0.6.19

Please check the documentation for information about the following error.

XML file "/usr/local/pnp4nagios/var/perfdata/app-11/Total_Processes.xml" not found. 

file [line]:

application/models/data.php [312]:

back

For the cause, see the referenced article, which explains it in great detail.
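By Nagios plugin convention, performance data is the part of a plugin's output after a `|`, and that is the only part a grapher can work with. The two process-check lines above contain no `|`, which is at least consistent with the missing XML file. A small sketch for testing an output line; has_perfdata is a hypothetical helper and it only checks that something follows the first `|`, not that it is well-formed.

```shell
# has_perfdata LINE: succeed if a plugin output line carries a
# performance-data section, i.e. non-blank text after the first "|".
has_perfdata() {
    case $1 in
        *"|"*) perf=${1#*"|"} ;;
        *)     return 1 ;;
    esac
    # Strip spaces; anything left counts as perfdata.
    [ -n "$(printf '%s' "$perf" | tr -d ' ')" ]
}
```

Calling it with `'PROCS OK: 503 processes'` fails, while a line such as `'OK - used:1839 MB | used_mem=1839'` succeeds.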

If the stock Nagios monitoring plugins do not meet your needs, you can write your own.

For example, below is a memory-monitoring plugin. I found it through Baidu and it works quite well, so I borrow it here.

vim /usr/local/nagios/libexec/check_mem

#!/bin/bash
# Nagios plugin exit codes
STAT_OK=0
STAT_WARNING=1
STAT_CRITICAL=2
STAT_UNKNOWN=3

total_mem=`free -m | awk 'NR==2{print $2}'`
used_mem=`free -m | awk 'NR==3{print $3}'`   # memory the system has really used
free_mem=`free -m | awk 'NR==3{print $4}'`   # free memory plus cache
# used memory as a whole-number percentage; integer arithmetic avoids values
# such as "08", which [[ ... -lt ... ]] would reject as invalid octal
use_per=$(( used_mem * 100 / total_mem ))

help() {
        echo "USAGE: `basename $0` [-w warning] [-c critical] [-h]"
        exit $STAT_UNKNOWN
}

while getopts ":w:c:h" opt
do
        case $opt in
                w)      warning=$OPTARG
                        ;;
                c)      critical=$OPTARG
                        ;;
                h)      help
                        ;;
                ?)      unknown=$OPTARG
                        echo "error, please check the help. USAGE: ./`basename $0` -h"
                        exit $STAT_UNKNOWN
                        ;;
        esac
done

# both thresholds are required
if [[ -z $warning || -z $critical ]]; then
        help
fi

if [[ $use_per -lt $warning ]]; then
        echo "OK - total:$total_mem MB,used:$used_mem MB,free:$free_mem MB | total_mem=$total_mem used_mem=$used_mem free_mem=$free_mem"
        exit $STAT_OK
elif [[ $use_per -lt $critical ]]; then
        echo "WARNING - total:$total_mem MB,used:$used_mem MB,free:$free_mem MB | total_mem=$total_mem used_mem=$used_mem free_mem=$free_mem"
        exit $STAT_WARNING
else
        echo "CRITICAL - total:$total_mem MB,used:$used_mem MB,free:$free_mem MB | total_mem=$total_mem used_mem=$used_mem free_mem=$free_mem"
        exit $STAT_CRITICAL
fi

Save the file, then:

chown nagios:nagios check_mem
chmod +x check_mem
./check_mem -w 80 -c 90
OK - total:15926 MB,used:1839 MB,free:14086 MB | total_mem=15926 used_mem=1839 free_mem=14086

vim /usr/local/nagios/etc/nrpe.cfg

Add:

command[check_free_mem]=/usr/local/nagios/libexec/check_mem -w 80 -c 90

Restart NRPE, then edit the host's file under /usr/local/nagios/etc/objects/app/ and add:

define service{
        use                      local-service,service-pnp
        host_name                cacti
        service_description      Check_free_mem
        check_command            check_nrpe!check_free_mem
        contact_groups           admins
        flap_detection_enabled   0
}

Check the Nagios configuration and restart Nagios.
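The heart of any such plugin is the mapping from a measured value and two thresholds to one of the Nagios states. The sketch below isolates that step so it can be tried with fixed numbers instead of the live output of free; mem_state is a hypothetical function and it uses whole-number percentages.

```shell
# mem_state USED TOTAL WARN CRIT: print OK, WARNING or CRITICAL for the given
# memory figures (same unit for USED and TOTAL) and percentage thresholds,
# and return the matching Nagios exit code (0, 1 or 2).
mem_state() {
    pct=$(( $1 * 100 / $2 ))
    if [ "$pct" -lt "$3" ]; then
        echo "OK"; return 0
    elif [ "$pct" -lt "$4" ]; then
        echo "WARNING"; return 1
    else
        echo "CRITICAL"; return 2
    fi
}
```

With the figures from the sample run, `mem_state 1839 15926 80 90` prints OK, since about 11% of the memory is in use.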

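Monitoring Windows hosts and switches is not hard to configure; with a clear plan you will get it working. Nagios configuration is not really difficult, just somewhat tedious. Once you understand how the configuration files relate to each other, everything becomes simple.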

That is all.