Search Results

Search found 334 results on 14 pages for 'nagios'.

  • Nagios creating lots of zombie processes

    - by pradeepchhetri
    On my monitoring box, Nagios creates lots of zombie processes, although they are also removed quickly. I am using active checks to monitor my servers. I collected the defunct processes with the following command: $ top -d 0.25 -b -n 20 > topout.txt This captures the output of top 20 times at a 0.25 s interval. I then grepped topout.txt for defunct processes: $ cat topout.txt | grep defunct I get the following output: 8957 nagios 20 0 0 0 0 Z 6.0 0.0 0:00.02 nagios <defunct> 8951 nagios 20 0 0 0 0 Z 3.0 0.0 0:00.01 nagios <defunct> 8954 nagios 20 0 0 0 0 Z 3.0 0.0 0:00.01 nagios <defunct> 8945 nagios 20 0 0 0 0 Z 0.0 0.0 0:00.01 nagios <defunct> 8946 nagios 20 0 0 0 0 Z 0.0 0.0 0:00.01 nagios <defunct> 8980 nagios 20 0 0 0 0 Z 0.0 0.0 0:00.01 nagios <defunct> 9000 nagios 20 0 0 0 0 Z 0.0 0.0 0:00.00 nagios <defunct> 9024 nagios 20 0 0 0 0 Z 7.0 0.0 0:00.02 nagios <defunct> 9025 nagios 20 0 0 0 0 Z 3.5 0.0 0:00.01 nagios <defunct> 9040 nagios 20 0 0 0 0 Z 3.1 0.0 0:00.01 nagios <defunct> 9086 nagios 20 0 0 0 0 Z 0.0 0.0 0:00.01 nagios <defunct> 9087 nagios 20 0 0 0 0 Z 0.0 0.0 0:00.01 nagios <defunct> 9123 nagios 20 0 0 0 0 Z 6.1 0.0 0:00.02 nagios <defunct> 9126 nagios 20 0 0 0 0 Z 3.0 0.0 0:00.01 nagios <defunct> 9131 nagios 20 0 0 0 0 Z 3.0 0.0 0:00.01 nagios <defunct> 9091 nagios 20 0 0 0 0 Z 0.0 0.0 0:00.05 nagios <defunct> 9111 nagios 20 0 0 0 0 Z 0.0 0.0 0:00.01 nagios <defunct> 9119 nagios 20 0 0 0 0 Z 0.0 0.0 0:00.01 nagios <defunct> 9118 nagios 20 0 0 0 0 Z 0.0 0.0 0:00.01 nagios <defunct> 9151 nagios 20 0 0 0 0 Z 2.9 0.0 0:00.02 nagios <defunct> 9153 nagios 20 0 0 0 0 Z 2.9 0.0 0:00.02 nagios <defunct> 9150 nagios 20 0 0 0 0 Z 0.0 0.0 0:00.01 nagios <defunct> 9164 nagios 20 0 0 0 0 Z 3.5 0.0 0:00.02 nagios <defunct> 9171 nagios 20 0 0 0 0 Z 3.5 0.0 0:00.02 nagios <defunct> 9154 nagios 20 0 0 0 0 Z 0.0 0.0 0:00.01 nagios <defunct> 9156 nagios 20 0 0 0 0 Z 0.0 0.0 0:00.01 nagios <defunct> 9163 nagios 20 0 0 0 0 Z 0.0 0.0 0:00.01 nagios <defunct> 9167 nagios 20 0 0 0 0 Z 0.0 0.0 0:00.01 nagios <defunct> 9178 nagios 20 0 0 0 0 Z 3.8 0.0 0:00.02 nagios <defunct> 9174 nagios 20 0 0 0 0 Z 0.0 0.0 0:00.01 nagios <defunct> 9179 nagios 20 0 0 0 0 Z 0.0 0.0 0:00.01 nagios <defunct> 9182 nagios 20 0 0 0 0 Z 0.0 0.0 0:00.01 nagios <defunct> Can somebody help me find the cause of these zombie processes and how I can prevent them?

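A zombie (`<defunct>`) is a child process that has exited but has not yet been collected by its parent with `wait()`. Zombies that vanish within a fraction of a second usually just mean the parent reaps its check processes a moment after they finish; they become a concern only when they pile up. Rather than sampling `top`, one `ps` snapshot grouped by parent PID shows who owns them. The helper below is a hypothetical sketch, assuming procps-style `ps` on Linux:

```python
"""Count zombie (defunct) processes per parent PID.

Hypothetical helper; assumes procps-style `ps` (Linux).
"""
import subprocess
from collections import Counter


def zombies_by_parent(ps_output: str) -> Counter:
    """Parse `ps -eo pid=,ppid=,stat=,comm=` output.

    A process is a zombie when its STAT field starts with "Z".
    """
    counts = Counter()
    for line in ps_output.splitlines():
        fields = line.split(None, 3)
        if len(fields) < 4:
            continue  # skip blank or malformed lines
        _pid, ppid, stat, _comm = fields
        if stat.startswith("Z"):
            counts[int(ppid)] += 1
    return counts


def snapshot() -> Counter:
    """Take one live snapshot of all processes and count the zombies."""
    out = subprocess.run(
        ["ps", "-eo", "pid=,ppid=,stat=,comm="],
        capture_output=True, text=True, check=True,
    ).stdout
    return zombies_by_parent(out)
```

Calling `snapshot().most_common()` a few times in a row and comparing the parent PIDs with `pidof nagios` shows whether the count keeps growing or stays roughly constant.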

  • Nagios: NRPE: Unable to read output. Can't find the reason, can you?

    - by Itai Ganot
    I have a Nagios server and a monitored server. On the monitored server: [root@Monitored ~]# netstat -an |grep :5666 tcp 0 0 0.0.0.0:5666 0.0.0.0:* LISTEN [root@Monitored ~]# locate check_kvm /usr/lib64/nagios/plugins/check_kvm [root@Monitored ~]# /usr/lib64/nagios/plugins/check_kvm -H localhost hosts:3 OK:3 WARN:0 CRIT:0 - ab2c7:running alpweb5:running istaweb5:running [root@Monitored ~]# /usr/lib64/nagios/plugins/check_nrpe -H localhost -c check_kvm NRPE: Unable to read output [root@Monitored ~]# /usr/lib64/nagios/plugins/check_nrpe -H localhost NRPE v2.14 [root@Monitored ~]# ps -ef |grep nrpe nagios 21178 1 0 16:11 ? 00:00:00 /usr/sbin/nrpe -c /etc/nagios/nrpe.cfg -d [root@Monitored ~]# On the Nagios server: [root@Nagios ~]# /usr/lib64/nagios/plugins/check_nrpe -H 1.1.1.159 -c check_kvm NRPE: Unable to read output [root@Nagios ~]# /usr/lib64/nagios/plugins/check_nrpe -H 1.1.1.159 NRPE v2.14 [root@Nagios ~]# When I check another server on the network using the same command, it works: [root@Nagios ~]# /usr/lib64/nagios/plugins/check_nrpe -H 1.1.1.80 -c check_kvm hosts:4 OK:4 WARN:0 CRIT:0 - karmisoft:running ab2c4:running kidumim1:running travel2gether1:running [root@Nagios ~]# Running the check locally using the nagios account: [root@Monitored ~]# su - nagios -bash-4.1$ /usr/lib64/nagios/plugins/check_kvm hosts:3 OK:3 WARN:0 CRIT:0 - ab2c7:running alpweb5:running istaweb5:running -bash-4.1$ Running the check remotely from the Nagios server using the nagios account: -bash-4.1$ /usr/lib64/nagios/plugins/check_nrpe -H 1.1.1.159 -c check_kvm NRPE: Unable to read output -bash-4.1$ /usr/lib64/nagios/plugins/check_nrpe -H 1.1.1.159 NRPE v2.14 -bash-4.1$ Running the same check_kvm against a different server on the network using the nagios account: -bash-4.1$ /usr/lib64/nagios/plugins/check_nrpe -H 1.1.1.80 -c check_kvm hosts:4 OK:4 WARN:0 CRIT:0 - karmisoft:running ab2c4:running kidumim1:running travel2gether1:running -bash-4.1$ Permissions: -rwxr-xr-x. 1 root root 4684 2013-10-14 17:14 nrpe.cfg (aka /etc/nagios/nrpe.cfg) drwxrwxr-x. 3 nagios nagios 4096 2013-10-15 03:38 plugins (aka /usr/lib64/nagios/plugins) /etc/sudoers: [root@Monitored ~]# grep -i requiretty /etc/sudoers #Defaults requiretty iptables/selinux: [root@Monitored xinetd.d]# service iptables status iptables: Firewall is not running. [root@Monitored xinetd.d]# service ip6tables status ip6tables: Firewall is not running. [root@Monitored xinetd.d]# grep disable /etc/selinux/config # disabled - No SELinux policy is loaded. SELINUX=disabled [root@Monitored xinetd.d]# The command in /etc/nagios/nrpe.cfg is: [root@Monitored ~]# grep kvm /etc/nagios/nrpe.cfg command[check_kvm]=sudo /usr/lib64/nagios/plugins/check_kvm and the nagios user has been added to /etc/sudoers: nagios ALL=(ALL) NOPASSWD:/usr/lib64/nagios/plugins/check_kvm nagios ALL=(ALL) NOPASSWD:/usr/lib64/nagios/plugins/check_nrpe check_kvm is a shell script that looks like this: #!/bin/sh LIST=$(virsh list --all | sed '1,2d' | sed '/^$/d'| awk '{print $2":"$3}') if [ ! "$LIST" ]; then EXITVAL=3 #Status 3 = UNKNOWN (orange) echo "Unknown guests" exit $EXITVAL fi OK=0 WARN=0 CRIT=0 NUM=0 for host in $(echo $LIST) do name=$(echo $host | awk -F: '{print $1}') state=$(echo $host | awk -F: '{print $2}') NUM=$(expr $NUM + 1) case "$state" in running|blocked) OK=$(expr $OK + 1) ;; paused) WARN=$(expr $WARN + 1) ;; shutdown|shut*|crashed) CRIT=$(expr $CRIT + 1) ;; *) CRIT=$(expr $CRIT + 1) ;; esac done if [ "$NUM" -eq "$OK" ]; then EXITVAL=0 #Status 0 = OK (green) fi if [ "$WARN" -gt 0 ]; then EXITVAL=1 #Status 1 = WARNING (yellow) fi if [ "$CRIT" -gt 0 ]; then EXITVAL=2 #Status 2 = CRITICAL (red) fi echo hosts:$NUM OK:$OK WARN:$WARN CRIT:$CRIT - $LIST exit $EXITVAL Edit (10/22/13): Following all that, I am now able to get some response from the script: [root@Monitored ~]# /usr/lib64/nagios/plugins/check_nrpe -H localhost -c check_kvm Unknown guests [root@Monitored ~]# /usr/lib64/nagios/plugins/check_nrpe -H localhost NRPE v2.14 [root@Monitored ~]# /usr/lib64/nagios/plugins/check_kvm hosts:3 OK:3 WARN:0 CRIT:0 - ab2c7:running alpweb5:running istaweb5:running [root@Monitored ~]# su - nagios -bash-4.1$ /usr/lib64/nagios/plugins/check_kvm hosts:3 OK:3 WARN:0 CRIT:0 - ab2c7:running alpweb5:running istaweb5:running -bash-4.1$ /usr/lib64/nagios/plugins/check_nrpe -H localhost -c check_kvm Unknown guests -bash-4.1$ /usr/lib64/nagios/plugins/check_nrpe -H localhost NRPE v2.14 It seems the problem is somehow related to the check_nrpe command or to the NRPE installation on the server.

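The classification logic of check_kvm, ported to Python as an illustrative sketch (not the poster's code): it drops the two header lines of `virsh list --all`, keeps the second and third columns, and maps guest states to Nagios exit codes. Because awk prints only `$3`, a guest in state "shut off" is seen as "shut". The script prints "Unknown guests" only when that list is empty, so the output in the Edit means `virsh` printed no guest rows when run through NRPE and sudo.

```python
"""Illustrative Python port of the check_kvm classification logic.

Input is the text printed by `virsh list --all`.
"""
OK, WARNING, CRITICAL, UNKNOWN = 0, 1, 2, 3


def check_kvm(virsh_output: str) -> tuple[int, str]:
    # Like `sed '1,2d' | sed '/^$/d'`: drop the two header lines and blank lines.
    rows = [r for r in virsh_output.splitlines()[2:] if r.strip()]
    if not rows:
        return UNKNOWN, "Unknown guests"

    guests = []
    ok = warn = crit = 0
    for row in rows:
        cols = row.split()
        # Like awk '{print $2":"$3}': the name and only the first word of
        # the state, so "shut off" is seen as "shut".
        name = cols[1] if len(cols) > 1 else ""
        state = cols[2] if len(cols) > 2 else ""
        guests.append(f"{name}:{state}")
        if state in ("running", "blocked"):
            ok += 1
        elif state == "paused":
            warn += 1
        else:  # shutdown, shut*, crashed and anything unexpected
            crit += 1

    # Same precedence as the script: CRITICAL beats WARNING beats OK.
    status = CRITICAL if crit else WARNING if warn else OK
    summary = f"hosts:{len(rows)} OK:{ok} WARN:{warn} CRIT:{crit} - " + " ".join(guests)
    return status, summary
```

Saving the output of `virsh list --all` once from a root shell and once from inside the NRPE-run command (for example redirected to a file under /tmp) and passing both to `check_kvm` would show where the two runs diverge.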

  • "Unable to open MRTG log file" error with nagios and mrtg

    - by Simone Magnaschi
    We have a strange issue with our Icinga/Nagios and MRTG setup. Icinga works well and can monitor basically everything without issues. We set up MRTG to gather bandwidth data from our routers and switches. MRTG is working fine: it stores the log data in the /var/www/mrtg/ directory and displays the graphs via the web, so we assume MRTG is doing its job. We tried to set up bandwidth checks in Nagios: define service{ use generic-service ; Inherit values from a template host_name zywall-agora service_description ZYWALL AGORA TRAFFICO check_command check_local_mrtgtraf!/var/www/mrtg/x.x.x.x_2.log!AVG!1000000,2000000!5000000,5000000!1000 check_interval 1 ; Check the service every 1 minute under normal conditions retry_interval 1 ; Re-check every minute until its final/hard state is determined } where /var/www/mrtg/x.x.x.x_2.log is the correct path to the log file. We keep getting an "Unable to open MRTG log file" error as the check result in the Icinga web interface. We tried everything: giving ownership of the log file to user nagios or icinga, running chmod 777 on the file, and copying the file to another directory with full permissions. Same error. The strange thing is that if we run the command Nagios generates in a bash session, it works like a charm: /usr/lib64/nagios/plugins/check_mrtgtraf -F /var/www/mrtg/x.x.x.x_2.log -a AVG -w 10,20 -c 5000000,5000000 -e 10 Result: Traffic WARNING - Avg. In = 17.9 KB/s, Avg. Out = 5.0 KB/s|in=17.877930KB/s;10.000000;5000000.000000;0.000000 out=5.000000KB/s;20.000000;5000000.000000;0.000000 We ran that command line as root, as user nagios and as user icinga, and all three worked. We thought the command Nagios runs might be wrong, so we debugged Nagios, but the command it generates is the same as above. Searching Google for this kind of problem only turns up systems where MRTG is not installed or where the log file path is wrong, but neither seems to apply to us. We are stuck; can somebody help?

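Because the same command works from a shell as root, nagios and icinga, one way to narrow this down is to run a diagnostic as the monitoring daemon itself, for example as a temporary check command, so that it inherits the daemon's user and environment. The plugin below is hypothetical (not part of nagios-plugins): it checks that every parent directory is searchable, that the file is readable, and finally tries to open it. An `open()` failure despite normal permission bits would point at something outside the classic permission model, such as SELinux.

```python
"""Hypothetical diagnostic plugin: can *this* process open a file?

Run it as a temporary Nagios/Icinga check so that it inherits the
daemon's user and environment, with the MRTG log path as argument.
"""
import os
import sys

OK, CRITICAL = 0, 2


def first_blocked_component(path: str):
    """Return (component, reason) for the first obstacle, or None if readable."""
    path = os.path.abspath(path)
    # Every directory from "/" down to the file's parent needs search (x) permission.
    parents = []
    d = os.path.dirname(path)
    while True:
        parents.append(d)
        up = os.path.dirname(d)
        if up == d:  # reached "/"
            break
        d = up
    for directory in reversed(parents):
        if not os.access(directory, os.X_OK):
            return directory, "directory not searchable"
    if not os.path.exists(path):
        return path, "file does not exist"
    if not os.access(path, os.R_OK):
        return path, "file not readable"
    try:
        # Mode bits can look fine while an access-control policy
        # (e.g. SELinux) still denies the open itself.
        with open(path, "rb"):
            pass
    except OSError as exc:
        return path, f"open() failed: {exc.strerror}"
    return None


def main(argv: list[str]) -> int:
    blocked = first_blocked_component(argv[1])
    if blocked is None:
        print(f"OK - {argv[1]} is readable")
        return OK
    print(f"CRITICAL - {blocked[0]}: {blocked[1]}")
    return CRITICAL


if __name__ == "__main__" and len(sys.argv) == 2:
    sys.exit(main(sys.argv))
```

Wired up as, say, a command whose command_line is `/usr/local/bin/check_path_access.py $ARG1$` (a hypothetical path) and a service using `check_path_access!/var/www/mrtg/x.x.x.x_2.log`, its result in the Icinga web interface reports what the daemon itself sees.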

  • Monit send alert to nagios NSCA

    - by mYzk
    I want to monitor a website with Monit's check host function and, if the site is down, send an alert to Nagios through NSCA so that Nagios marks it with a status such as OK or host down. The problem is how to make Monit's alert function send the info to NSCA. I am not sure this will work, but what I came up with is: set alert exec 'echo -e "nagios nsca format" | /usr/local/nagios/bin/send_nsca -H serveraddress -c /usr/local/nagios/etc/send_nsca.cfg' Would this work, and is it the best solution, or can it be done some other way?

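send_nsca reads passive results from standard input, one per line, with tab-separated fields: `host<TAB>service description<TAB>return code<TAB>plugin output` for a service result (a host result omits the service field). The host and service must match objects defined in Nagios that accept passive checks. A minimal Python sketch of such a forwarder, assuming the send_nsca paths from the question, hypothetical host/service names, and that Monit exports `MONIT_DESCRIPTION` to programs it runs:

```python
"""Sketch: forward a Monit event to Nagios as a passive result via send_nsca.

Assumptions: the host/service names are hypothetical and must match
objects defined in Nagios; Monit is assumed to export MONIT_DESCRIPTION
to programs it runs through an `exec` action.
"""
import os
import subprocess

SEND_NSCA = "/usr/local/nagios/bin/send_nsca"          # path from the question
SEND_NSCA_CFG = "/usr/local/nagios/etc/send_nsca.cfg"  # path from the question


def nsca_service_line(host: str, service: str, code: int, output: str) -> str:
    """One passive service result: host<TAB>service<TAB>code<TAB>output."""
    # Tabs or newlines inside a field would shift send_nsca's fields.
    host, service, output = (
        s.replace("\t", " ").replace("\n", " ") for s in (host, service, output)
    )
    return f"{host}\t{service}\t{code}\t{output}\n"


def send(server: str, line: str) -> int:
    """Pipe one result line into send_nsca and return its exit status."""
    proc = subprocess.run(
        [SEND_NSCA, "-H", server, "-c", SEND_NSCA_CFG],
        input=line, text=True,
    )
    return proc.returncode


def main() -> int:
    message = os.environ.get("MONIT_DESCRIPTION", "site check failed")
    line = nsca_service_line("webserver01", "HTTP", 2, message)  # 2 = CRITICAL
    return send("serveraddress", line)
```

As far as I know, Monit's `set alert` statement takes a mail recipient; running a program is done with an `exec` action on the check itself, roughly `if failed port 80 protocol http then exec "/usr/local/bin/monit_to_nsca.py"` (a hypothetical script path, with `import sys` and `sys.exit(main())` added as its entry point). A matching `else if succeeded then exec ...` call that sends code 0 would be needed to clear the CRITICAL state again.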

  • nagios NRPE: Unable to read output

    - by user555854
    I have set up a script to restart my HTTP servers and PHP5-FPM, but I can't get it to work. From googling, permissions seem to be the most common cause of this error, but I can't figure it out. I start my script using /usr/lib/nagios/plugins/check_nrpe -H bart -c restart_http This is the output in the syslog of the node I want to restart: Jun 27 06:29:35 bart nrpe[8926]: Connection from 192.168.133.17 port 25028 Jun 27 06:29:35 bart nrpe[8926]: Host address is in allowed_hosts Jun 27 06:29:35 bart nrpe[8926]: Handling the connection... Jun 27 06:29:35 bart nrpe[8926]: Host is asking for command 'restart_http' to be run... Jun 27 06:29:35 bart nrpe[8926]: Running command: /usr/bin/sudo /usr/lib/nagios/plugins/http-restart Jun 27 06:29:35 bart nrpe[8926]: Command completed with return code 1 and output: Jun 27 06:29:35 bart nrpe[8926]: Return Code: 1, Output: NRPE: Unable to read output Jun 27 06:29:35 bart nrpe[8926]: Connection from 192.168.133.17 closed. If I run the command myself as the nagios user, it runs fine (but asks for a password). These are the script's permissions and contents: -rwxrwxrwx 1 nagios nagios 142 Jun 26 21:41 /usr/lib/nagios/plugins/http-restart #!/bin/bash echo "ok" /etc/init.d/nginx stop /etc/init.d/nginx start /etc/init.d/php5-fpm stop /etc/init.d/php5-fpm start echo "done" I also added this line with visudo: nagios ALL=(ALL) NOPASSWD: /usr/lib/nagios/plugins/ My local nrpe.cfg: ############################################################################# # Sample NRPE Config File # Written by: Ethan Galstad ([email protected]) # # # NOTES: # This is a sample configuration file for the NRPE daemon. It needs to be # located on the remote host that is running the NRPE daemon, not the host # from which the check_nrpe client is being executed. ############################################################################# # LOG FACILITY # The syslog facility that should be used for logging purposes.
log_facility=daemon # PID FILE # The name of the file in which the NRPE daemon should write it's process ID # number. The file is only written if the NRPE daemon is started by the root # user and is running in standalone mode. pid_file=/var/run/nagios/nrpe.pid # PORT NUMBER # Port number we should wait for connections on. # NOTE: This must be a non-priviledged port (i.e. > 1024). # NOTE: This option is ignored if NRPE is running under either inetd or xinetd server_port=5666 # SERVER ADDRESS # Address that nrpe should bind to in case there are more than one interface # and you do not want nrpe to bind on all interfaces. # NOTE: This option is ignored if NRPE is running under either inetd or xinetd #server_address=127.0.0.1 # NRPE USER # This determines the effective user that the NRPE daemon should run as. # You can either supply a username or a UID. # # NOTE: This option is ignored if NRPE is running under either inetd or xinetd nrpe_user=nagios # NRPE GROUP # This determines the effective group that the NRPE daemon should run as. # You can either supply a group name or a GID. # # NOTE: This option is ignored if NRPE is running under either inetd or xinetd nrpe_group=nagios # ALLOWED HOST ADDRESSES # This is an optional comma-delimited list of IP address or hostnames # that are allowed to talk to the NRPE daemon. # # Note: The daemon only does rudimentary checking of the client's IP # address. I would highly recommend adding entries in your /etc/hosts.allow # file to allow only the specified host to connect to the port # you are running this daemon on. # # NOTE: This option is ignored if NRPE is running under either inetd or xinetd allowed_hosts=127.0.0.1,192.168.133.17 # COMMAND ARGUMENT PROCESSING # This option determines whether or not the NRPE daemon will allow clients # to specify arguments to commands that are executed. This option only works # if the daemon was configured with the --enable-command-args configure script # option. 
# # *** ENABLING THIS OPTION IS A SECURITY RISK! *** # Read the SECURITY file for information on some of the security implications # of enabling this variable. # # Values: 0=do not allow arguments, 1=allow command arguments dont_blame_nrpe=0 # COMMAND PREFIX # This option allows you to prefix all commands with a user-defined string. # A space is automatically added between the specified prefix string and the # command line from the command definition. # # *** THIS EXAMPLE MAY POSE A POTENTIAL SECURITY RISK, SO USE WITH CAUTION! *** # Usage scenario: # Execute restricted commmands using sudo. For this to work, you need to add # the nagios user to your /etc/sudoers. An example entry for alllowing # execution of the plugins from might be: # # nagios ALL=(ALL) NOPASSWD: /usr/lib/nagios/plugins/ # # This lets the nagios user run all commands in that directory (and only them) # without asking for a password. If you do this, make sure you don't give # random users write access to that directory or its contents! command_prefix=/usr/bin/sudo # DEBUGGING OPTION # This option determines whether or not debugging messages are logged to the # syslog facility. # Values: 0=debugging off, 1=debugging on debug=1 # COMMAND TIMEOUT # This specifies the maximum number of seconds that the NRPE daemon will # allow plugins to finish executing before killing them off. command_timeout=60 # CONNECTION TIMEOUT # This specifies the maximum number of seconds that the NRPE daemon will # wait for a connection to be established before exiting. This is sometimes # seen where a network problem stops the SSL being established even though # all network sessions are connected. This causes the nrpe daemons to # accumulate, eating system resources. Do not set this too low. connection_timeout=300 # WEEK RANDOM SEED OPTION # This directive allows you to use SSL even if your system does not have # a /dev/random or /dev/urandom (on purpose or because the necessary patches # were not applied). 
The random number generator will be seeded from a file # which is either a file pointed to by the environment valiable $RANDFILE # or $HOME/.rnd. If neither exists, the pseudo random number generator will # be initialized and a warning will be issued. # Values: 0=only seed from /dev/[u]random, 1=also seed from weak randomness #allow_weak_random_seed=1 # INCLUDE CONFIG FILE # This directive allows you to include definitions from an external config file. #include=<somefile.cfg> # INCLUDE CONFIG DIRECTORY # This directive allows you to include definitions from config files (with a # .cfg extension) in one or more directories (with recursion). #include_dir=<somedirectory> #include_dir=<someotherdirectory> # COMMAND DEFINITIONS # Command definitions that this daemon will run. Definitions # are in the following format: # # command[<command_name>]=<command_line> # # When the daemon receives a request to return the results of <command_name> # it will execute the command specified by the <command_line> argument. # # Unlike Nagios, the command line cannot contain macros - it must be # typed exactly as it should be executed. # # Note: Any plugins that are used in the command lines must reside # on the machine that this daemon is running on! The examples below # assume that you have plugins installed in a /usr/local/nagios/libexec # directory. Also note that you will have to modify the definitions below # to match the argument format the plugins expect. Remember, these are # examples only! # The following examples use hardcoded command arguments... 
command[check_users]=/usr/lib/nagios/plugins/check_users -w 5 -c 10 command[check_load]=/usr/lib/nagios/plugins/check_load -w 15,10,5 -c 30,25,20 command[check_hda1]=/usr/lib/nagios/plugins/check_disk -w 20% -c 10% -p /dev/hda1 command[check_zombie_procs]=/usr/lib/nagios/plugins/check_procs -w 5 -c 10 -s Z command[check_total_procs]=/usr/lib/nagios/plugins/check_procs -w 150 -c 200 # The following examples allow user-supplied arguments and can # only be used if the NRPE daemon was compiled with support for # command arguments *AND* the dont_blame_nrpe directive in this # config file is set to '1'. This poses a potential security risk, so # make sure you read the SECURITY file before doing this. #command[check_users]=/usr/lib/nagios/plugins/check_users -w $ARG1$ -c $ARG2$ #command[check_load]=/usr/lib/nagios/plugins/check_load -w $ARG1$ -c $ARG2$ #command[check_disk]=/usr/lib/nagios/plugins/check_disk -w $ARG1$ -c $ARG2$ -p $ARG3$ #command[check_procs]=/usr/lib/nagios/plugins/check_procs -w $ARG1$ -c $ARG2$ -s $ARG3$ command[restart_http]=/usr/lib/nagios/plugins/http-restart # # local configuration: # if you'd prefer, you can instead place directives here include=/etc/nagios/nrpe_local.cfg # # you can place your config snipplets into nrpe.d/ include_dir=/etc/nagios/nrpe.d/ My sudoers file: # /etc/sudoers # # This file MUST be edited with the 'visudo' command as root. # # See the man page for details on how to write a sudoers file. # Defaults env_reset # Host alias specification # User alias specification # Cmnd alias specification # User privilege specification root ALL=(ALL) ALL nagios ALL=(ALL) NOPASSWD: /usr/lib/nagios/plugins/ # Allow members of group sudo to execute any command # (Note that later entries override this, so you might need to move # it further down) %sudo ALL=(ALL) ALL # #includedir /etc/sudoers.d Hopefully someone can help!

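Two details in this configuration are worth checking. First, `command_prefix=/usr/bin/sudo` already prefixes every command with sudo, which is why the log shows `/usr/bin/sudo /usr/lib/nagios/plugins/http-restart`; if sudo refuses (for instance because it wants a password, as in the manual test), it typically reports on stderr and exits 1, leaving NRPE nothing on stdout to relay. Second, sudoers uses the last matching entry (as the file's own comment warns), so if nagios is a member of group sudo, the later `%sudo ALL=(ALL) ALL` line overrides the NOPASSWD rule. Separately, a world-writable (777) script that runs as root through sudo lets any local user run commands as root. A hedged Python sketch of a restart script that always prints one status line and returns a Nagios exit code (the init script paths come from the question; the `run` parameter exists only for testing):

```python
"""Sketch of a restart script that always prints one status line for NRPE.

NRPE relays the plugin's standard output; with no stdout, check_nrpe
reports "NRPE: Unable to read output". Init script paths are from the
question; `run` is a parameter only so the logic can be tested.
"""
import subprocess

OK, CRITICAL = 0, 2
STEPS = [
    ["/etc/init.d/nginx", "stop"],
    ["/etc/init.d/nginx", "start"],
    ["/etc/init.d/php5-fpm", "stop"],
    ["/etc/init.d/php5-fpm", "start"],
]


def restart_all(run=subprocess.run) -> tuple[int, str]:
    """Run each step; return (exit_code, one-line plugin output)."""
    failed = []
    for cmd in STEPS:
        try:
            result = run(cmd, capture_output=True, text=True)
        except OSError as exc:  # e.g. init script missing
            failed.append(f"{' '.join(cmd)}: {exc.strerror}")
            continue
        if result.returncode != 0:
            failed.append(f"{' '.join(cmd)} exited {result.returncode}")
    if failed:
        return CRITICAL, "CRITICAL - " + "; ".join(failed)
    return OK, "OK - nginx and php5-fpm restarted"
```

As the NRPE command it would end with something like `code, message = restart_all()`, `print(message)` and `sys.exit(code)`.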

  • Error accessing or executing Nagios / Icinga binary '/usr/sbin/nagios'. Cannot run the mandatory syntax check

    - by Zim3r
    I see an error when using "Generate Nagios config" in the NConf interface: Error accessing or executing Nagios / Icinga binary '/usr/sbin/nagios'. Cannot run the mandatory syntax check. I checked the Apache error_log and it says: sh: /usr/sbin/nagios: Permission denied I tried changing permissions and ownership, but nothing changed. How can I fix this? Edit: ls -l /usr/sbin/nagios -rwxrwxrwx. 1 apache apache 644184 Jul 2 02:10 /usr/sbin/nagios ps -ef | egrep 'httpd|apache' root 4175 1 0 10:50 ? 00:00:00 /usr/sbin/httpd apache 4177 4175 0 10:50 ? 00:00:00 /usr/sbin/httpd apache 4178 4175 0 10:50 ? 00:00:00 /usr/sbin/httpd apache 4179 4175 0 10:50 ? 00:00:00 /usr/sbin/httpd apache 4180 4175 0 10:50 ? 00:00:00 /usr/sbin/httpd apache 4181 4175 0 10:50 ? 00:00:00 /usr/sbin/httpd apache 4182 4175 0 10:50 ? 00:00:00 /usr/sbin/httpd apache 4183 4175 0 10:50 ? 00:00:00 /usr/sbin/httpd apache 4184 4175 0 10:50 ? 00:00:00 /usr/sbin/httpd apache 4559 4175 0 11:31 ? 00:00:00 /usr/sbin/httpd root 4888 4854 0 12:26 pts/1 00:00:00 egrep httpd|apache

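With mode 777, "Permission denied" from `sh` usually comes from something other than the permission bits: the filesystem may be mounted `noexec`, or a mandatory access control policy such as SELinux may stop Apache's processes from executing the file (the trailing `.` in `-rwxrwxrwx.` indicates the file carries an SELinux security context). A hypothetical Linux-only helper for the first possibility reads `/proc/mounts` and returns the options of the mount that contains a path:

```python
"""Hypothetical Linux-only helper: mount options for the filesystem holding a path."""
import os


def mount_options(path: str, mounts_text: str) -> set[str]:
    """Return the options of the mount containing `path` (already resolved).

    `mounts_text` uses the /proc/mounts layout:
    device mountpoint fstype options dump pass
    """
    best, best_opts = "", set()
    for line in mounts_text.splitlines():
        fields = line.split()
        if len(fields) < 4:
            continue
        mnt = fields[1].replace("\\040", " ")  # /proc/mounts encodes spaces as \040
        inside = path == mnt or path.startswith(mnt.rstrip("/") + "/")
        # The longest matching mount point wins; on ties, the later (top) mount.
        if inside and len(mnt) >= len(best):
            best, best_opts = mnt, set(fields[3].split(","))
    return best_opts


def is_noexec(path: str) -> bool:
    """True if the live mount table says `path` sits on a noexec mount."""
    with open("/proc/mounts") as f:
        return "noexec" in mount_options(os.path.realpath(path), f.read())
```

If `is_noexec("/usr/sbin/nagios")` is False, the SELinux side is the next place to look: `getenforce` shows whether SELinux is enforcing and `ls -Z /usr/sbin/nagios` shows the file's context; denials are typically recorded in the audit log.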

  • Nagios NTP, discarding peer

    - by picca
    We're using the Nagios check_ntp_time plugin to monitor time on our servers. Unfortunately, the service is flapping and reporting a lot of false positives. It happens on a random server at a random time of day and lasts for ~10-30 minutes. When the problem occurs we get: watch01:~ # /usr/lib/nagios/plugins/check_ntp_time -H lb01 -w 1 -c 2 -v sending request to peer 0 response from peer 0: offset 0.07509887218 sending request to peer 0 response from peer 0: offset 0.07508444786 sending request to peer 0 response from peer 0: offset 0.07499825954 sending request to peer 0 response from peer 0: offset 0.07510817051 discarding peer 0: stratum=0 overall average offset: 0 NTP CRITICAL: Offset unknown| When everything is OK, we get (I used a different server so I didn't have to wait): watch01:~ # /usr/lib/nagios/plugins/check_ntp_time -H web02 -w 1 -c 2 -v sending request to peer 0 response from peer 0: offset 0.0002282857895 sending request to peer 0 response from peer 0: offset 0.0002194643021 sending request to peer 0 response from peer 0: offset 0.0002347230911 sending request to peer 0 response from peer 0: offset 0.0002293586731 overall average offset: 0.0002282857895 NTP OK: Offset 0.0002282857895 secs|offset=0.000228s;1.000000;2.000000; We are using check_ntp_time v1.4.15 (nagios-plugins 1.4.15) on Debian squeeze. The remote NTP daemon is: ntpd - NTP daemon program - Ver. 4.2.4p4 I already found some forums where the problem is described: 1, 2, 3. Every time, they advise upgrading nagios-plugins, because versions prior to 1.4.13 had a bug with an inserted leap second. But we already have a newer version of nagios-plugins.

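In an NTP reply the stratum is the second byte of the header, and stratum 0 means "unspecified or invalid" (RFC 5905); in practice it usually means the queried ntpd is not synchronized itself at that moment, or is sending a kiss-o'-death packet. The `discarding peer 0: stratum=0` line suggests the plugin did receive offsets but rejected the peer for that reason, so the episodes may reflect the state of lb01's ntpd rather than a plugin bug. A minimal SNTP query sketch that only decodes those header fields:

```python
"""Sketch: read stratum and leap indicator from an NTP server (RFC 5905).

Sends one SNTP client-mode request; it does not compute an offset.
"""
import socket


def parse_header(packet: bytes) -> dict:
    """Decode the first two bytes of an NTP header."""
    if len(packet) < 48:
        raise ValueError("NTP packet must be at least 48 bytes")
    first = packet[0]
    return {
        "leap": first >> 6,              # 3 = alarm: clock not synchronized
        "version": (first >> 3) & 0x7,
        "mode": first & 0x7,             # 4 = server reply
        "stratum": packet[1],            # 0 = unspecified/invalid (or kiss-o'-death)
    }


def query(host: str, timeout: float = 2.0) -> dict:
    """Send one request to UDP port 123 and decode the reply header."""
    request = bytes([0x1B]) + bytes(47)  # LI=0, VN=3, Mode=3 (client)
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        sock.settimeout(timeout)
        sock.sendto(request, (host, 123))
        reply, _addr = sock.recvfrom(512)
    return parse_header(reply)
```

Running `query("lb01")` during an episode, alongside `ntpq -p` on lb01 to see its own upstream peers, would show whether the server reports stratum 0 (or leap indicator 3) at that moment.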

  • Recognizing Dell EqualLogic with Nagios

    - by user3677595
    EDIT: All firmware and models are compatible, which is why nothing is posted about them. Okay, there will be a lot here, so please bear with me. I've been working on this for a few hours now (reading manuals and such), so I'm not just coming here out of the blue. I am working on a PRE-EXISTING Nagios server where several other plugins and checks are already running and working. Now I want to add another server to check, so I made the following modifications: First and foremost, I added a file to /usr/local/nagios/libexec named check_equallogic.sh. The permissions are 755, the same as all the others. I have chowned it to nagios:nagios, and the listing shows the owner as nagios. I then added a command to the commands.cfg file in /usr/local/nagios/etc/objects that reads as follows: # 'check_equallogic' command definition define command{ command_name check_equallogic command_line $USER1$/check_equallogic -H $HOSTADDRESS$ -C $ARG1$ -t $ARG2$ $ARG3$ } Following this, I created a file named equallogic.cfg in the objects directory, and it contains (more or less): define host{ use linux-server ; Inherit default values from a template host_name 172.16.50.11 ; The name we're giving to this device alias EqualLogic ; A longer name associated with the device address 172.16.50.11 ; IP address of the device contact_groups admins } # Check Equallogic Information define service{ use generic-service host_name 172.16.50.11 service_description General Information check_command check_equallogic!public!info } After making sure the permissions are okay for all files, I restart the nagios service, with no errors. When I go into the web GUI, I get the following error AFTER the check runs: (Return code of 127 is out of bounds - plugin may be missing) Extra, probably unrelated problem: Furthermore, when I log into the EqualLogic server, I get the following error under the audit logs: Level: AUDIT Time: 26/05/2014 3:59:13 PM Member: ps4100-1 Subsystem: agent Event ID: 22.7.1 SNMP packet validation failed, request received from 172.16.10.11 An snmpwalk receives a timeout, whereas others succeed. I will work on importing the MIBs tomorrow. I am mentioning it because I want to make sure that it is only a MIB issue for SNMP; if it is, ignore this part. I am entirely unsure of what to do here.

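Exit status 127 is what a POSIX shell returns when the command it is asked to run cannot be found, which is why Nagios adds "plugin may be missing". In the question, the file was created as `check_equallogic.sh` while the command definition calls `$USER1$/check_equallogic` without the extension, so the two names need to match. A small sketch that expands `$USER1$` (assumed here to be `/usr/local/nagios/libexec`, the usual value for a source install) and checks the resulting executable:

```python
"""Sketch: why Nagios reports "Return code of 127 is out of bounds".

127 is the status /bin/sh returns when a command cannot be found.
USER1 below is an assumed value (the usual resources.cfg setting).
"""
import os
import shlex
import subprocess

USER1 = "/usr/local/nagios/libexec"  # assumed $USER1$ value


def plugin_path(command_line: str) -> str:
    """Expand $USER1$ and return the executable the command line will run."""
    return shlex.split(command_line.replace("$USER1$", USER1))[0]


def plugin_problem(command_line: str):
    """Return a short description of why the plugin cannot run, or None."""
    path = plugin_path(command_line)
    if not os.path.exists(path):
        return f"{path} does not exist"
    if not os.access(path, os.X_OK):
        return f"{path} is not executable"
    return None


def shell_status(command: str) -> int:
    """Run a command through /bin/sh -c and return its exit status."""
    return subprocess.run(
        ["/bin/sh", "-c", command],
        stdout=subprocess.DEVNULL, stderr=subprocess.DEVNULL,
    ).returncode
```

Applied to the question's definition, `plugin_path` yields `/usr/local/nagios/libexec/check_equallogic`, which does not exist when only `check_equallogic.sh` is present.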

  • Nagios returns "No output returned from plugin" for a running process

    - by user56291
    I have a Nagios server and a bunch of Nagios clients that I currently monitor. All the clients are set up with the following NRPE configuration. The check_users, check_load, ... metrics are displayed successfully in the Nagios interface, but check_nginx and check_server_proxy show as "Unknown" (No output returned from plugin). As far as I understand, check_procs simply runs ps and looks for either the argument string or the command name to verify whether the service is running. Also, with the -c flag one can give it a threshold that determines the result (i.e. -c 1 returns 'OK' if it finds at least 1 process). nrpe_local.cfg: ###################################### # Do any local nrpe configuration here ###################################### allowed_hosts =127.0.0.1,10.0.2.181 command[check_users]=/usr/lib/nagios/plugins/check_users -w 5 -c 10 command[check_load]=/usr/lib/nagios/plugins/check_load -w 15,10,5 -c 30,25,20 command[check_all_disks]=/usr/lib/nagios/plugins/check_disk -w 20% -c 10% command[check_zombie_procs]=/usr/lib/nagios/plugins/check_procs -w 5 -c 10 -s Z command[check_total_procs]=/usr/lib/nagios/plugins/check_procs -w 150 -c 200 command[check_swap]=/usr/lib/nagios/plugins/check_swap -w 50% -c 25% command[check_server_proxy]=/usr/lib/nagios/plugins/check_procs -c 1 -a "api-v1/server.js" command[check_nginx]=/usr/lib/nagios/plugins/check_procs -c 1:30 -C nginx nagios_server.cfg: ... define host{ use generic-host ; Name of host template to use host_name plum alias plum address 10.0.2.88 check_command check-host-alive-by-ssh } ... #Check api-proxy-server define service{ use generic-service host_name plum service_description check api proxy service check_command check_nrpe!check_server_proxy } define service { use generic-service ; Name of service template to use host_name plum service_description CHECK_NGINX check_period 24x7 max_check_attempts 3 normal_check_interval 5 retry_check_interval 3 check_command check_nrpe!check_nginx notifications_enabled 1 } Also, when I run the command on the Nagios client: /usr/lib/nagios/plugins/check_procs -c 1 -a "api-v1/server.js" I get the desired output: PROCS OK: 1 process with args 'api-v1/server.js' I would really appreciate any pointers on why the NRPE command does not return the desired output in the Nagios server panel.

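The threshold format documented in the Nagios plugin development guidelines does not quite match the reading of `-c 1` in the question: a bare number N is the range 0..N and the plugin alerts when the value falls outside it, so `-c 1` goes CRITICAL for two or more processes and stays OK for zero. Requiring at least one process is written `-c 1:`, while `-c 1:30` (as used for nginx) alerts outside 1..30. A small Python sketch of those rules (an illustration of the documented format, not the plugins' own code):

```python
"""Sketch of the threshold-range rules from the Nagios plugin guidelines.

"start:end" alerts when the value is outside the range; a bare "end"
means "0:end"; an empty end means +infinity; "~" as start means
-infinity; a leading "@" alerts when the value is inside the range.
"""
import math


def parse_range(spec: str) -> tuple[float, float, bool]:
    inside = spec.startswith("@")
    if inside:
        spec = spec[1:]
    start_s, end_s = spec.split(":", 1) if ":" in spec else ("0", spec)
    start = -math.inf if start_s == "~" else float(start_s or 0)
    end = math.inf if end_s == "" else float(end_s)
    return start, end, inside


def alerts(spec: str, value: float) -> bool:
    """True if `value` triggers the threshold `spec` (e.g. a -c argument)."""
    start, end, inside = parse_range(spec)
    in_range = start <= value <= end
    return in_range if inside else not in_range
```

For the api-v1/server.js check, that would make the NRPE definition `check_procs -c 1: -a "api-v1/server.js"`.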

  • Nagios NRPE “No output returned from plugin” error

    - by user118074
    I've just started configuring Nagios in my environment, and I'm getting the above error when trying to use the NRPE plugin. The host file is as follows: define { host_name servername alias servername address xxx.xxx.xxx.xxx use generic-host } define service { use generic-service host_name servername service_description CPU load check_command check_nrpe!alias_cpu } This is the check_nrpe.cfg file located in /etc/nagios-plugins/config: NOTE: this command runs a program $ARG1$ with arguments $ARG2$ define command { command_name check_nrpe command_line /usr/lib/nagios/plugins/check_nrpe -H $HOSTADDRESS$ -c $ARG1$ -a $ARG2$ } NOTE: this command runs a program $ARG1$ with no arguments define command { command_name check_nrpe_1arg command_line /usr/lib/nagios/plugins/check_nrpe -H $HOSTADDRESS$ -c $ARG1$ } Any ideas what is wrong or where to start to solve this?

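With this `check_nrpe` definition, `check_nrpe!alias_cpu` sets `$ARG1$` to `alias_cpu` and leaves `$ARG2$` empty, so the command line ends in a bare `-a`. The packaged definitions shown provide `check_nrpe_1arg` for remote commands that take no arguments, and `alias_cpu` also has to exist as a `command[...]` entry in the remote host's NRPE configuration. A simplified sketch of the macro expansion (it handles only `$HOSTADDRESS$` and `$ARGn$`, not the full Nagios macro set):

```python
"""Simplified sketch of Nagios macro expansion for a check_command.

Only $HOSTADDRESS$ and $ARGn$ are handled; Nagios itself supports many
more macros. Unused $ARGn$ macros expand to an empty string.
"""
import re


def expand(command_line: str, check_command: str, host_address: str) -> str:
    # "check_nrpe!alias_cpu" -> command name "check_nrpe", $ARG1$ = "alias_cpu"
    _name, *args = check_command.split("!")

    def arg(match: re.Match) -> str:
        n = int(match.group(1))
        return args[n - 1] if 1 <= n <= len(args) else ""

    line = command_line.replace("$HOSTADDRESS$", host_address)
    return re.sub(r"\$ARG(\d+)\$", arg, line)
```

So `check_command check_nrpe_1arg!alias_cpu` is one thing to try, after confirming that `check_nrpe -H <client address> -c alias_cpu`, run by hand on the Nagios server, returns output.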

  • Displaying Nagios on a 52" 1080p screen

    - by gdm
    I'm using a 52" 1080p LCD screen to monitor Nagios, positioned where most of the users can see it. Using the default Nagios web view sort of sucks, since you need to increase the text size a decent amount so it's legible from a distance, and then the "Current Network Status", "Host Status Totals", and other boxes along the top take up the majority of the screen real estate; you can't really see the list of host details. Is there a custom view for Nagios, or a plugin, or something available which is meant to display Nagios details on a large screen with large text?

  • Nagios core Event Handler not working

    - by sivashanmugam
    The Nagios event handler is not triggered when the service takes too long to respond or is down. My configuration is below. nagios.cfg: enable_event_handlers=1 localhost.cfg: define service { use generic-service host_name Server service_description test-server servicegroups test-service check_command check-service is_volatile 0 check_period 24x7 max_check_attempts 4 normal_check_interval 2 retry_check_interval 2 contact_groups testcontacts notification_period 24x7 notification_options w,u,c,r notifications_enabled 1 event_handler_enabled 1 event_handler recheck-service } command.cfg: define command{ command_name recheck-service command_line /usr/local/nagios/libexec/alert.sh $SERVICESTATE$ $SERVICESTATETYPE$ $SERVICEATTEMPT$ } alert.sh: #!/bin/sh set -x case "$1" in OK) # The service just came back up, so don't do anything... ;; WARNING) # We don't really care about warning states, since the service is probably still running... ;; UNKNOWN) # We don't know what might be causing an unknown error, so don't do anything... ;; CRITICAL) # Aha! The HTTP service appears to have a problem - perhaps we should restart the server... # Is this a "soft" or a "hard" state? case "$2" in # We're in a "soft" state, meaning that Nagios is in the middle of retrying the check before it turns into a "hard" state and contacts get notified... SOFT) # What check attempt are we on? We don't want to restart the web server on the first check, because it may just be a fluke! case "$3" in # Wait until the check has been tried 3 times before restarting the web server. If the check fails on the 4th time (after we restart the web server), the state type will turn to "hard" and contacts will be notified of the problem. Hopefully this will restart the web server successfully, so the 4th check will result in a "soft" recovery. If that happens no one gets notified because we fixed the problem! 3) echo -n "Going To Ping the Virtual Machine (3rd soft critical state)..." # Call the init script to restart the HTTPD server myresult=`/usr/local/nagios/libexec/check_http xyz.com -t 100 | grep 'time'| awk '{print $10}'` echo "Your Service Is taking the following time Delay" "$myresult Seconds" |mail -s "WARNING : Service Taken More Time To Response" [email protected] ;; esac ;; # The HTTP service somehow managed to turn into a hard error without getting fixed. # It should have been restarted by the code above, but for some reason it didn't. # Let's give it one last try, shall we? # Note: Contacts have already been notified of a problem with the service at this

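Nagios runs an event handler when a service changes state and on each retry while it is in a soft problem state, not on every check; a slow response therefore only reaches the handler if the check command itself returns WARNING or CRITICAL for it (check_http, for example, has `-w` and `-c` response-time thresholds). With `max_check_attempts 4`, the question's handler acts only on the third soft CRITICAL attempt. The decision part of `alert.sh`, ported to Python as a testable sketch that returns the branch it would take instead of running anything:

```python
"""Sketch of the decision logic in alert.sh, as a testable function.

Arguments mirror what the event handler command passes:
$SERVICESTATE$ $SERVICESTATETYPE$ $SERVICEATTEMPT$ (all strings).
Returns the branch the script would take instead of running it.
"""


def decide(state: str, state_type: str, attempt: str) -> str:
    if state != "CRITICAL":
        return "none"  # OK, WARNING and UNKNOWN: do nothing
    if state_type == "SOFT":
        # Only the 3rd soft CRITICAL attempt measures and mails.
        return "soft-attempt-3" if attempt == "3" else "none"
    if state_type == "HARD":
        return "hard"  # the question's script is cut off before this branch
    return "none"
```

Running `/usr/local/nagios/libexec/alert.sh CRITICAL SOFT 3` by hand as the nagios user is a quick way to confirm that the script itself (and the `mail` command in it) works before involving Nagios.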

  • Error with Apache, Nagios and Snorby integration

    - by user1428366
    I'm trying to use Apache to serve two different websites (Nagios and Snorby). The problem is that when I try to open the "/snorby" site, Apache serves the "It works" page. If I access "/nagios", it works perfectly. Snorby is running under Ruby Passenger. These are the config files: <VirtualHost *:80> ScriptAlias /nagios/cgi-bin "/srv/nagios/sbin" <Directory "/srv/nagios/sbin"> # SSLRequireSSL Options ExecCGI AllowOverride None Order allow,deny Allow from all # Order deny,allow # Deny from all # Allow from 127.0.0.1 AuthName "Nagios Access" AuthType Basic AuthUserFile /srv/nagios/etc/htpasswd.users Require valid-user </Directory> Alias /nagios "/srv/nagios/share" <Directory "/srv/nagios/share"> # SSLRequireSSL Options None AllowOverride None Order allow,deny Allow from all # Order deny,allow # Deny from all # Allow from 127.0.0.1 AuthName "Nagios Access" AuthType Basic AuthUserFile /srv/nagios/etc/htpasswd.users Require valid-user </Directory> </VirtualHost> And the other one is this: <VirtualHost *:80> #Alias /snorby "/var/www/snorby-2.6.0/public" # !!! Be sure to point DocumentRoot to 'public'! DocumentRoot /var/www/snorby-2.6.0/public <Directory /var/www/snorby-2.6.0/public> # This relaxes Apache security settings. AllowOverride all # MultiViews must be turned off. Options -MultiViews </Directory> </VirtualHost> If I disable the Nagios webpage, the Snorby webpage works. I think the problem is Snorby, because when I access the IP address with the Nagios page disabled, the web application redirects me to http:// myserverip/dashboard. Can anyone help me, please? Thank you so much! Regards

    Read the article

  • Monitoring over Time with Nagios: How?

    - by David
    Nagios in its standard usage monitors with point-in-time checks: either something is true or it is not. Other tools like SGI's PCP, HP's MeasureWare, and SEC provide monitoring over time, covering things like average disk access time over the last five minutes. Is there anything like this for Nagios? I'm already running NDOUtils, which seems like a natural source for such data. I'd like something that monitors and fires off alarms based on time-based checks using historical data.

    Read the article
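
    One way to get time-based alarming from Nagios itself is a custom plugin that evaluates an average over a window of historical samples and maps it to the standard plugin exit codes (0 OK, 1 WARNING, 2 CRITICAL, 3 UNKNOWN). A minimal Python sketch of that evaluation step follows; fetching the samples (for example from the NDOUtils database mentioned above) is left out, and the sample history and thresholds are hypothetical.

```python
import time

# Standard Nagios plugin exit codes.
OK, WARNING, CRITICAL, UNKNOWN = 0, 1, 2, 3


def windowed_average(samples, window_seconds, now):
    """Mean of the (timestamp, value) samples no older than window_seconds."""
    recent = [value for (ts, value) in samples if now - ts <= window_seconds]
    if not recent:
        return None
    return sum(recent) / len(recent)


def evaluate(samples, window_seconds, warn, crit, now=None):
    """Return (exit_code, output) for a threshold applied to a windowed average."""
    now = time.time() if now is None else now
    avg = windowed_average(samples, window_seconds, now)
    if avg is None:
        return UNKNOWN, f"UNKNOWN - no samples in the last {window_seconds}s"
    if avg >= crit:
        code, label = CRITICAL, "CRITICAL"
    elif avg >= warn:
        code, label = WARNING, "WARNING"
    else:
        code, label = OK, "OK"
    # Perfdata after '|' lets graphing add-ons pick up the averaged value.
    return code, f"{label} - {window_seconds}s average is {avg:.2f}|avg={avg:.2f};{warn};{crit}"


if __name__ == "__main__":
    now = 1_000_000
    # Hypothetical history: (unix timestamp, disk access time in ms).
    history = [(now - 400, 50.0), (now - 240, 12.0), (now - 120, 18.0), (now - 10, 24.0)]
    code, output = evaluate(history, window_seconds=300, warn=15.0, crit=25.0, now=now)
    print(output)  # a real plugin would then call sys.exit(code)
```

    Here the sample from 400 seconds ago falls outside the 300-second window, so only the last three values are averaged.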

  • Nagios command not transmitting all arguments

    - by markus
    I'm using the following service to monitor our Postgres database from Nagios:
      define service{
          use                    test-service ; Name of servi$
          host_name              DEMOCGN002
          service_description    Postgres State
          check_command          check_nrpe!check_pgsql!192.168.1.135!test!test!test
          notifications_enabled  1
          }
    On the remote machine I've configured the command:
      command[check_pgsql]=/usr/lib/nagios/plugins/check_pgsql -H $ARG1$ -d $ARG2$ -l $ARG3$ -p $ARG4$
    In the syslog I can see that the command is executed, but only one argument is passed:
      Oct 20 13:18:43 DEMOSRV01 nrpe[1033]: Running command: /usr/lib/nagios/plugins/check_pgsql -H 192.168.1.134 -d -l -p
      Oct 20 13:18:43 DEMOSRV01 nrpe[1033]: Command completed with return code 3 and output: check_pgsql: Database name is not valid - -l#012Usage:#012check_pgsql [-H <host>] [-P <port>] [-c <critical time>] [-w <warning time>]#012 [-t <timeout>] [-d <database>] [-l <logname>] [-p <password>]
      Oct 20 13:18:43 DEMOSRV01 nrpe[1033]: Return Code: 3, Output: check_pgsql: Database name is not valid - -l#012Usage:#012check_pgsql [-H <host>] [-P <port>] [-c <critical time>] [-w <warning time>]#012 [-t <timeout>] [-d <database>] [-l <logname>] [-p <password>]
    Why are arguments 2, 3 and 4 missing?

    Read the article
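
    The post does not show the `check_nrpe` command definition on the Nagios side, so the cause can't be confirmed from the text. One plausible explanation is that its `command_line` forwards only some of the `!`-separated arguments. The Python sketch below mimics how `$ARGn$` macros are filled from a `check_command` string and shows that arguments not referenced by the template are simply dropped; both command templates are hypothetical.

```python
import re

ARG_MACRO = re.compile(r"\$ARG(\d+)\$")


def expand_check_command(check_command, command_lines):
    """Expand 'name!arg1!arg2...' against a command_line template with $ARGn$ macros.

    In this sketch, macros with no matching argument become empty strings,
    arguments that no $ARGn$ macro references are dropped, and other macros
    (such as $HOSTADDRESS$) are left untouched.
    """
    name, *args = check_command.split("!")
    template = command_lines[name]

    def substitute(match):
        index = int(match.group(1)) - 1
        return args[index] if 0 <= index < len(args) else ""

    return ARG_MACRO.sub(substitute, template)


if __name__ == "__main__":
    service_check = "check_nrpe!check_pgsql!192.168.1.135!test!test!test"
    # Hypothetical definition that forwards only one argument after -a.
    narrow = {"check_nrpe": "$USER1$/check_nrpe -H $HOSTADDRESS$ -c $ARG1$ -a $ARG2$"}
    # Hypothetical definition that forwards all four.
    wide = {"check_nrpe": "$USER1$/check_nrpe -H $HOSTADDRESS$ -c $ARG1$ -a $ARG2$ $ARG3$ $ARG4$ $ARG5$"}
    print(expand_check_command(service_check, narrow))
    print(expand_check_command(service_check, wide))
```

    With the wider definition, the values passed with `-a` fill the remote command's `$ARG1$` to `$ARG4$` in order on the NRPE side.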

  • Nagios check_bgp_neighbors plugin showing critical status

    - by user141610
    I am trying to configure the Nagios check_bgp_neighbors plugin on Ubuntu and followed the plugin's README file. I have made the following changes:
      define command{
          command_name  check_bgp_all
          command_line  $USER1$/check_bgp_neighbors -H $HOSTADDRESS$ -C $USER3$ -n $ARG1$ -n $ARG2$
          }
    to
      define command{
          command_name  check_bgp_all
          command_line  /usr/local/nagios/libexec/check_bgp_neighbors.sh -H xx.xx.xx.49 -C snmpName -n xx.xx.xx.50
    and
      define service{
          use                  server-service
          hostgroup_name       svc-bgp1
          service_description  BGP Check 1
          check_command        check_bgp_all!10.0.0.1!172.16.0.2
          }
    to
      define service{
          use                  generic-service
          hostgroup_name       svc-bgp1
          service_description  BGP Check 1
          check_command        check_bgp_all!xx.xx.xx.50
          }
    xx.xx.xx.49 is the IP of the host router and xx.xx.xx.50 is the IP of the eBGP neighbour.
    Status information:
      line: neighbor:xx.xx.xx.50:sent:78838:received:9769 Failed: status:6 prefixes:16 sent:0 received:1
    Log:
      [1353997904] SERVICE NOTIFICATION: router1;router1;BGP CHECK 2;CRITICAL;notify-service-by-email;line: neighbor:103.7.248.50:sent:78842:received:9772
      [1353997904] SERVICE NOTIFICATION: router1;router1;BGP CHECK 2;CRITICAL;notify-service-by-sms;line: neighbor:103.7.248.50:sent:78842:received:9772
    Why does it show a critical status? I haven't received any response to this question; if you need additional information, please ask in a comment.

    Read the article
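
    The status line above contains `status:6`. If that field is the BGP4-MIB `bgpPeerState` value (an assumption, since the plugin's README is not quoted here), 6 corresponds to `established`, which would suggest the session itself is up and the CRITICAL comes from some other condition the plugin tests. The Python sketch below decodes the field using the peer-state numbering defined in BGP4-MIB (RFC 4273); the parsing of the `key:value` tokens is a simple illustration, not the plugin's own logic.

```python
# bgpPeerState values defined in BGP4-MIB (RFC 4273).
BGP_PEER_STATES = {
    1: "idle",
    2: "connect",
    3: "active",
    4: "opensent",
    5: "openconfirm",
    6: "established",
}


def parse_fields(text):
    """Parse whitespace-separated 'key:value' tokens into a dict of strings."""
    fields = {}
    for token in text.split():
        key, sep, value = token.partition(":")
        if sep and value:  # skips tokens like "Failed:" that carry no value
            fields[key] = value
    return fields


def describe_peer_state(status_line):
    """Map the numeric 'status' field of a status line to a BGP peer state name."""
    state = int(parse_fields(status_line)["status"])
    return BGP_PEER_STATES.get(state, f"unknown({state})")


if __name__ == "__main__":
    print(describe_peer_state("Failed: status:6 prefixes:16 sent:0 received:1"))
```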

  • How can Nagios handle non-threshold based plugins?

    - by FliesLikeABrick
    I am writing a Nagios plugin to monitor trends in the utilization of a certain storage resource (e.g. gradual increases are fine, but an instantaneous/sudden increase or decrease in resource usage may indicate a problem). For what it's worth, it reviews the last N entries in an RRD file generated by a custom Cacti data source/template. What is the "right" way to handle the Nagios notification config/implementation for this? The problem is that the plugin would exit as WARNING/CRITICAL for one polling period, but in the next it would be fine (or 3 polling periods later, if I look at 3 polling periods' worth of data). I guess the question is: should I just write it so that it alerts for X polling periods, or should I find a way to write it so that manual intervention is required to clear it (such as logging into the monitoring server or hitting a URL to run a script that submits a passive result)? Your input is appreciated, and if you have any tips on how to implement the latter I'm open to them (I can think of a few possible ways).

    Read the article
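
    A minimal Python sketch of the detection part of such a plugin, assuming the last N values have already been read from the RRD file (reading RRDs is left out): it compares consecutive samples and alarms on large relative jumps in either direction while ignoring slow growth. The thresholds and data are hypothetical. Because it only looks at the last N samples, its result clears on its own once the jump has scrolled out of the window, which corresponds to the first of the two options in the question.

```python
# Standard Nagios plugin exit codes.
OK, WARNING, CRITICAL, UNKNOWN = 0, 1, 2, 3


def max_step_change(values):
    """Largest absolute relative change between consecutive samples (0.25 == 25%)."""
    worst = 0.0
    for previous, current in zip(values, values[1:]):
        if previous == 0:
            continue  # avoid division by zero; a real plugin might treat this specially
        worst = max(worst, abs(current - previous) / abs(previous))
    return worst


def check_trend(values, warn_ratio, crit_ratio):
    """Flag sudden jumps or drops within the last N samples, ignoring gradual change."""
    if len(values) < 2:
        return UNKNOWN, "UNKNOWN - need at least two samples"
    change = max_step_change(values)
    if change >= crit_ratio:
        return CRITICAL, f"CRITICAL - step change of {change:.0%}"
    if change >= warn_ratio:
        return WARNING, f"WARNING - step change of {change:.0%}"
    return OK, f"OK - largest step change {change:.0%}"


if __name__ == "__main__":
    gradual = [100, 102, 104, 106, 108]  # about 2% per period
    sudden = [100, 102, 104, 160, 162]   # one jump of roughly 54%
    print(check_trend(gradual, warn_ratio=0.2, crit_ratio=0.5))
    print(check_trend(sudden, warn_ratio=0.2, crit_ratio=0.5))
```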

  • Nagios 403 forbidden, indexes?

    - by Georgi
    I installed Nagios on FreeBSD 9, but I can't get it to be publicly accessible in a browser (from other PCs). I think the problem is with the indexes, or that there is no index file (only main.php). Apache says the syntax is OK. The permissions of the directory are 777. The logs print: Directory index forbidden by Options directive: /usr/local/www/nagios/. This is my configuration:
      ScriptAlias /nagios/cgi-bin/ /usr/local/www/nagios/cgi-bin/
      Alias /nagios /usr/local/www/nagios/
      <Directory /usr/local/www/nagios>
          Options +Indexes FollowSymLinks +ExecCGI
          AllowOverride Indexes AuthConfig FileInfo
          Order allow,deny
          Allow from all
          AuthName "Nagios Access"
          AuthType Basic
          AuthUSerFile /usr/local/etc/nagios/htpasswd.users
          Require valid-user
      </Directory>
      <Directory /usr/local/www/nagios/cgi-bin>
          Options +ExecCGI
          AllowOverride None
          Order allow,deny
          Allow from all
          AuthName "Nagios Access"
          AuthType Basic
          AuthUSerFile /usr/local/etc/nagios/htpasswd.users
          Require valid-user
      </Directory>
    Could the problem be the indexes? When I remove the options, the site is public and available, but it lists the files and says that indexes are forbidden.

    Read the article

  • CentOS 5.x Nagios sSMTP: mail cannot be sent from the Nagios server, but works great from the console

    - by adam
    I spent the last 3 hours researching how to get Nagios to work with email notifications. I need to send emails from work, where the only accessible SMTP server is the company's one. I managed to get it done from the console using mail [email protected], which works perfectly for this purpose. I set up ssmtp.conf as follows:
      [email protected]
      mailhub=smtp.company.com:587
      [email protected]
      AuthPass=mypassword
      FromLineOverride=YES
      useSTARTTLS=YES
      rewriteDomain=company.pl
      hostname=nagios
      UseTLS=YES
    I also edited the file /etc/ssmtp/revaliases as follows:
      root:[email protected]:smtp.company.com:587
      nagios:[email protected]:smtp.company.com:587
      nagiosadmin:[email protected]:smtp.company.com:587
    I also changed the file permissions for /etc/ssmtp/* as follows:
      -rwxrwxrwx 1 root nagios 371 lis 22 15:27 /etc/ssmtp/revaliases
      -rwxrwxrwx 1 root nagios 1569 lis 22 17:36 /etc/ssmtp/ssmtp.conf
    and I believe I assigned the proper groups:
      cat /etc/group |grep nagios
      mail:x:12:mail,postfix,nagios
      mailnull:x:47:nagios
      nagios:x:2106:nagios
      nagcmd:x:2107:nagios
    When I send mail manually, I receive it in my private mailbox, but when Nagios sends mail, the mail log says:
      Nov 22 17:47:03 certa-vm2 sSMTP[9099]: MAIL FROM:<[email protected]>
      Nov 22 17:47:03 certa-vm2 sSMTP[9099]: 550 You are not allowed to send mail from this address
    It says [email protected], and I'm not allowed to send mail claiming to be [email protected]; it's supposed to be [email protected]. What am I doing wrong? I've run out of tricks... Kind regards, Adam xxxx

    Read the article

  • nagios levels of escalation

    - by com
    I'm trying to configure Nagios as follows: for every service (for example "mysql seconds behind master") I need to define a few levels of escalation. When the level is WARNING I want to send only email, and when it is CRITICAL I want to send both email and SMS. What is the right way to do this? Do I have to stick with the level definitions (CRITICAL or WARNING), or is there a different way to distinguish the email level from the SMS level of escalation? Thanks!

    Read the article
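
    In Nagios this kind of split can be expressed per contact rather than per escalation level: each contact's `service_notification_options` lists the states it should be notified about (for example `w` for WARNING, `c` for CRITICAL, `r` for recoveries). The Python sketch below only models that filtering to make the idea concrete; the contact names are hypothetical, and the real setup would live in the contact definitions, not in code.

```python
# Hypothetical contacts and the service states each wants, modelled on the
# letters used by a contact's service_notification_options
# (w = WARNING, u = UNKNOWN, c = CRITICAL, r = recovery to OK).
CONTACT_STATES = {
    "dba-email": {"w", "c", "r"},
    "dba-sms": {"c", "r"},
}

STATE_LETTER = {"WARNING": "w", "UNKNOWN": "u", "CRITICAL": "c", "OK": "r"}


def recipients(state, contacts=CONTACT_STATES):
    """Return, sorted, the contacts that would be notified for a service state."""
    letter = STATE_LETTER[state]
    return sorted(name for name, wanted in contacts.items() if letter in wanted)


if __name__ == "__main__":
    for state in ("WARNING", "CRITICAL"):
        print(state, recipients(state))
```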

  • Nagios Wouldn't Start, Now Won't Stop!

    - by Bart B
    I ran an update on a CentOS server running Nagios; after the update, Nagios failed to start. The error in the logs was: Failed to obtain lock on file /var/run/nagios.pid: Permission denied. So I checked, and there was no pid file for Nagios in /var/run. I created one and gave it the following permissions: -rwxr--r-- 1 nagios nagios 6 May 31 11:58 nagios.pid. Nagios then started and seems to be running normally. The only problem is that it now refuses to stop, so I can't restart it to add new servers and services to be monitored! When I issue the command "service nagios stop", I get [FAILED], but nothing at all is written to the log, and the service stays up. Any ideas on how I can get the service to stop now? I'm running the RPM version, which was installed via yum from the RPMForge repositories. The server is CentOS 5.5.

    Read the article
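
    Since the pid file was created by hand, one thing worth checking (a guess, not a confirmed diagnosis) is whether the PID recorded in `/var/run/nagios.pid` actually belongs to the running Nagios process, because init scripts commonly use that file to find the process to stop. A small Python check using `os.kill(pid, 0)`, which tests whether a process exists without sending it a signal, could look like this:

```python
import os


def pid_file_status(path):
    """Report whether the PID stored in a pid file belongs to a live process."""
    try:
        with open(path) as handle:
            content = handle.read().strip()
    except FileNotFoundError:
        return "missing"
    except PermissionError:
        return "unreadable"
    if not content.isdigit() or int(content) <= 0:
        return "unparseable"
    pid = int(content)
    try:
        # Signal 0 performs only the existence/permission check; nothing is delivered.
        os.kill(pid, 0)
    except ProcessLookupError:
        return f"stale (no process {pid})"
    except PermissionError:
        # The process exists but belongs to another user.
        return f"running (pid {pid}, other user)"
    return f"running (pid {pid})"


if __name__ == "__main__":
    print(pid_file_status("/var/run/nagios.pid"))
```

    If the result is "stale", comparing it with the PID shown by `ps` for the Nagios daemon would be the next step.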

  • nconf nagios config no services defined

    - by user1508056
    I've set up Nagios Core on an OS X 10.7 server via MacPorts without problems. It seems to load fine, and the sample config files were all copied to /opt/local/etc/nagios/objects/ and are specified correctly in the nagios.cfg file. I then installed nconf manually and got it running without much of a fight. Then I clicked "Generate Nagios config" in nconf and got 1 warning and 4 errors. When I expand the error box, this is what I see:
      Nagios Core 3.5.0
      Copyright (c) 2009-2011 Nagios Core Development Team and Community Contributors
      Copyright (c) 1999-2009 Ethan Galstad
      Last Modified: 03-15-2013
      License: GPL
      Website: http://www.nagios.org
      Reading configuration data...
      Read main config file okay...
      Read object config files okay...
      Running pre-flight check on configuration data...
      Checking services...
      Error: There are no services defined!
      Checked 0 services.
      Checking hosts...
      Error: There are no hosts defined!
      Checked 0 hosts.
      Checking host groups...
      Checked 0 host groups.
      Checking service groups...
      Checked 0 service groups.
      Checking contacts...
      Error: There are no contacts defined!
      Checked 0 contacts.
      Checking contact groups...
      Checked 0 contact groups.
      Checking service escalations...
      Checked 0 service escalations.
      Checking service dependencies...
      Checked 0 service dependencies.
      Checking host escalations...
      Checked 0 host escalations.
      Checking host dependencies...
      Checked 0 host dependencies.
      Checking commands...
      Checked 0 commands.
      Checking time periods...
      Checked 0 time periods.
      Checking for circular paths between hosts...
      Checking for circular host and service dependencies...
      Checking global event handlers...
      Checking obsessive compulsive processor commands...
      Checking misc settings...
      Warning: Nothing specified for illegal_macro_output_chars variable!
      Total Warnings: 1
      Total Errors: 3
    I've tried several different things (played with cache settings, changed file permissions/ownership, edited some config files manually, etc.), but nothing gets me past this step.
    The thing is, when I run 'sudo nagios -v /opt/local/etc/nagios/nagios.cfg', the output shows it reading a number of services, a localhost, and a contact from the .cfg files, so I'm fairly confident those are OK and the problem is that nconf isn't reading the correct .cfg files, or something like that. Any ideas what to double-check? I did lots of googling and found nothing on this specific issue, so either I'm special (I'm not) or I'm overlooking something really simple. The path to the nagios binary is listed as /opt/local/bin/nagios, if that matters. Also, all the Nagios files are owned by nagios:nagios, whereas the nconf files are owned by user, with only the directories/files specified in the nconf docs belonging to the _www user and/or group (things like output, temp, config, etc.). Thanks.

    Read the article

  • Nagios Check_hpjd giving me problems!

    - by Mister IT Guru
    When I run the following:
      [root@host plugins]# ./check_hpjd -H printer1.mydomain.com
    I get:
      Timeout from host printer1.mydomain.com
    I have net-snmp installed on my system. I noticed that I didn't have net-snmp-utils installed; once I installed it, I was able to run:
      [root@host plugins]# snmpwalk -Os -c public -v 1 printer1.mydomain.com system
      sysDescr.0 = STRING: HP ETHERNET MULTI-ENVIRONMENT
      sysObjectID.0 = OID: enterprises.11.2.3.9.1
      sysUpTimeInstance = Timeticks: (325408663) 37 days, 15:54:46.63
      sysContact.0 = STRING:
      sysName.0 = STRING: printer1
      sysLocation.0 = STRING:
      sysServices.0 = INTEGER: 72
    So I know that the printer is working as expected (as far as SNMP is concerned). But when I run:
      [root@host plugins]# ./check_hpjd -H printer1.mydomain.com -C public
      Error in packet ()
    I get this error. From what I've tried so far, I know my host can communicate via SNMP and the printer responds via SNMP, so I guess I'm left to look at the plugin, which I will be checking up on. I'm new to SNMP and am investigating this with my good friend Google search, but I'm on a learning curve here, so please forgive my questions if they sound stupid.

    Read the article

  • Dependency issue while installing Nagios plugins

    - by M. Saâd
    I have a dependency problem while installing nagios-plugins:
      yum install nagios-plugins-all
      ...
      --> Processing Dependency: /usr/bin/sensors for package: nagios-plugins-sensors-1.4.15-7.el6.i686
      --> Finished Dependency Resolution
      Error: Package: nagios-plugins-sensors-1.4.15-7.el6.i686 (epel)
                 Requires: /usr/bin/sensors
       You could try using --skip-broken to work around the problem
       You could try running: rpm -Va --nofiles --nodigest
    OS: RHEL 6.1
    Installed packages: nagios.i686 3.2.3-3.el6.rf, nagios-plugins.i686 1.4.15-7.el6

    Read the article

  • How to collect the performance data of a server during an unreachable/down period using Nagios?

    - by gsc-frank
    Sometimes services and hosts stop responding due to poor server performance. That is, if for some reason (many concurrent service requests, an expensive backup running on the server, or anything else that consumes lots of server resources) a server's performance is severely degraded, the server may be unable to establish any "normal network communication" without triggering the standard timeouts defined for such communication. Knowing the host's performance data (CPU, memory, ...) for that period, if available (i.e. the host is not down and, despite its degraded performance, still allows plugins to collect performance data), could be very useful for a sysadmin trying to determine what caused the problem, or at least whether host performance was fine and played no part in the host/service outage. This could be solved with remote active checks (NRPE) or remote passive checks (NSCA) if those solutions could buffer perf data locally and send it to the central Nagios server once host performance or the network allows. I read the documentation for both and can't find any reference to such a buffering mechanism, nor to what happens when NSCA can't reach the Nagios server. Any ideas on how to fill this information gap, which would be so useful for forensic analysis?
    EDIT: My question isn't about which tools I can use to debug perf problems or gather perf data for analysis; it's about how to collect host perf data (using Nagios) even during a network outage, for later (forensic) analysis. The idea is to integrate such data into Nagios graphers like pnp4nagios and NagiosGrapher. I know that I could install tools like Cacti on each of my hosts and have a kind of redundant performance data collection, but I really want to avoid that and try to meet all perf analysis requirements with one tool: Nagios.

    Read the article
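
    The store-and-forward behaviour asked about above can be sketched on the client side: results are appended to a local spool file while the central server is unreachable and forwarded in order once it is reachable again. The Python sketch below formats each entry as a Nagios external-command line (`[timestamp] PROCESS_SERVICE_CHECK_RESULT;host;service;code;output`), keeping the original check time; the `send` transport, host and service names are hypothetical, and whether a given grapher honours the original timestamps is not something this sketch can guarantee.

```python
import json
import os
import tempfile
import time


def format_passive_result(host, service, code, output, timestamp):
    """Nagios external-command line for a passive service check result."""
    return f"[{int(timestamp)}] PROCESS_SERVICE_CHECK_RESULT;{host};{service};{code};{output}"


class SpoolBuffer:
    """Store check results locally and forward them once the central server is reachable."""

    def __init__(self, path):
        self.path = path

    def record(self, host, service, code, output, timestamp=None):
        entry = {"host": host, "service": service, "code": code, "output": output,
                 "ts": time.time() if timestamp is None else timestamp}
        with open(self.path, "a") as spool:
            spool.write(json.dumps(entry) + "\n")

    def flush(self, send):
        """Send buffered results in order; stop at the first failure and keep the rest.

        send is a caller-supplied (hypothetical) transport that returns True on delivery.
        """
        if not os.path.exists(self.path):
            return 0
        with open(self.path) as spool:
            entries = [json.loads(line) for line in spool if line.strip()]
        sent = 0
        for entry in entries:
            line = format_passive_result(entry["host"], entry["service"],
                                         entry["code"], entry["output"], entry["ts"])
            if not send(line):
                break
            sent += 1
        # Rewrite the spool atomically with only the unsent entries.
        tmp_path = self.path + ".tmp"
        with open(tmp_path, "w") as spool:
            for entry in entries[sent:]:
                spool.write(json.dumps(entry) + "\n")
        os.replace(tmp_path, self.path)
        return sent


if __name__ == "__main__":
    buf = SpoolBuffer(os.path.join(tempfile.mkdtemp(), "perfdata.spool"))
    buf.record("web01", "CPU Load", 2, "CRITICAL - load 9.1|load=9.1", timestamp=1700000000)
    print(buf.flush(lambda line: False))  # link down: nothing is sent, entry stays spooled
    delivered = []

    def fake_send(line):
        delivered.append(line)
        return True

    print(buf.flush(fake_send), delivered)
```

    Stopping at the first failed send keeps results in chronological order, at the cost of retrying the whole remaining backlog on the next flush.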
