Nagios core Event Handler not working
Posted
by
sivashanmugam
on Server Fault
See other posts from Server Fault
or by sivashanmugam
Published on 2014-05-30T12:37:14Z
Indexed on
2014/06/02
15:31 UTC
Read the original article
Hit count: 270
Nagios Event Handler is not triggering when the service is taking more time to response or down.
My configuration in below
nagios.cfg
enable_event_handlers=1
localhost.cfg
define service {
use generic-service host_name Server service_description test-server servicegroups test-service check_command check-service is_volatile 0 check_period 24x7 max_check_attempts 4 normal_check_interval 2 retry_check_interval 2 contact_groups testcontacts notification_period 24x7 notification_options w,u,c,r notifications_enabled 1 event_handler_enabled 1 event_handler recheck-service }
command.cfg
define command{ command_name recheck-service command_line /usr/local/nagios/libexec/alert.sh $SERVICESTATE$ $SERVICESTATETYPE$ $SERVICEATTEMPT$ }
alert.sh file
!/bin/sh set -x case "$1" in
OK)
# The service just came back up, so don't do anything...
;;
WARNING)
# We don't really care about warning states, since the service is probably still running...
;; UNKNOWN) # We don't know what might be causing an unknown error, so don't do anything... ;; CRITICAL) Aha! The HTTP service appears to have a problem - perhaps we should restart the server... Is this a "soft" or a "hard" state? case "$2" in We're in a "soft" state, meaning that Nagios is in the middle of retrying the check before it turns into a "hard" state and contacts get notified... SOFT) # What check attempt are we on? We don't want to restart the web server on the first check, because it may just be a fluke! case "$3" in Wait until the check has been tried 3 times before restarting the web server. If the check fails on the 4th time (after we restart the web server), the state type will turn to "hard" and contacts will be notified of the problem. Hopefully this will restart the web server successfully, so the 4th check will result in a "soft" recovery. If that happens no one gets notified because we fixed the problem! 3) echo -n "Going To Ping the Virtual Machine (3rd soft critical state)..." # Call the init script to restart the HTTPD server myresult=`/usr/local/nagios/libexec/check_http xyz.com -t 100 | grep 'time'| awk '{print $10}'` echo "Your Service Is taking the following time Delay" "$myresult Seconds" |mail -s "WARNING : Service Taken More Time To Response" [email protected] ;; esac ;; # The HTTP service somehow managed to turn into a hard error without getting fixed. # It should have been restarted by the code above, but for some reason it didn't. # Let's give it one last try, shall we? # Note: Contacts have already been notified of a problem with the service at this
© Server Fault or respective owner