Nagios state transition and event handler issue
- by Dattatray
We are using Nagios to check duplicate processes.
define service {
    use                     local-service
    host_name               xxx
    service_description     xxx Duplicate Processes
    check_interval          1
    max_check_attempts      1
    contact_groups          admins
    event_handler           restart-dependent-processes
    check_command           check_procs_duplicate!2!3!2!2!2
}
check_procs_duplicate checks if there are any duplicate processes and returns the state - e.g. CRITICAL.
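To make the check's contract concrete, here is a minimal Python sketch of how a duplicate-process check could map an instance count to a state. The exit codes (0 OK, 1 WARNING, 2 CRITICAL, 3 UNKNOWN) are the standard Nagios plugin return codes. The function name classify_instances and the thresholds warn_at and crit_at are hypothetical; this is not the real check_procs_duplicate, and the meaning of its arguments (2!3!2!2!2) is not assumed here.

```python
# Standard Nagios plugin return codes.
OK, WARNING, CRITICAL, UNKNOWN = 0, 1, 2, 3
STATE_NAMES = {OK: "OK", WARNING: "WARNING", CRITICAL: "CRITICAL", UNKNOWN: "UNKNOWN"}


def classify_instances(count, warn_at=2, crit_at=3):
    """Map a running-instance count to a Nagios state code.

    warn_at and crit_at are hypothetical thresholds chosen for illustration.
    """
    if count < 0:
        return UNKNOWN  # a negative count means the lookup itself failed
    if count >= crit_at:
        return CRITICAL
    if count >= warn_at:
        return WARNING
    return OK


def status_line(count, warn_at=2, crit_at=3):
    """Build the one-line plugin output that Nagios would display."""
    state = classify_instances(count, warn_at, crit_at)
    return "PROCS %s: %d instance(s) running" % (STATE_NAMES[state], count)
```

A real plugin would print the status line and exit with the state code, for example by calling sys.exit(classify_instances(count)).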
The event handler kills the duplicate processes and their dependent processes, then starts one instance of the process and of each dependent process.
Once the handler has finished, Nagios checks again for duplicate processes and sets the state accordingly: OK, WARNING, or CRITICAL.
The event handler takes some time to start the processes. If someone manually starts the process during this time, the state remains CRITICAL.
At the next check interval, Nagios checks for duplicate processes again and again finds the state CRITICAL.
The event handler is not executed this time, because the previous and current states are both CRITICAL.
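The sequence can be replayed with a small Python model. This is a simplified sketch, not Nagios code: it assumes that with max_check_attempts 1 every result is a HARD state, so the handler effectively runs only when the state differs from the previous one. The names handler_runs and replay are hypothetical.

```python
def handler_runs(previous, current):
    """Simplified model: the handler is assumed to run only on a state change.

    With max_check_attempts 1 there are no SOFT retries, so a repeated
    CRITICAL result is not treated as a new event.
    """
    return previous != current


def replay(results, initial="OK"):
    """Return, for each check result in order, whether the handler would run."""
    previous = initial
    fired = []
    for current in results:
        fired.append(handler_runs(previous, current))
        previous = current
    return fired


# 1st check: duplicates found, handler runs.
# 2nd check: a manual start during the restart leaves duplicates again.
# 3rd check: still CRITICAL at the next interval.
timeline = ["CRITICAL", "CRITICAL", "CRITICAL"]
print(replay(timeline))  # prints [True, False, False]
```

In this model the handler runs only for the first CRITICAL result; the duplicates created by the manual start are never cleaned up until the state first changes to something else.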
Any pointers on how to fix this issue?