nagios service check
- by DRH
I am new to nagios and we have a small issue I need to ask assistance with. Many of the machines that we monitor can go unresponsive for a bit when some very intensive cpu tasks are run. This makes nagios send warnings and alerts while these hosts are busy reporting things like 'ping timeout' or 'zombie processes' and even swap space warnings, but in actuality there is not a problem.
Is there a way to configure nagios to not send such alerts, but check x number of times over a period of time and only then send an alert at the end of that time if the server in question has not recovered?.
Looking at the commands.cfg file, I see entries like this:
define command{
command_name check_local_swap
command_line $USER1$/check_swap -w $ARG1$ -c $ARG2$
}
How could I modify this example to accomplish what I want above.
Thanks