nagios wrongly reports packet loss
- by Alien Life Form
Lately, on my nagios 3.2.3 install (CentOS5, monitoring ~ 300 hosts, 1150 services) has sdtarted to occasionally report high packet loss on 50-60 hosts at a time. Problem is it's bogus. Manual runs of ping (or its own check_ping binary) finds no fault with any of the affected hosts. The only possible cures I found so far are:
run all the checks manually (they will succeed but it may act up again on next check)
acknowledge and wait for the problem to go away (may take several ours)
I suspect (but have no particular reason other than single rescheduled checks succeeding) that the problem may lay with all the checks being mass scheduled together - in which case introducing some jitter in the scheduling (how?) might help. Or it may be something completely different.
Ideas, anyone?