Cause of flapping UNKNOWN Nagios status?
Posted by jldugger on Server Fault
Published on 2009-08-25T19:48:31Z
We run some Nagios service checks via OpsView, and one of our hosts is getting a strange response for SSH:
"UNKNOWN: Service results are stale"
It happens regularly, but seems to go away when the system retries a second or third time. It started after a patch and reboot of the server in question last week. The server itself responds to SSH from the boxes I've tested with (which doesn't include the monitoring system, since I haven't been given access to it).
/var/log/secure is full of lines like:
sshd[15628]: Did not receive identification string from xxx.xxx.226.20
The timestamps are reliably five minutes apart, which is pretty obviously the monitoring check connecting and then disconnecting without completing the SSH handshake.
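For anyone wanting to reproduce that log line, here is a minimal sketch of a banner-only probe. It assumes the check does nothing more than open a TCP connection, read sshd's greeting, and hang up. I don't know which plugin OpsView actually runs, and the function name `read_ssh_banner` is made up for illustration.

```python
import socket


def read_ssh_banner(host, port=22, timeout=10.0):
    """Connect, read the first line the server sends, and disconnect.

    Nothing is ever sent back to the server. An sshd on the other end
    never sees a client identification string, so it logs
    "Did not receive identification string from <client address>".
    """
    with socket.create_connection((host, port), timeout=timeout) as sock:
        data = b""
        # RFC 4253 caps the identification line at 255 bytes, newline included.
        while not data.endswith(b"\n") and len(data) < 255:
            chunk = sock.recv(1)
            if not chunk:
                # Server closed the connection before finishing the line.
                break
            data += chunk
    return data.decode("ascii", errors="replace").strip()
```

Running `read_ssh_banner("server.example.com")` (placeholder hostname) from any machine that can reach port 22 should add one matching entry to /var/log/secure on the target. That confirms the log lines are harmless probe noise, separate from whatever is making the results go stale.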
Anyone know what might be causing this, or how to fix it? It's really frustrating to see this pop on and off the status page.
© Server Fault or respective owner