Cause of flapping UNKNOWN Nagios status?

Posted by jldugger on Server Fault
Published on 2009-08-25T19:48:31Z Indexed on 2010/03/29 1:03 UTC

We run some Nagios service checks via OpsView, and one of our hosts is getting a strange response for SSH:

"UNKNOWN: Service results are stale"

It happens regularly, but seems to clear as the system retries a second and third time. It started after the server in question was patched and rebooted last week. The server itself responds to SSH from the boxes I've tested with (these don't include the monitoring system, which I haven't been given access to).

/var/log/secure is full of lines like:

sshd[15628]: Did not receive identification string from xxx.xxx.226.20

The timestamps fall reliably every five minutes, so these are pretty obviously the monitoring check connecting and then disconnecting as soon as sshd answers.
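To see how the server behaves under that kind of connection, the same pattern can be reproduced from a box I do have access to. Below is a minimal sketch, assuming Python 3 is available there. It opens a TCP connection to the SSH port, reads the server's identification line, and hangs up without sending one of its own, which is the client behavior sshd reports as "Did not receive identification string". The function name `probe_ssh_banner`, the 10-second default timeout, and the host name in the usage comment are made up for illustration, and this is not necessarily what the OpsView check itself does.

```python
import socket
import time


def probe_ssh_banner(host, port=22, timeout=10.0):
    """Connect, read the server's identification line, then disconnect.

    Returns (banner, elapsed_seconds). Raises OSError (including
    socket.timeout) if the connection or the read fails or times out.
    Hypothetical helper for manual testing, not the monitoring plugin.
    """
    start = time.monotonic()
    # create_connection applies the timeout to the connect and to later reads.
    with socket.create_connection((host, port), timeout=timeout) as sock:
        data = b""
        # Read until the first line ends, with a small cap on the size.
        while b"\n" not in data and len(data) < 256:
            chunk = sock.recv(256)
            if not chunk:
                break  # server closed the connection early
            data += chunk
    # Leaving the with-block closes the socket without sending a client
    # identification string.
    elapsed = time.monotonic() - start
    banner = data.split(b"\n", 1)[0].strip().decode("ascii", errors="replace")
    return banner, elapsed


# Example use (host name is a placeholder):
# banner, elapsed = probe_ssh_banner("server.example.org")
# print(banner, "after", round(elapsed, 2), "seconds")
```

If the elapsed time is sometimes close to whatever timeout the monitoring check uses, a slow banner (for example from a reverse DNS lookup on the server side) would be one candidate explanation to rule out; that is a guess, not something the logs above confirm.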

Anyone know what might be causing this, or how to fix it? It's really frustrating to see this pop on and off the status page.

© Server Fault or respective owner