Cacti: "An internal Net-Snmp error condition detected in Cacti snmp_count"
- by Recc
There's the odd forum topic about an error similarly obscure as this, but I haven't seen any for snmp_count in particular.
Also I don't see graphing problems, though I can't simply go and eyeball all graphs.
However the poller does time out and has to be stopped by its internal process preventing overruns. If I filter out the flood of this error in the log I dont get anything else except the poller timeout:
06/12/2014 12:48:00 PM - POLLER: Poller[0] Maximum runtime of 58 seconds exceeded. Exiting.
06/12/2014 12:48:00 PM - SYSTEM STATS: Time:58.8566 Method:spine Processes:1 Threads:40 Hosts:1923 HostsPerProcess:1923 DataSources:61584 RRDsProcessed:0
06/12/2014 12:48:00 PM - SPINE: Poller[0] ERROR: Spine Timed Out While Processing Hosts Internal
I saw in the running processes /usr/local/spine/spine 0 2053 that's always left behind. When I kill it the flooding of the error stops. Of course it's the same on the next poll run as it goes through the devices.
2053 is apparently the DB ID for a device. I deleted it completely to see if that stops it. It doesn't, instead 2052 is seen there. I suspect It'll be the same if I keep deleting devices which I will not do.
This started happening midday when I wasn't doing anything to the cacti server.
I have tried reducing Maximum Threads per Process to 1 and Number of PHP Script Servers to 1. I've been running it at 10 script servers / 40 threads for months with poll cycle time of about 20 sec.
I just found out Running snmpwalk on any host would begin returning the values but then timeout halfway through. This doesn't happen from different servers on the network this Cacti is suggesting still that it's a problem with it locally. Any suggestions?
For one polling cycle I changed to use cmd.php instead. then I started getting errors like
CMDPHP: Poller[0] Host[45] DS[541] WARNING: Result from SNMP not valid. Partial Result: U
Perhaps as expected. Looking closely I see that every snmpwalk I do is interrupted at the same place as if some byte limit is hit and the connection torn down.