Why does Cacti keep waiting for dead poller processes?
Posted
by
Oliver Salzburg
on Server Fault
See other posts from Server Fault
or by Oliver Salzburg
Published on 2012-06-29T20:26:55Z
Indexed on
2012/06/29
21:18 UTC
Read the original article
Hit count: 2374
sorry for the length
I am currently setting up a new Debian (6.0.5) server. I put Cacti (0.8.7g) on it yesterday and have been battling with it ever since.
Initial issue
The initial issue I was observing, was that my graphs weren't updating. So I checked my cacti.log
and found this concerning message:
POLLER: Poller[0] Maximum runtime of 298 seconds exceeded. Exiting.
That can't be good, right? So I went checking and started poller.php
myself (via sudo -u www-data php poller.php --force
). It will pump out a lot of message (which all look like what I would expect) and then hang for a minute. After that 1 minute, it will loop the following message:
Waiting on 1 of 1 pollers.
This goes on for 4 more minutes until the process is forcefully ended for running longer than 300s.
So far so good
I went on for a good hour trying to determine what poller might still be running, until I got to the conclusion that there simply is no running poller.
Debugging
I checked poller.php
to see how that warning is issued and why. On line 368, Cacti will retrieve the number of finished processes from the database and use that value to calculate how many processes are still running. So, let's see that value!
I added the following debug code into poller.php
:
print "Finished: " . $finished_processes . " - Started: " . $started_processes . "\n";
Result
This will print the following within seconds of starting poller.php
:
Finished: 0 - Started: 1
Waiting on 1 of 1 pollers.
Finished: 1 - Started: 1
So the values are being read and are valid. Until we get to the part where it keeps looping:
Finished: - Started: 1
Waiting on 1 of 1 pollers.
Suddenly, the value is gone. Why? Putting var_dump()
in there confirms the issue:
NULL
Finished: - Started: 1
Waiting on 1 of 1 pollers.
The return value is NULL
. How can that be when querying SELECT COUNT()...
? (SELECT COUNT()
should always return one result row, shouldn't it?)
More debugging
So I went into lib\database.php
and had a look at that db_fetch_cell()
. A bit of testing confirmed, that the result set is actually empty.
So I added my own database query code in there to see what that would do:
$finished_processes = db_fetch_cell("SELECT count(*) FROM poller_time WHERE poller_id=0 AND end_time>'0000-00-00 00:00:00'");
print "Finished: " . $finished_processes . " - Started: " . $started_processes . "\n";
$mysqli = new mysqli("localhost","cacti","cacti","cacti");
$result = $mysqli->query("SELECT COUNT(*) FROM poller_time WHERE poller_id=0 AND end_time>'0000-00-00 00:00:00';");
$row = $result->fetch_assoc();
var_dump( $row );
This will output
Finished: - Started: 1
array(1) {
["COUNT(*)"]=>
string(1) "2"
}
Waiting on 1 of 1 pollers.
So, the data is there and can be accessed without any problems, just not with the method Cacti is using?
Double-check that!
I enabled MySQL logging to make sure I'm not imagining things. Sure enough, when the error message is looped, the cacti.log
reads as if it was querying like mad:
06/29/2012 08:44:00 PM - CMDPHP: Poller[0] DEVEL: SQL Cell: "SELECT count(*) FROM cacti.poller_time WHERE poller_id=0 AND end_time>'0000-00-00 00:00:00'"
06/29/2012 08:44:01 PM - CMDPHP: Poller[0] DEVEL: SQL Cell: "SELECT count(*) FROM cacti.poller_time WHERE poller_id=0 AND end_time>'0000-00-00 00:00:00'"
06/29/2012 08:44:02 PM - CMDPHP: Poller[0] DEVEL: SQL Cell: "SELECT count(*) FROM cacti.poller_time WHERE poller_id=0 AND end_time>'0000-00-00 00:00:00'"
But none of these queries are logged my MySQL. Yet, when I add my own database query code, it shows up just fine.
What the heck is going on here?
© Server Fault or respective owner