Why does Cacti keep waiting for dead poller processes?

Posted by Oliver Salzburg on Server Fault See other posts from Server Fault or by Oliver Salzburg
Published on 2012-06-29T20:26:55Z Indexed on 2012/06/29 21:18 UTC
Read the original article Hit count: 2374

Filed under:
|
|

sorry for the length
I am currently setting up a new Debian (6.0.5) server. I put Cacti (0.8.7g) on it yesterday and have been battling with it ever since.

Initial issue

The initial issue I was observing, was that my graphs weren't updating. So I checked my cacti.log and found this concerning message:

POLLER: Poller[0] Maximum runtime of 298 seconds exceeded. Exiting.

That can't be good, right? So I went checking and started poller.php myself (via sudo -u www-data php poller.php --force). It will pump out a lot of message (which all look like what I would expect) and then hang for a minute. After that 1 minute, it will loop the following message:

Waiting on 1 of 1 pollers.

This goes on for 4 more minutes until the process is forcefully ended for running longer than 300s.

So far so good

I went on for a good hour trying to determine what poller might still be running, until I got to the conclusion that there simply is no running poller.

Debugging

I checked poller.php to see how that warning is issued and why. On line 368, Cacti will retrieve the number of finished processes from the database and use that value to calculate how many processes are still running. So, let's see that value!

I added the following debug code into poller.php:

print "Finished: " . $finished_processes . " - Started: " . $started_processes . "\n";

Result

This will print the following within seconds of starting poller.php:

Finished: 0 - Started: 1
Waiting on 1 of 1 pollers.
Finished: 1 - Started: 1

So the values are being read and are valid. Until we get to the part where it keeps looping:

Finished:  - Started: 1
Waiting on 1 of 1 pollers.

Suddenly, the value is gone. Why? Putting var_dump() in there confirms the issue:

NULL
Finished:  - Started: 1
Waiting on 1 of 1 pollers.

The return value is NULL. How can that be when querying SELECT COUNT()...? (SELECT COUNT() should always return one result row, shouldn't it?)

More debugging

So I went into lib\database.php and had a look at that db_fetch_cell(). A bit of testing confirmed, that the result set is actually empty.

So I added my own database query code in there to see what that would do:

$finished_processes = db_fetch_cell("SELECT count(*) FROM poller_time WHERE poller_id=0 AND end_time>'0000-00-00 00:00:00'");
print "Finished: " . $finished_processes . " - Started: " . $started_processes . "\n";

$mysqli = new mysqli("localhost","cacti","cacti","cacti");
$result = $mysqli->query("SELECT COUNT(*) FROM poller_time WHERE poller_id=0 AND end_time>'0000-00-00 00:00:00';");
$row = $result->fetch_assoc();
var_dump( $row );

This will output

Finished:  - Started: 1
array(1) {
  ["COUNT(*)"]=>
  string(1) "2"
}
Waiting on 1 of 1 pollers.

So, the data is there and can be accessed without any problems, just not with the method Cacti is using?

Double-check that!

I enabled MySQL logging to make sure I'm not imagining things. Sure enough, when the error message is looped, the cacti.log reads as if it was querying like mad:

06/29/2012 08:44:00 PM - CMDPHP: Poller[0] DEVEL: SQL Cell: "SELECT count(*) FROM cacti.poller_time WHERE poller_id=0 AND end_time>'0000-00-00 00:00:00'"
06/29/2012 08:44:01 PM - CMDPHP: Poller[0] DEVEL: SQL Cell: "SELECT count(*) FROM cacti.poller_time WHERE poller_id=0 AND end_time>'0000-00-00 00:00:00'"
06/29/2012 08:44:02 PM - CMDPHP: Poller[0] DEVEL: SQL Cell: "SELECT count(*) FROM cacti.poller_time WHERE poller_id=0 AND end_time>'0000-00-00 00:00:00'"

But none of these queries are logged my MySQL. Yet, when I add my own database query code, it shows up just fine.

What the heck is going on here?

© Server Fault or respective owner

Related posts about php

Related posts about debian-squeeze