Capturing Drupal 7 DOM content before page load for comparison
- by ehime
We have an MU (Multisite) installation of Drupal 7 here at work, and we are trying to
temporarily hold back the swarm of bots we receive until we get a chance to load our
content. I wrote a quick and dirty script that sends 503 headers if a certain
criterion is found via XPath (this can also be done with strpos/preg_match if the DOM is not formed).
To get the ball rolling, though, I need to figure out how to do one of the following:
A) Hijack the Drupal 7 bootstrap and pull all content through the filter below
B) ob_flush the content through the filter before it is loaded
The issue I am having is figuring out exactly where I can catch the content.
I thought index.php in Drupal 7 would be the place, but I'm a little
confused as to where or how I should capture the contents. Here's the script;
hopefully someone can point me in the right direction.
//error_reporting(-1);

/* start query */
$dom = new DOMDocument;
$dom->preserveWhiteSpace = false;
// This is the part in question: PHP_SELF is only the script path, so the
// rendered page still has to be captured and loaded here somehow.
$dom->load($_SERVER['PHP_SELF']);
$xpath = new DOMXPath($dom);
// if this exists we aren't ready to be read by bots
$query = $xpath->query(".//*[@id='block-views-about-this-site-block']/div/div/div");
// or $query = 'klat-badge'; // if this is a string, not DOM
/* end query */

// query() returns a DOMNodeList (or false on a bad expression), so check its length
if ($query !== false && $query->length > 0) {
    // require ban list; botlist.php is expected to define $list
    require('botlist.php');
    $str = '/' . implode('|', array_unique($list)) . '/i';
    $ua  = isset($_SERVER['HTTP_USER_AGENT']) ? $_SERVER['HTTP_USER_AGENT'] : '';
    if (preg_match($str, $ua)) {
        // so tell bots we're broken
        header('HTTP/1.1 503 Service Temporarily Unavailable');
        header('Status: 503 Service Temporarily Unavailable');
        exit;
    }
}
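
For what it's worth, option B could be sketched roughly as below. This is a minimal sketch under stated assumptions, not a confirmed solution. The function names page_has_marker, is_listed_bot and status_for are made up for illustration. $list is assumed to be a plain array of user-agent fragments, as defined by botlist.php. The markup is parsed with loadHTML rather than load, and a simple case-insensitive substring match stands in for the preg_match pattern used above.

```php
<?php
// Hypothetical helpers for illustration; none of these names come from Drupal.

/**
 * Returns true when the marker block is present in the rendered HTML,
 * i.e. the page is not yet ready to be read by bots.
 */
function page_has_marker($html)
{
    if (trim($html) === '') {
        return false; // nothing was captured, so nothing to inspect
    }

    $dom = new DOMDocument();
    // Rendered pages are rarely valid XML, so parse as HTML and keep
    // libxml warnings out of the output.
    $previous = libxml_use_internal_errors(true);
    $dom->loadHTML($html);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    $xpath = new DOMXPath($dom);
    $nodes = $xpath->query("//*[@id='block-views-about-this-site-block']/div/div/div");

    return $nodes !== false && $nodes->length > 0;
}

/**
 * Returns true when the user agent contains any entry of the ban list.
 * Assumption: $list is an array of plain strings, not regex fragments.
 */
function is_listed_bot($userAgent, array $list)
{
    foreach (array_unique($list) as $needle) {
        if ($needle !== '' && stripos($userAgent, $needle) !== false) {
            return true;
        }
    }
    return false;
}

/**
 * Picks the HTTP status for a captured page: 503 for listed bots while
 * the marker is still present, 200 otherwise.
 */
function status_for($html, $userAgent, array $list)
{
    return (page_has_marker($html) && is_listed_bot($userAgent, $list)) ? 503 : 200;
}
```

The idea would be to call ob_start() in index.php before Drupal renders the page (menu_execute_active_handler() in a stock Drupal 7 index.php), fetch the markup with ob_get_clean(), and then either send the 503 headers and exit or echo the buffer, depending on what status_for() returns. Whether this plays nicely with Drupal's own output buffering and page cache would still need to be checked on a real site.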