PHP robots.txt parsing

Posted by omfgroflmao on Stack Overflow See other posts from Stack Overflow or by omfgroflmao
Published on 2010-03-30T19:42:16Z Indexed on 2010/03/30 19:43 UTC
Read the original article Hit count: 391

Filed under:
|
|

Is there an easiest way to do this?

function parse_robots_txt($URL){
    $parsed = parse_url($URL);
    $robots = file_get_contents('http://'.$parsed['host'].'/robots.txt',FILE_TEXT);
    $exploded = explode('user-agent:',strtolower($robots));
    foreach($exploded as $user_agent){
        $user_agent = trim($user_agent);
        if(substr($user_agent,0,1) == '*'){
            $user_agent = str_replace('#','',preg_replace('/#.*\\n/i','',$user_agent));
            $user_agent = str_replace('disallow:','',substr($user_agent,1));
            $user_agent = preg_replace('/allow:/i', '+-+-+-+', $user_agent, 1);
            $user_agent = str_replace('allow:','',$user_agent);
            print_r(explode('+-+-+-+',$user_agent));
        }
    }
}

© Stack Overflow or respective owner

Related posts about php

Related posts about robots.txt