Process all links but external ones (Ruby + Mechanize)
- by Radek
I want to process every link on a whole web site except the external ones. Is there an easy way to identify that a link is external and skip it?
My code so far looks like this (the site URL is passed as a command-line argument):
require 'mechanize'

def process_page(page)
  puts
  puts page.title
  STDIN.gets
  page.links.each do |link|
    process_page($agent.get(link.href))
  end
end

$agent = WWW::Mechanize.new
$agent.user_agent = 'Mozilla/5.0 (Windows; U; Windows NT 5.1; en-GB; rv:1.9.1.4) Gecko/20091016 Firefox/3.5.4'
process_page($agent.get(ARGV[0]))
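
One possible way to skip external links is to resolve each href against the page it came from and compare its host with the host of the start URL. The sketch below assumes that "external" simply means "different host" (so www.example.com and example.com would count as different sites), and that the agent's get returns a page responding to uri, title and links, as Mechanize pages do. The names resolve_link, internal_link? and crawl are made up for illustration and are not part of Mechanize. It also keeps a set of visited URLs, since without one the recursion in the code above would likely loop forever on pages that link back to each other.

```ruby
require 'uri'
require 'set'

# Hypothetical helper: turn an href (possibly relative, possibly nil) into an
# absolute URI based on the page it was found on. Returns nil if unusable.
def resolve_link(href, page_uri)
  return nil if href.nil? || href.strip.empty?
  URI.join(page_uri.to_s, href.strip)
rescue URI::Error
  nil
end

# Assumption: a link is internal when it is http(s) and has the same host
# as the start URL. URI::HTTPS is a subclass of URI::HTTP.
def internal_link?(uri, root_uri)
  return false unless uri.is_a?(URI::HTTP)
  uri.host.to_s.downcase == root_uri.host.to_s.downcase
end

# Visit every internal page reachable from start_url exactly once.
# `agent` is expected to behave like a Mechanize agent (responds to get).
def crawl(agent, start_url)
  root_uri = URI.parse(start_url)
  visited  = Set.new
  queue    = [root_uri]

  until queue.empty?
    uri = queue.shift.dup
    uri.fragment = nil                    # page#top and page are the same page
    next unless visited.add?(uri.to_s)    # add? returns nil if already seen

    begin
      page = agent.get(uri.to_s)
    rescue StandardError => e
      warn "skipping #{uri}: #{e.message}"
      next
    end
    next unless page.respond_to?(:links)  # e.g. images or PDFs have no links

    puts page.title

    page.links.each do |link|
      target = resolve_link(link.href, page.uri)
      queue << target if target && internal_link?(target, root_uri)
    end
  end

  visited
end
```

With the script above this could be called as crawl($agent, ARGV[0]) after require 'mechanize'. A queue is used instead of recursion so that a large site does not overflow the stack; the host comparison could be loosened (for example, to treat subdomains as internal) if that fits the site better.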