Visual analysis of web pages in Ruby

Posted by Clint Miller on Stack Overflow
Published on 2011-01-06T18:39:32Z Indexed on 2011/01/06 18:54 UTC

I'm looking to write some code that does visual analysis of web pages, preferably using Ruby. My code will need to determine the top, left, width, height, background color, color, and font size of every element in the DOM. These values are only known once all CSS has been applied and the page has been laid out, so I don't think Nokogiri, which only parses the markup, is up to the job. Ultimately, I want to feed this data into a VIPS-like (Vision-based Page Segmentation) algorithm to find the main content in downloaded news articles.
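To make the requirement concrete, here is a rough sketch of the extraction I have in mind. It assumes some browser driver object that responds to `execute_script` and returns JavaScript objects as Ruby hashes with string keys (Watir's browser object has an `execute_script` method; the return shape is my assumption). The names `COLLECT_JS`, `ElementBox`, `parse_boxes`, and `collect_boxes` are made up for illustration.

```ruby
# JavaScript run inside the page: one record per element, taken after layout.
# getBoundingClientRect and getComputedStyle are standard browser APIs.
COLLECT_JS = <<~JS
  return Array.prototype.map.call(document.querySelectorAll('*'), function (el) {
    var r = el.getBoundingClientRect();
    var s = window.getComputedStyle(el);
    return {
      tag: el.tagName.toLowerCase(),
      top: r.top + window.pageYOffset,
      left: r.left + window.pageXOffset,
      width: r.width,
      height: r.height,
      background_color: s.backgroundColor,
      color: s.color,
      font_size: s.fontSize
    };
  });
JS

# Hypothetical record type for one rendered element.
ElementBox = Struct.new(:tag, :top, :left, :width, :height,
                        :background_color, :color, :font_size) do
  # Rendered area in square pixels.
  def area
    width * height
  end
end

# Convert the raw array of hashes into ElementBox structs.
# Font sizes arrive as strings such as "16px"; String#to_f keeps the number.
def parse_boxes(raw)
  raw.map do |h|
    ElementBox.new(h['tag'], h['top'].to_f, h['left'].to_f,
                   h['width'].to_f, h['height'].to_f,
                   h['background_color'], h['color'], h['font_size'].to_f)
  end
end

# `browser` is anything that responds to execute_script(js).
def collect_boxes(browser)
  parse_boxes(browser.execute_script(COLLECT_JS))
end
```

The resulting list of boxes is the kind of input a page segmentation step would work from; the segmentation itself is not shown here.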

I've considered using Watir to drive Chrome or Firefox and then extracting the data. The problem is that, as far as I know, browsers can't be run headless through Watir. This code will eventually run on an array of Linux servers in a data center, so it won't have easy access to an X server for displaying the browser.

I suppose one solution is to use Watir and run a headless X Server on the Linux servers. That's a bit of a pain, but it looks like my best option right now.
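For reference, this is roughly what I mean by the headless X server approach: a minimal sketch, assuming Xvfb is installed on the servers. The helper names `xvfb_command` and `with_virtual_display`, the display number 99, and the screen size are arbitrary choices of mine, and the one-second sleep is a crude stand-in for properly waiting until the server is ready.

```ruby
# Build the argument list for a virtual framebuffer X server (Xvfb).
def xvfb_command(display, width: 1280, height: 1024, depth: 24)
  ['Xvfb', ":#{display}", '-screen', '0', "#{width}x#{height}x#{depth}"]
end

# Start Xvfb, point DISPLAY at it while the block runs, then shut it down.
def with_virtual_display(display = 99)
  previous = ENV['DISPLAY']
  pid = Process.spawn(*xvfb_command(display), out: File::NULL, err: File::NULL)
  ENV['DISPLAY'] = ":#{display}"
  sleep 1 # crude: give the server a moment to start accepting connections
  yield
ensure
  ENV['DISPLAY'] = previous # assigning nil removes the variable again
  if pid
    Process.kill('TERM', pid)
    Process.wait(pid)
  end
end

# Intended usage (left commented out because it needs Xvfb, Watir, and a browser):
#
#   require 'watir'
#   with_virtual_display do
#     browser = Watir::Browser.new :firefox
#     browser.goto 'http://example.com/article' # placeholder URL
#     puts browser.execute_script('return document.title')
#     browser.close
#   end
```

Any browser started inside the block inherits `DISPLAY` and renders into the virtual framebuffer, so no physical display is needed.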

Does anyone have any better ideas?

© Stack Overflow or respective owner
