Ruby Nokogiri CSS HTML Parsing
Posted
by user296507
on Stack Overflow
Published on 2010-03-18T15:16:21Z
I'm having trouble getting the code below to output the data in the format I want. The output I'm after is:
CCC1-$5.00
CCC1-$10.00
CCC1-$15.00
CCC2-$7.00
where $7.00 belongs to CCC2 and the other prices belong to CCC1. Instead, every price is paired with every label, and I can only manage to get the data in this format:
CCC1-$5.00
CCC1-$10.00
CCC1-$15.00
CCC1-$7.00
CCC2-$5.00
CCC2-$10.00
CCC2-$15.00
CCC2-$7.00
Any help would be appreciated.
require 'rubygems'
require 'nokogiri'
require 'open-uri'
doc = Nokogiri::HTML.parse(<<-eohtml)
<div class="AAA">
<table cellspacing="0" cellpadding="0" border="0" summary="sum">
<tbody>
<tr>
<td class="BBB">
<span class="CCC">CCC1</span>
</td>
<td class="DDD">
<table cellspacing="0" cellpadding="0" border="0">
<tbody>
<tr><td class="FFF">$5.00</td></tr>
<tr><td class="FFF">$10.00</td></tr>
<tr><td class="FFF">$15.00</td></tr>
</tbody>
</table>
</td>
</tr>
</tbody>
</table>
<table cellspacing="0" cellpadding="0" border="0" summary="sum">
<tbody>
<tr>
<td class="BBB">
<span class="CCC">CCC2</span>
</td>
<td class="DDD">
<table cellspacing="0" cellpadding="0" border="0">
<tbody>
<tr><td class="FFF">$7.00</td></tr>
</tbody>
</table>
</td>
</tr>
</tbody>
</table>
</div>
eohtml
doc.css('td.BBB > span.CCC').each do |something|
  doc.css('tr > td.EEE, tr > td.FFF').each do |something_more|
    puts something.content + '-' + something_more.content
  end
end
© Stack Overflow or respective owner