Grabbing a substring while scraping with Python 2.6
Posted by Diego on Stack Overflow
Published on 2010-05-16T22:11:53Z
Hey, can someone help with the following?
I'm trying to scrape a site that has the following information. I need to pull just the number after each </strong> tag:
[<li><strong>ISBN-13:</strong> 9780375853401</li>, <li><strong>Pub. Date: </strong> 05/11/2010</li>]
[<li><strong>UPC:</strong> 490355000372</li>, <li><strong>Catalog No:</strong> 15024/25</li>, <li><strong>Label:</strong> CAMERATA</li>]
Here's a piece of the code I've been using to grab the above data with mechanize and BeautifulSoup. I'm stuck here because it won't let me use the find() function on a list:
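import mechanize
from BeautifulSoup import BeautifulSoup  # BeautifulSoup 3, assumed from the findAll() calls

br_results = mechanize.urlopen(br_results)
html = br_results.read()
soup = BeautifulSoup(html)
local_links = soup.findAll("a", {"class" : "down-arrow csa"})
# findAll() returns a list of the matching <ul> tags
upc_code = soup.findAll("ul", {"class" : "bc-meta3"})
for upc in upc_code:
    # this is the line that fails: upc.contents is a plain list, so it has no .contents
    upc_text = upc.contents.contents
    print upc_text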
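For what it's worth, here is a minimal sketch of one way to get at the text after each </strong> tag. It is not BeautifulSoup-specific: it assumes every item really has the `<li><strong>Label:</strong> value</li>` shape shown in the sample output, and it runs a regular expression over the markup as a string (for example `str(upc)` inside the loop). The names `ITEM_RE` and `extract_fields` are made up for this illustration.

```python
import re

# Assumes each item looks like <li><strong>Label:</strong> value</li>,
# as in the sample output from the question.
ITEM_RE = re.compile(r"<li>\s*<strong>(.*?)</strong>(.*?)</li>", re.S)

def extract_fields(html):
    """Return a dict mapping each label (without the colon) to its value."""
    fields = {}
    for label, value in ITEM_RE.findall(html):
        # Drop surrounding whitespace and the trailing colon from the label.
        key = label.strip().rstrip(":").strip()
        fields[key] = value.strip()
    return fields

# Sample markup copied from the question, wrapped in the <ul> being searched for.
sample = ('<ul class="bc-meta3">'
          '<li><strong>ISBN-13:</strong> 9780375853401</li>'
          '<li><strong>Pub. Date: </strong> 05/11/2010</li>'
          '</ul>')

fields = extract_fields(sample)
print(fields["ISBN-13"])
```

A regex like this is brittle if the markup varies (nested tags, attributes on the `<li>`), so treat it as a starting point. Staying inside BeautifulSoup, the equivalent idea would be to loop over the `<li>` tags of each `<ul>` and take the text node that follows the `<strong>` tag, rather than calling anything on the list itself.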
© Stack Overflow or respective owner