Parsing with BeautifulSoup: "TypeError: coercing to Unicode: need string or buffer, NoneType found"

Posted by Samsun Knight on Stack Overflow, published 2013-06-29T16:18:03Z

So I'm trying to scrape an Amazon page for data, and I'm getting an error when I try to parse for the seller's location. Here's my code:

import urllib2
from bs4 import BeautifulSoup

#getting the html
request = urllib2.Request('http://www.amazon.com/gp/offer-listing/0393934241/')
opener = urllib2.build_opener()
#hiding that I'm a webscraper
request.add_header('User-Agent', 'Mozilla/5 (Solaris 10) Gecko')
#opening it up, putting into soup form
html = opener.open(request).read()
soup = BeautifulSoup(html, "html5lib")

#parsing for the seller info
sellers = soup.findAll('div', {'class' : 'a-row a-spacing-medium olpOffer'})
for eachseller in sellers:
    #parsing for price
    price = eachseller.find('span', {'class' : 'a-size-large a-color-price olpOfferPrice a-text-bold'})
    #parsing for shipping costs
    shippingprice = eachseller.find('span', {'class' : 'olpShippingPrice'})
    #parsing for condition
    condition = eachseller.find('span', {'class' : 'a-size-medium'})
    #parsing for seller name
    sellername = eachseller.find('b')
    #parsing for seller location
    location = eachseller.find('div', {'class' : 'olpAvailability'})

    #printing it all out
    print "price, " + price.string + ", shipping price, " + shippingprice.string + ", condition," + condition.string + ", seller name, " + sellername.string + ", location, " + location.string
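
The fetching part of the code above uses urllib2, which exists only in Python 2; in Python 3 its functionality moved to urllib.request and urllib.error. A minimal sketch of the same request for Python 3, assuming nothing beyond the URL and User-Agent string from the question (build_request and fetch_html are hypothetical helper names). Building the Request object does not touch the network; only urlopen() does.

```python
import urllib.request

# URL and User-Agent string copied from the question.
URL = "http://www.amazon.com/gp/offer-listing/0393934241/"
USER_AGENT = "Mozilla/5 (Solaris 10) Gecko"


def build_request(url=URL, user_agent=USER_AGENT):
    """Hypothetical helper: create the request with a browser-like User-Agent header."""
    return urllib.request.Request(url, headers={"User-Agent": user_agent})


def fetch_html(request, timeout=30):
    """Hypothetical helper: download the page body as bytes (needs network access)."""
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return response.read()


req = build_request()
# html = fetch_html(req)
# soup = BeautifulSoup(html, "html5lib")  # then parse as in the question
```

Keeping request construction separate from the download makes the header easy to check without hitting the network.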

The print statement at the end raises this error: "TypeError: coercing to Unicode: need string or buffer, NoneType found"
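
In Python 2, adding None to a unicode string (BeautifulSoup's strings are unicode) raises exactly this TypeError, so one field whose .string is None is enough to break the whole print. A defensive sketch, assuming the hypothetical helpers safe_string and format_offer and using SimpleNamespace stand-ins instead of real BeautifulSoup tags: it substitutes a placeholder both when find() returned None and when a found tag's .string is None (for example, a tag with more than one child).

```python
from types import SimpleNamespace


def safe_string(tag, default="N/A"):
    """Return tag.string stripped, or default when there is no usable text.

    Covers both failure modes: find() returned None, or the tag's .string is None.
    """
    if tag is None:                      # find() matched nothing
        return default
    text = getattr(tag, "string", None)
    if text is None:                     # tag found, but .string is None
        return default
    return text.strip()


def format_offer(fields, default="N/A"):
    """Join (label, tag) pairs into 'label, value, ...' without using '+' on None."""
    return ", ".join(
        "{}, {}".format(label, safe_string(tag, default)) for label, tag in fields
    )


# Stand-ins for BeautifulSoup Tag objects (hypothetical values, for illustration only).
price = SimpleNamespace(string="$12.00")
location = SimpleNamespace(string=None)   # a found tag whose .string is None
sellername = None                         # what find() returns when nothing matches

print(format_offer([("price", price), ("seller name", sellername), ("location", location)]))
# prints: price, $12.00, seller name, N/A, location, N/A
```

Placeholders keep the loop running but hide which field was empty, so it may be worth logging the raw tag whenever safe_string falls back to the default.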

I know that it's coming from this line, location = eachseller.find('div', {'class' : 'olpAvailability'}), because the code works fine without that line, and I know that I'm getting NoneType because the line isn't finding anything. Here's the HTML from the section I'm trying to parse:

<div class="olpAvailability">
    In Stock.
        Ships from WI, United States.
    <br/><a href="/gp/aag/details/ref=olp_merch_ship_9/175-0430757-3801038?ie=UTF8&amp;asin=0393934241&amp;seller=A1W2IX7T37FAMZ&amp;sshmPath=shipping-rates#aag_shipping">Domestic shipping rates</a>
         and <a href="/gp/aag/details/ref=olp_merch_return_9/175-0430757-3801038?ie=UTF8&amp;asin=0393934241&amp;seller=A1W2IX7T37FAMZ&amp;sshmPath=returns#aag_returns">return policy</a>.
</div>

I don't see what the problem is with the location line, or why it isn't pulling the data I want. Help?
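
Calling .string on None would raise AttributeError rather than this TypeError, which suggests find() did locate the div and it is location.string that is None. According to the BeautifulSoup documentation, .string is None whenever a tag has more than one child, and the quoted div mixes text with a br tag and two a tags; in bs4, location.get_text() returns all of the text inside the tag instead. The sketch below illustrates the mixed content using the standard library's html.parser in place of BeautifulSoup (a swapped-in tool, not what the question uses); ClassTextCollector is a hypothetical name, and the markup is the quoted div with the link URLs shortened.

```python
from html.parser import HTMLParser

# The div quoted in the question, with the link URLs shortened for brevity.
SNIPPET = """<div class="olpAvailability">
    In Stock.
        Ships from WI, United States.
    <br/><a href="#aag_shipping">Domestic shipping rates</a>
         and <a href="#aag_returns">return policy</a>.
</div>"""


class ClassTextCollector(HTMLParser):
    """Collect text and nested tag names inside the first <div> with a given class.

    The joined text is roughly what BeautifulSoup's get_text() would return.
    """

    def __init__(self, css_class):
        super().__init__()
        self.css_class = css_class
        self.div_depth = 0       # > 0 while inside the target div
        self.finished = False    # set once the target div has closed
        self.inner_tags = []     # names of tags nested inside the div
        self.parts = []          # raw text fragments inside the div

    def handle_starttag(self, tag, attrs):
        if self.finished:
            return
        if self.div_depth == 0:
            classes = (dict(attrs).get("class") or "").split()
            if tag == "div" and self.css_class in classes:
                self.div_depth = 1
            return
        self.inner_tags.append(tag)
        if tag == "div":
            self.div_depth += 1

    def handle_endtag(self, tag):
        if self.div_depth and tag == "div":
            self.div_depth -= 1
            if self.div_depth == 0:
                self.finished = True

    def handle_data(self, data):
        if self.div_depth:
            self.parts.append(data)

    def text(self):
        """Join the fragments and collapse runs of whitespace to single spaces."""
        return " ".join("".join(self.parts).split())


parser = ClassTextCollector("olpAvailability")
parser.feed(SNIPPET)
parser.close()
print(parser.inner_tags)  # several child tags, so one .string cannot represent the div
print(parser.text())
```

If this explanation holds, trying location.get_text() (stripped) in place of location.string in the question's loop would be the corresponding change in BeautifulSoup; treat it as something to test, not a confirmed fix.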

© Stack Overflow or respective owner
