BeautifulSoup Parser Confusion - HTML

Posted by lyngbym on Stack Overflow
Published on 2011-01-08T20:50:24Z Indexed on 2011/01/08 20:53 UTC

I'm trying to scrape some content from another site, and I'm not sure why BeautifulSoup is producing this output. It finds only a blank space inside the matched div, but the real HTML contains a large amount of markup there. I apologize if this is something stupid on my part; I'm new to Python.

Here's my code:

import mechanize
from BeautifulSoup import BeautifulSoup  # BeautifulSoup 3, Python 2

def scrape_trails(BASE_URL, data):
    # Parse the page and collect every <div id="sitesDiv"> element
    soup = BeautifulSoup(data)
    sitesDiv = soup.findAll("div", attrs={"id" : "sitesDiv"})
    print sitesDiv  # shows the div with only a blank space inside
    return sitesDiv


def main():
    BASE_URL = "http://www.dnr.state.mn.us/skiing/skipass/list.html"
    br = mechanize.Browser()
    # Fetch the raw HTML of the trail list page
    data = br.open(BASE_URL).get_data()
    links = scrape_trails(BASE_URL, data)


if __name__ == '__main__':
    main()

If you follow that URL you can see the sitesDiv contains a lot of markup. I'm not sure if I'm doing something wrong or if this is just malformed markup that the script can't handle. Thanks!
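
One way to narrow down which of the two it is would be to look at the raw downloaded text right after the opening sitesDiv tag, before any parser touches it. The sketch below is only a rough diagnostic using the standard library (written for Python 3): `raw_div_snippet` is a hypothetical helper, the regex is a crude match rather than real HTML parsing, and `sample` is made-up markup standing in for the `data` string returned by mechanize.

```python
import re


def raw_div_snippet(html, div_id, width=200):
    """Return up to `width` characters of raw HTML that follow the opening
    tag of the div with the given id, or None if no such tag is found.

    Hypothetical helper: a rough regex match, not a real HTML parser.
    """
    pattern = (
        r'<div\b[^>]*\sid\s*=\s*["\']?'
        + re.escape(div_id)
        + r'(?![\w-])["\']?[^>]*>'
    )
    match = re.search(pattern, html, re.IGNORECASE)
    if match is None:
        return None
    return html[match.end():match.end() + width]


# Made-up sample standing in for the downloaded page
sample = (
    '<html><body><div id="sitesDiv"> '
    '<ul><li>Trail A</li></ul></div></body></html>'
)

snippet = raw_div_snippet(sample, "sitesDiv")
print(repr(snippet))
```

If the snippet from the real page already shows only whitespace before the closing tag, the markup is probably not in the downloaded HTML at all (for example, it may be added later by a script in the browser). If the snippet does show markup, the parser is the more likely place to look.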

© Stack Overflow or respective owner
