BeautifulSoup Parser Confusion - HTML
Posted by lyngbym on Stack Overflow
Published on 2011-01-08T20:50:24Z
beautifulsoup
I'm trying to scrape some content from another site, and I'm not sure why BeautifulSoup is producing this output. It finds only a blank space inside the match, even though the real HTML contains a large amount of markup. I apologize if this is something stupid on my part; I'm new to Python.
Here's my code:
import sys
import os
import mechanize
import re
from BeautifulSoup import BeautifulSoup

def scrape_trails(BASE_URL, data):
    # Get the trail names
    soup = BeautifulSoup(data)
    sitesDiv = soup.findAll("div", attrs={"id" : "sitesDiv"})
    print sitesDiv

def main():
    BASE_URL = "http://www.dnr.state.mn.us/skiing/skipass/list.html"
    br = mechanize.Browser()
    data = br.open(BASE_URL).get_data()
    links = scrape_trails(BASE_URL, data)

if __name__ == '__main__':
    main()
If you follow that URL you can see the sitesDiv contains a lot of markup. I'm not sure if I'm doing something wrong or if this is just malformed markup that the script can't handle. Thanks!
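One way to narrow this down might be to check what the fetched HTML itself contains inside the div, independently of BeautifulSoup. The sketch below is only a diagnostic idea, not a confirmed explanation: it uses Python 3 and the standard library's html.parser rather than the Python 2 / BeautifulSoup 3 setup above, and DivContentProbe and probe_div are hypothetical names made up for illustration. If it also reports no child tags for the downloaded data, the markup seen in the browser may not be in the served HTML at all (for instance if a script fills the div after the page loads, which is just a guess here); if it does report child tags, the parser's handling of the markup is the more likely suspect.

```python
from html.parser import HTMLParser


class DivContentProbe(HTMLParser):
    """Hypothetical helper: records what appears inside <div id=target_id>."""

    def __init__(self, target_id):
        super().__init__()
        self.target_id = target_id
        self.depth = 0         # > 0 while we are inside the target div
        self.child_tags = []   # start tags seen inside the target div
        self.text = []         # text chunks seen inside the target div

    def handle_starttag(self, tag, attrs):
        if self.depth:
            self.child_tags.append(tag)
            if tag == "div":
                self.depth += 1    # track nested divs so we know when we leave
        elif tag == "div" and dict(attrs).get("id") == self.target_id:
            self.depth = 1

    def handle_endtag(self, tag):
        if self.depth and tag == "div":
            self.depth -= 1

    def handle_data(self, data):
        if self.depth:
            self.text.append(data)


def probe_div(html, target_id="sitesDiv"):
    """Return (child tag names, stripped text) found inside the target div."""
    probe = DivContentProbe(target_id)
    probe.feed(html)
    probe.close()
    return probe.child_tags, "".join(probe.text).strip()
```

Calling probe_div on the string returned by get_data() (decoded to text first if it is bytes) and printing the result would show whether the raw response has anything inside sitesDiv before any BeautifulSoup parsing happens.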