Does urllib2.urlopen() actually fetch the page?
Posted by beagleguy on Stack Overflow
Published on 2010-06-09T19:22:26Z
Hi all, I was wondering: when I use urllib2.urlopen(), does it just read the headers, or does it actually bring back the entire webpage?
I.e., does the HTML page actually get fetched on the urlopen() call or on the read() call?
import urllib2
handle = urllib2.urlopen(url)  # url is one of the links from my list
html = handle.read()
The reason I ask is for this workflow...
- I have a list of URLs (some of them from URL-shortening services)
- I only want to read the webpage if I haven't seen that URL before
- I need to call urlopen() and use geturl() to get the final page the link goes to (after the 302 redirects) so I know whether I've crawled it yet or not.
- I don't want to incur the overhead of having to grab the html if I've already parsed that page.
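To make the workflow concrete, here is a minimal sketch of what I have in mind. It assumes that read() is where the page body actually gets pulled down, which is exactly the part I'm unsure about. The names fetch_unseen, seen and opener are just made up for illustration, and the import fallback is only there so the same sketch also runs on Python 3, where urlopen lives in urllib.request.

```python
try:
    from urllib2 import urlopen            # Python 2, as used above
except ImportError:
    from urllib.request import urlopen     # Python 3 location of urlopen


def fetch_unseen(urls, seen, opener=urlopen):
    """Return {final_url: html} for pages whose final URL is not yet in `seen`.

    `seen` is a set of already-crawled final URLs and is updated in place.
    `opener` is a hypothetical hook so the function can be tried without
    touching the network; by default it is the real urlopen.
    """
    pages = {}
    for url in urls:
        handle = opener(url)               # redirects are followed here
        try:
            final_url = handle.geturl()    # URL after any 302 redirects
            if final_url in seen:
                continue                   # already crawled: never call read()
            seen.add(final_url)
            pages[final_url] = handle.read()
        finally:
            handle.close()                 # always release the connection
    return pages
```

Usage would be something like pages = fetch_unseen(my_urls, seen_urls), keeping seen_urls around between runs. Whether skipping read() really saves the download is the open question.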
thanks!
© Stack Overflow or respective owner