Search Results

Search found 159 results on 7 pages for 'urllib2'.

Page 1/7 | 1 2 3 4 5 6 7 | Next Page >

In Python, urllib2 giving error

- by pythBegin

I tried running this, >>> urllib2.urlopen('http://tycho.usno.navy.mil/cgi-bin/timer.pl') But it is giving error like this, can anyone tell me a solution ? Traceback (most recent call last): File "<pyshell#11>", line 1, in <module> urllib2.urlopen('http://tycho.usno.navy.mil/cgi-bin/timer.pl') File "C:\Python26\lib\urllib2.py", line 126, in urlopen return _opener.open(url, data, timeout) File "C:\Python26\lib\urllib2.py", line 391, in open response = self._open(req, data) File "C:\Python26\lib\urllib2.py", line 409, in _open '_open', req) File "C:\Python26\lib\urllib2.py", line 369, in _call_chain result = func(*args) File "C:\Python26\lib\urllib2.py", line 1161, in http_open return self.do_open(httplib.HTTPConnection, req) File "C:\Python26\lib\urllib2.py", line 1136, in do_open raise URLError(err) URLError: <urlopen error [Errno 11001] getaddrinfo failed>

Read the article
Python urllib2 Basic Auth Problem

- by Simon

I'm having a problem sending basic AUTH over urllib2. I took a look at this article, and followed the example. My code: passman = urllib2.HTTPPasswordMgrWithDefaultRealm() passman.add_password(None, "api.foursquare.com", username, password) urllib2.install_opener(urllib2.build_opener(urllib2.HTTPBasicAuthHandler(passman))) req = urllib2.Request("http://api.foursquare.com/v1/user") f = urllib2.urlopen(req) data = f.read() I'm seeing the following on the Wire via wireshark: GET /v1/user HTTP/1.1 Host: api.foursquare.com Connection: close Accept-Encoding: gzip User-Agent: Python-urllib/2.5 You can see the Authorization is not sent, vs. when I send a request via curl: curl -u user:password http://api.foursquare.com/v1/user GET /v1/user HTTP/1.1 Authorization: Basic =SNIP= User-Agent: curl/7.19.4 (universal-apple-darwin10.0) libcurl/7.19.4 OpenSSL/0.9.8k zlib/1.2.3 Host: api.foursquare.com Accept: */* For some reason my code seems to not send the authentication - anyone see what I'm missing? thanks -simon

Read the article
Repeated host lookups failing in urllib2

- by reve_etrange

I have code which issues many HTTP GET requests using Python's urllib2, in several threads, writing the responses into files (one per thread). During execution, it looks like many of the host lookups fail (causing a name or service unknown error, see appended error log for an example). Is this due to a flaky DNS service? Is it bad practice to rely on DNS caching, if the host name isn't changing? I.e. should a single lookup's result be passed into the urlopen? Exception in thread Thread-16: Traceback (most recent call last): File "/usr/lib/python2.6/threading.py", line 532, in __bootstrap_inner self.run() File "/home/da/local/bin/ThreadedDownloader.py", line 61, in run page = urllib2.urlopen(url) # get the page File "/usr/lib/python2.6/urllib2.py", line 126, in urlopen return _opener.open(url, data, timeout) File "/usr/lib/python2.6/urllib2.py", line 391, in open response = self._open(req, data) File "/usr/lib/python2.6/urllib2.py", line 409, in _open '_open', req) File "/usr/lib/python2.6/urllib2.py", line 369, in _call_chain result = func(*args) File "/usr/lib/python2.6/urllib2.py", line 1170, in http_open return self.do_open(httplib.HTTPConnection, req) File "/usr/lib/python2.6/urllib2.py", line 1145, in do_open raise URLError(err) URLError: <urlopen error [Errno -2] Name or service not known>

Read the article
Python HTTPS requests (urllib2) fails on Ubuntu 12.04 without proxy

- by Pablo

I have an little app I wrote in Python and it used to work... until yesterday, when it suddenly started giving me an error in a HTTPS connection. I don't remember if there was an update, but both Python 2.7.3rc2 and Python 3.2 are failing just the same. I googled it and found out that this happens when people are behind a proxy, but I'm not (and nothing have changed in my network since the last time it worked). My syster's computer running windows and Python 2.7.2 has no problems (in the same network). response = urllib2.urlopen(url).read() File "/usr/lib/python2.7/urllib2.py", line 126, in urlopen return _opener.open(url, data, timeout) File "/usr/lib/python2.7/urllib2.py", line 400, in open response = self._open(req, data) File "/usr/lib/python2.7/urllib2.py", line 418, in _open '_open', req) File "/usr/lib/python2.7/urllib2.py", line 378, in _call_chain result = func(*args) File "/usr/lib/python2.7/urllib2.py", line 1215, in https_open return self.do_open(httplib.HTTPSConnection, req) File "/usr/lib/python2.7/urllib2.py", line 1177, in do_open raise URLError(err) urllib2.URLError: <urlopen error [Errno 8] _ssl.c:504: EOF occurred in violation of protocol> What's wrong? Any help is appreciated. PS.: Older python versions don't work either, not in my system and not in a live session from USB, but DO work in a Ubuntu 11.10 live session.

Read the article
Python form POST using urllib2 (also question on saving/using cookies)

- by morpheous

I am trying to write a function to post form data and save returned cookie info in a file so that the next time the page is visited, the cookie information is sent to the server (i.e. normal browser behavior). I wrote this relatively easily in C++ using curlib, but have spent almost an entire day trying to write this in Python, using urllib2 - and still no success. This is what I have so far: import urllib, urllib2 import logging # the path and filename to save your cookies in COOKIEFILE = 'cookies.lwp' cj = None ClientCookie = None cookielib = None logger = logging.getLogger(__name__) # Let's see if cookielib is available try: import cookielib except ImportError: logger.debug('importing cookielib failed. Trying ClientCookie') try: import ClientCookie except ImportError: logger.debug('ClientCookie isn\'t available either') urlopen = urllib2.urlopen Request = urllib2.Request else: logger.debug('imported ClientCookie succesfully') urlopen = ClientCookie.urlopen Request = ClientCookie.Request cj = ClientCookie.LWPCookieJar() else: logger.debug('Successfully imported cookielib') urlopen = urllib2.urlopen Request = urllib2.Request # This is a subclass of FileCookieJar # that has useful load and save methods cj = cookielib.LWPCookieJar() login_params = {'name': 'anon', 'password': 'pass' } def login(theurl, login_params): init_cookies(); data = urllib.urlencode(login_params) txheaders = {'User-agent' : 'Mozilla/4.0 (compatible; MSIE 5.5; Windows NT)'} try: # create a request object req = Request(theurl, data, txheaders) # and open it to return a handle on the url handle = urlopen(req) except IOError, e: log.debug('Failed to open "%s".' % theurl) if hasattr(e, 'code'): log.debug('Failed with error code - %s.' % e.code) elif hasattr(e, 'reason'): log.debug("The error object has the following 'reason' attribute :"+e.reason) sys.exit() else: if cj is None: log.debug('We don\'t have a cookie library available - sorry.') else: print 'These are the cookies we have received so far :' for index, cookie in enumerate(cj): print index, ' : ', cookie # save the cookies again cj.save(COOKIEFILE) #return the data return handle.read() # FIXME: I need to fix this so that it takes into account any cookie data we may have stored def get_page(*args, **query): if len(args) != 1: raise ValueError( "post_page() takes exactly 1 argument (%d given)" % len(args) ) url = args[0] query = urllib.urlencode(list(query.iteritems())) if not url.endswith('/') and query: url += '/' if query: url += "?" + query resource = urllib.urlopen(url) logger.debug('GET url "%s" => "%s", code %d' % (url, resource.url, resource.code)) return resource.read() When I attempt to log in, I pass the correct username and pwd,. yet the login fails, and no cookie data is saved. My two questions are: can anyone see whats wrong with the login() function, and how may I fix it? how may I modify the get_page() function to make use of any cookie info I have saved ?

Read the article
Facebook publish HTTP Error 400 : bad request

- by Abhishek

Hey I am trying to publish a score to Facebook through python's urllib2 library. import urllib2,urllib url = "https://graph.facebook.com/USER_ID/scores" data = {} data['score']=SCORE data['access_token']='APP_ACCESS_TOKEN' data_encode = urllib.urlencode(data) request = urllib2.Request(url, data_encode) response = urllib2.urlopen(request) responseAsString = response.read() I am getting this error: response = urllib2.urlopen(request) File "/System/Library/Frameworks/Python.framework/Versions/2.6/lib/python2.6/urllib2.py", line 124, in urlopen return _opener.open(url, data, timeout) File "/System/Library/Frameworks/Python.framework/Versions/2.6/lib/python2.6/urllib2.py", line 389, in open response = meth(req, response) File "/System/Library/Frameworks/Python.framework/Versions/2.6/lib/python2.6/urllib2.py", line 502, in http_response 'http', request, response, code, msg, hdrs) File "/System/Library/Frameworks/Python.framework/Versions/2.6/lib/python2.6/urllib2.py", line 427, in error return self._call_chain(*args) File "/System/Library/Frameworks/Python.framework/Versions/2.6/lib/python2.6/urllib2.py", line 361, in _call_chain result = func(*args) File "/System/Library/Frameworks/Python.framework/Versions/2.6/lib/python2.6/urllib2.py", line 510, in http_error_default raise HTTPError(req.get_full_url(), code, msg, hdrs, fp) urllib2.HTTPError: HTTP Error 400: Bad Request Not sure if this is relating to Facebook's Open Graph or improper urllib2 API use.

Read the article
urllib2.Request() with data returns empty url

- by Mr. Polywhirl

My main concern is the function: getUrlAndHtml() If I manually build and append the query to the end of the uri, I can get the response.url(), but if I pass a dictionary as the request data, the url does not come back. Is there anyway to guarantee the redirected url? In my example below, if thisWorks = True I get back a url, but the returned url is the request url as opposed to a redirect link. On a sidenote, the encoding for .E2.80.93 does not translate to - for some reason? #!/usr/bin/python import pprint import urllib import urllib2 from bs4 import BeautifulSoup from sys import argv URL = 'http://en.wikipedia.org/w/index.php?' def yesOrNo(boolVal): return 'yes' if boolVal else 'no' def getTitleFromRaw(page): return page.strip().replace(' ', '_') def getUrlAndHtml(title, printable=False): thisWorks = False if thisWorks: query = 'title={:s}&printable={:s}'.format(title, yesOrNo(printable)) opener = urllib2.build_opener() opener.addheaders = [('User-agent', 'Mozilla/5.0')] response = opener.open(URL + query) else: params = {'title':title,'printable':yesOrNo(printable)} data = urllib.urlencode(params) headers = {'User-agent':'Mozilla/5.0'}; request = urllib2.Request(URL, data, headers) response = urllib2.urlopen(request) return response.geturl(), response.read() def getSoup(html, name=None, attrs=None): soup = BeautifulSoup(html) if name is None: return None return soup.find(name, attrs) def setTitle(soup, newTitle): title = soup.find('div', {'id':'toctitle'}) h2 = title.find('h2') h2.contents[0].replaceWith('{:s} for {:s}'.format(h2.getText(), newTitle)) def updateLinks(soup, url): fragment = '#' for a in soup.findAll('a', href=True): a['href'] = a['href'].replace(fragment, url + fragment) def writeToFile(soup, filename='out.html', indentLevel=2): with open(filename, 'wt') as out: pp = pprint.PrettyPrinter(indent=indentLevel, stream=out) pp.pprint(soup) print('Wrote {:s} successfully.'.format(filename)) if __name__ == '__main__': def exitPgrm(): print('usage: {:s} "<PAGE>" <FILE>'.format(argv[0])) exit(0) if len(argv) == 2: help = argv[1] if help == '-h' or help == '--help': exitPgrm() if False:''' if not len(argv) == 3: exitPgrm() ''' page = 'Led Zeppelin' # argv[1] filename = 'test.html' # argv[2] title = getTitleFromRaw(page) url, html = getUrlAndHtml(title) soup = getSoup(html, 'div', {'id':'toc'}) setTitle(soup, page) updateLinks(soup, url) writeToFile(soup, filename)

Read the article
Python - urllib2 & cookielib

- by Adrian

I am trying to open the following website and retrieve the initial cookie and use it for the second url-open BUT if you run the following code it outputs 2 different cookies. How do I use the initial cookie for the second url-open? import cookielib, urllib2 cj = cookielib.CookieJar() opener = urllib2.build_opener(urllib2.HTTPCookieProcessor(cj)) home = opener.open('https://www.idcourts.us/repository/start.do') print cj search = opener.open('https://www.idcourts.us/repository/partySearch.do') print cj Output shows 2 different cookies every time as you can see: <cookielib.CookieJar[<Cookie JSESSIONID=0DEEE8331DE7D0DFDC22E860E065085F for www.idcourts.us/repository>]> <cookielib.CookieJar[<Cookie JSESSIONID=E01C2BE8323632A32DA467F8A9B22A51 for www.idcourts.us/repository>]>

Read the article
urllib2 multiple Set-Cookie headers in response

- by Richard

I am using urllib2 to interact with a website that sends back multiple Set-Cookie headers. However the response header dictionary only contains one - seems the duplicate keys are overriding each other. Is there a way to access duplicate headers with urllib2?

Read the article
Handling urllib2's timeout? - Python

- by RadiantHex

Hi folks, I'm using the timeout parameter within the urllib2's urlopen. urllib2.urlopen('http://www.example.org', timeout=1) How do I tell Python that if the timeout expires a custom error should be raised? Any ideas?

Read the article
Python urllib2 multiple try statement on urlopen()

- by Kura

So, simply I want to be able to run a for across a list of URLs, if one fails then I want to continue on to try the next. I've tried using the following but sadly it throws and exception if the first URL doesn't work. servers = ('http://www.google.com', 'http://www.stackoverflow.com') for server in servers: try: u = urllib2.urlopen(server) except urllib2.URLError: continue else: break else: raise Any ideas? Thanks in advance.

Read the article
python urllib2 thread safety

- by jldupont

Is urllib2 thread safe? I find that using urllib2.urlopen sometimes results in the thread issuing the call to stop functioning. Yes I have used the timeout option to no avail. Is there a thread safe HTTP GET functionality I could use? NOTE: I am not interested in using Twisted to solve this problem. I have used Twisted in the past and I love it but this time I need a simpler solution. NOTE2: I also tried httplib with the same result (blocking).

Read the article
Python's urllib2 don't work on some sites

- by Binny V A

I found that you can't read from some sites using Python's urllib2(or urllib). An example... urllib2.urlopen("http://www.dafont.com/").read() # Returns '' These sites works when you visit the site. I can even scrap them using PHP(didn't try other languages). I have seen other sites with the same issue - but can't remember the URL at the moment. My questions are... What is the cause of this issue? Any workaround for this issue?

Read the article
Does urllib2.urlopen() actually fetch the page?

- by beagleguy

hi all, I was condering when I use urllib2.urlopen() does it just to header reads or does it actually bring back the entire webpage? IE does the HTML page actually get fetch on the urlopen call or the read() call? handle = urllib2.urlopen(url) html = handle.read() The reason I ask is for this workflow... I have a list of urls (some of them with short url services) I only want to read the webpage if I haven't seen that url before I need to call urlopen() and use geturl() to get the final page that link goes to (after the 302 redirects) so I know if I've crawled it yet or not. I don't want to incur the overhead of having to grab the html if I've already parsed that page. thanks!

Read the article
Using paired certificates with urllib2

- by Ned Batchelder

I need to create a secure channel between my server and a remote web service. I'll be using HTTPS with a client certificate. I'll also need to validate the certificate presented by the remote service. How can I use my own client certificate with urllib2? What will I need to do in my code to ensure that the remote certificate is correct?

Read the article
Passing input hidden params through urllib2 POST request

- by ramrajedotcom

I need to make POST request to CAS SSO server login page, and CAS login page has few input hidden params which are dynamically populated through java. I don't know how to read these hidden param values from response and pass in to CAS server. Without passing these hidden params I am not able to login. Does any one how to read input hidden param values from urllib2 response? Thanks in advance!

Read the article
Using urllib2 with SOCKS proxy

- by roddik

Hello. Is it possible to fetch pages with urllib2 through a SOCKS proxy on a one socks server per opener basic? I've seen the solution using setdefaultproxy method, but I need to have different socks in different openers.

Read the article
urllib2 to string

- by Nathan

I'm using urllib2 to open a url. Now I need the html file as a string. How do I do this?

Read the article
urllib2 misbehaving with dynamically loaded content

- by Sheena

Some Code headers = {} headers['user-agent'] = 'User-Agent: Mozilla/5.0 (X11; Ubuntu; Linux x86_64; rv:16.0) Gecko/20100101 Firefox/16.0' headers['Accept'] = 'text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8' headers['Accept-Language'] = 'en-gb,en;q=0.5' #headers['Accept-Encoding'] = 'gzip, deflate' request = urllib.request.Request(sURL, headers = headers) try: response = urllib.request.urlopen(request) except error.HTTPError as e: print('The server couldn\'t fulfill the request.') print('Error code: {0}'.format(e.code)) except error.URLError as e: print('We failed to reach a server.') print('Reason: {0}'.format(e.reason)) else: f = open('output/{0}.html'.format(sFileName),'w') f.write(response.read().decode('utf-8')) A url http://groupon.cl/descuentos/santiago-centro The situation Here's what I did: enable javascript in browser open url above and keep an eye on the console disable javascript repeat step 2 use urllib2 to grab the webpage and save it to a file enable javascript open the file with browser and observe console repeat 7 with javascript off results In step 2 I saw that a whole lot of the page content was loaded dynamically using ajax. So the HTML that arrived was a sort of skeleton and ajax was used to fill in the gaps. This is fine and not at all surprising Since the page should be seo friendly it should work fine without js. in step 4 nothing happens in the console and the skeleton page loads pre-populated rendering the ajax unnecessary. This is also completely not confusing in step 7 the ajax calls are made but fail. this is also ok since the urls they are using are not local, the calls are thus broken. The page looks like the skeleton. This is also great and expected. in step 8: no ajax calls are made and the skeleton is just a skeleton. I would have thought that this should behave very much like in step 4 question What I want to do is use urllib2 to grab the html from step 4 but I cant figure out how. What am I missing and how could I pull this off? To paraphrase If I was writing a spider I would want to be able to grab plain ol' HTML (as in that which resulted in step 4). I dont want to execute ajax stuff or any javascript at all. I don't want to populate anything dynamically. I just want HTML. The seo friendly site wants me to get what I want because that's what seo is all about. How would one go about getting plain HTML content given the situation I outlined? To do it manually I would turn off js, navigate to the page and copy the html. I want to automate this. stuff I've tried I used wireshark to look at packet headers and the GETs sent off from my pc in steps 2 and 4 have the same headers. Reading about SEO stuff makes me think that this is pretty normal otherwise techniques such as hijax wouldn't be used. Here are the headers my browser sends: Host: groupon.cl User-Agent: Mozilla/5.0 (X11; Ubuntu; Linux x86_64; rv:16.0) Gecko/20100101 Firefox/16.0 Accept: text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8 Accept-Language: en-gb,en;q=0.5 Accept-Encoding: gzip, deflate Connection: keep-alive Here are the headers my script sends: Accept-Encoding: identity Host: groupon.cl Accept-Language: en-gb,en;q=0.5 Connection: close Accept: text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8 User-Agent: User-Agent: Mozilla/5.0 (X11; Ubuntu; Linux x86_64; rv:16.0) Gecko/20100101 Firefox/16.0 The differences are: my script has Connection = close instead of keep-alive. I can't see how this would cause a problem my script has Accept-encoding = identity. This might be the cause of the problem. I can't really see why the host would use this field to determine the user-agent though. If I change encoding to match the browser request headers then I have trouble decoding it. I'm working on this now... watch this space, I'll update the question as new info comes up

Read the article
Help converting code using httlib2 to use urllib2

- by ThinkCode

What am I trying to do? Visit a site, retrieve cookie, visit the next page by sending in the cookie info. It all works but httplib2 is giving me one too many problems with socks proxy on one site. http = httplib2.Http() main_url = 'http://mywebsite.com/get.aspx?id='+ id +'&rows=25' response, content = http.request(main_url, 'GET', headers=headers) main_cookie = response['set-cookie'] referer = 'http://google.com' headers = {'Content-type': 'application/x-www-form-urlencoded', 'Cookie': main_cookie, 'User-Agent' : USER_AGENT, 'Referer' : referer} How to do the same exact thing using urllib2 (cookie retrieving, passing to the next page on the same site)? Thank you.

Read the article
Python and urllib2: how to make a GET request with parameters

- by Infinity

I'm building an "API API", it basically a wrapper for a in house REST web service that the web app will be making a lot of requests to. Some of the web service calls need to be GET rather than post, but passing parameters. Is there a "best practice" way to encode a dictionary into a query string? e.g.: ?foo=bar&bla=blah I'm looking at the urllib2 docs, and it looks like it decides by itself wether to use POST or GET based on if you pass params or not, but maybe someone knows how to make it transform the params dictionary into a GET request. Maybe there's a package for something like this out there? It would be great if it supported keep-alive, as the web server will be constantly requesting things from the REST service. Thanks!

Read the article
how to encode a url with urllib or urllib2

- by freeman11

I want a url like example.com/page.html to somthing like example.com/a$xDzf9D84qGBOeXkXNstw%3D%3D106

Read the article
Error 400 in urllib2 when using cookies

- by Hempage

I've had some success using a different method to load cookies, but now I'm wanting to use the cookielib.MozillaCookieJar method to open a cookies.txt file. Here's the snippet of code that does this. cookieJar=MozillaCookieJar() cookieJar.load(argv[2]) After creating an HTTPCookieProcessor opener, and installing it, whenever I use urlopen, I get an HTTP 400 error. However, if I don't use a CookieJar, the urlopen method succeeds (though the response doesn't contain the data I need). I'm not sure whether the cookies.txt file is malformed, or whether there's something else going on.

Read the article
urllib2 in Python 2.6.4: Any way to override windows hosts file?

- by mikez302

I am using the urllib2 module in Python 2.6.4, running in Windows XP, to access a URL. I am making a post request, that does not involve cookies or https or anything too complicated. The domain is redirected in my C:\WINDOWS\system32\drivers\etc\hosts file. However, I would like the request from urllib2 to go to the "real" domain and ignore the entry in the hosts file. Is there any easy and practical way to do this?

Read the article
Problem opening Solr *.jsp pages with urllib2.urlopen.

- by nestling

I'm trying to open a page at http://localhost:8983/solr/admin/stats.jsp but urllib2.urlopen returns a blank string. It works fine for solr/ and solr/admin, but for all the pages above /solr/admin/ I get nothing but a blank string. 76]: t = urllib2.urlopen('http://localhost:8983/solr/admin/stats.jsp') 77]: s = t.read() 78]: s 78]: 79]: type(s) 79]: <type 'str'> 80]: urllib2.urlopen('http://localhost:8983/solr/admin/registry.jsp').read() 80]: In [84]: urllib2.urlopen('http://localhost:8983/solr/admin/schema.jsp').read() Out[84]: I know this isn't a problem with urllib2, but beyond that I am at a loss. I wish solr (or jetty) had an easy to get to log file, so that perhaps it could tell me its side of the story.

Read the article

1 2 3 4 5 6 7 | Next Page >