urllib2 - Page 6 - Developer IT

Getting the dimensions of a remote image file without having to save it to the hard drive

- by kaloyan

I am looking for a way to get the dimensions of a remote image file without having to save it to the hard drive, since I am trying to skimp on I/O requests. Basically something like PHP's getimagesize() function. I did try something like this:

    print Image.open(urllib2.urlopen(src)).size

but PIL wants a file handle that has a seek() method. Any ideas?
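
One possible approach, sketched for Python 3 (where urllib2 became urllib.request): read the response into an in-memory buffer, which does support seek(). The helper names `to_seekable` and `image_size` are made up for this sketch, and `image_size` assumes the third-party Pillow package is installed.

```python
import io


def to_seekable(response):
    """Copy a read-only stream into an in-memory buffer that supports seek()."""
    return io.BytesIO(response.read())


def image_size(response):
    """Return (width, height) without touching the disk.

    Assumption: Pillow is installed; it is imported lazily so the rest of
    the module works without it.
    """
    from PIL import Image
    return Image.open(to_seekable(response)).size


# Hypothetical usage (performs a network request, so it is left commented out):
# import urllib.request
# print(image_size(urllib.request.urlopen(src)))
```

This still downloads the whole image into memory; it only avoids the write to disk.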

can't use appcfg.py update gae

- by user353998

Hello, I recently tried to upload GAppProxy to GAE, but when I use appcfg.py to update the files, I get this error:

    urllib2.URLError: urlopen error [Errno 8] _ssl.c:480: EOF occurred in violation of protocol

I don't know why. PS: I live in China, so it may be because of the GFW. Also, when I go to appengine.google.com and enter my password, I am not redirected to the index page; I get an error there too, which says: ssl error

What is the best way to open a URL and get up to X bytes in Python?

- by Stavros Korokithakis

I want to have a robot fetch a URL every hour, but if the site's operator is malicious he could have his server send me a 1 GB file. Is there a good way to limit downloading to, say, 100 KB and stop after that limit? I can imagine writing my own connection handler from scratch, but I'd like to use urllib2 if at all possible, just specifying the limit somehow. Thanks!
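
A minimal sketch of one way to cap the download, assuming Python 3: file-like responses accept a byte count in read(), so the loop below stops once the limit is reached. The function name `read_limited` and its default values are illustrative, not part of any library.

```python
import io


def read_limited(stream, limit=100 * 1024, chunk_size=8192):
    """Read at most `limit` bytes from a file-like object, then stop."""
    chunks = []
    remaining = limit
    while remaining > 0:
        chunk = stream.read(min(chunk_size, remaining))
        if not chunk:  # the server sent less than the limit
            break
        chunks.append(chunk)
        remaining -= len(chunk)
    return b"".join(chunks)


# Hypothetical usage (network access, left commented out):
# import urllib.request
# with urllib.request.urlopen(url, timeout=10) as response:
#     data = read_limited(response)

print(len(read_limited(io.BytesIO(b"x" * 1000), limit=100)))
```

Closing the response after the limit is reached drops the connection, so the rest of an oversized body is never transferred. A timeout is still worth setting, because a slow server can stall a read without ever exceeding the size limit.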

How do I parse youtube xml for a specific entry?

- by sharataka

I am trying to return the duration of the video but am having trouble.

    # YOUTUBE FEED
    # download the file:
    file = urllib2.urlopen('http://gdata.youtube.com/feeds/api/videos/2s0vk2wEMtA')
    # convert to string:
    data = file.read()
    # close file because we don't need it anymore:
    file.close()
    # entire feed
    root = etree.fromstring(data)
    for entry in root:
        for item in entry:
            print item

When I print item, I see as the last element:

    Element '{http://gdata.youtube.com/schemas/2007}duration' at 0x10c4fb7d0

But I don't know how to get the value from this. Any advice?
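
A hedged sketch of how the value could be read once the element is found. It assumes the duration is stored in a `seconds` attribute, as in the hand-written sample below; `SAMPLE_ENTRY` is not real feed output, and if the feed stores the value as element text instead, `element.text` would be the thing to read.

```python
import xml.etree.ElementTree as ET

YT_NS = "http://gdata.youtube.com/schemas/2007"

# Hand-written sample, assumed to resemble the feed entry.
SAMPLE_ENTRY = """<entry xmlns="http://www.w3.org/2005/Atom"
       xmlns:media="http://search.yahoo.com/mrss/"
       xmlns:yt="http://gdata.youtube.com/schemas/2007">
  <media:group>
    <yt:duration seconds="215"/>
  </media:group>
</entry>"""


def duration_seconds(xml_text):
    """Return the duration as an int, or None if it is not present."""
    root = ET.fromstring(xml_text)
    # Search the whole tree using the namespace-qualified tag name.
    element = root.find(".//{%s}duration" % YT_NS)
    if element is None:
        return None
    seconds = element.get("seconds")  # attribute lookup, not element text
    return int(seconds) if seconds is not None else None


print(duration_seconds(SAMPLE_ENTRY))
```

The key point is that printing an Element only shows its tag; attributes are read with `.get(name)` and text content with `.text`.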

How to use Python and BeautifulSoup to print timestamp/last updated time (from HTML:) for each row?

- by cesalo

How can I use Python and BeautifulSoup to print the timestamp/last updated time (from the HTML) for each row? Thanks a lot!

Can I add a) the date/time and b) the last updated time after each row?

a) date/time: the time when the Python code is executed
b) last updated time: taken from the HTML

HTML structure: one td containing two tables; each table has a few "tr" and each "tr" has a few "td".

Data inside the HTML:

    <td>
      <table width="100%" border="4" cellspacing="0" bordercolor="white" align="center">
        <tbody>
          <tr> <td colspan="2" class="verd_black11">Last Updated: 18/08/2014 10:19</td> </tr>
          <tr> <td colspan="3" class="verd_black11">All data delayed at least 15 minutes</td> </tr>
        </tbody>
      </table>
      <table width="100%" border="4" cellspacing="0" bordercolor="white" align="center">
        <tbody id="tbody">
          <tr id="tr0" class="tableHdrB1" align="center">
            <td align="centre">C Aug-14 - 15000</td>
            <td align="right"> - </td>
            <td align="right">5</td>
            <td align="right">9,904</td>
          </tr>
        </tbody>
      </table>
    </td>

Code:

    import urllib2
    from bs4 import BeautifulSoup

    contenturl = "HTML:"
    soup = BeautifulSoup(urllib2.urlopen(contenturl).read())
    table = soup.find('tbody', attrs={'id': 'tbody'})
    rows = table.findAll('tr')
    for tr in rows:
        cols = tr.findAll('td')
        for td in cols:
            t = td.find(text=True)
            if t:
                text = t + ';'
                print text,
        print

Output from the above code:

    C Aug-14 - 15000 ; - ; 5 ; 9,904

Expected output:

    C Aug-14 - 15000 ; - ; 5 ; 9,904 ; 18/08/2014 ; 13:48:00 ; 18/08/2014 ; 10:19
                                       (execute python code)   (last updated time)
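
A rough sketch of the two missing pieces, using only the standard library (Python 3) rather than BeautifulSoup: a regular expression pulls the "Last Updated" stamp out of the page text, and `datetime` supplies the execution time. The names `last_updated` and `decorate_row` are invented for this example.

```python
import re
from datetime import datetime

# Matches e.g. "Last Updated: 18/08/2014 10:19"
LAST_UPDATED = re.compile(r"Last Updated:\s*(\d{2}/\d{2}/\d{4})\s+(\d{2}:\d{2})")


def last_updated(html):
    """Return (date, time) strings from the page, or None if absent."""
    match = LAST_UPDATED.search(html)
    return match.groups() if match else None


def decorate_row(row_text, html, now=None):
    """Append the execution time and the page's last-updated stamp to a row."""
    now = now or datetime.now()
    parts = [row_text, now.strftime("%d/%m/%Y"), now.strftime("%H:%M:%S")]
    stamp = last_updated(html)
    if stamp:
        parts.extend(stamp)
    return " ; ".join(parts)


page = '<td colspan="2" class="verd_black11">Last Updated: 18/08/2014 10:19</td>'
print(decorate_row("C Aug-14 - 15000 ; - ; 5 ; 9,904", page))
```

In the original loop, the row text would be collected into a string first and then passed through something like `decorate_row` before printing.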

How do I set up my own proxy server?

- by NJTechGuy

This website (abc.com) slowed access from our original IP address. How do I implement my own proxy server to hide my IP while browsing abc.com? Do I need a special hardware/software combo to achieve this? It would be awesome if I could generate about 5 proxies and alternate amongst those 5 while browsing abc.com. Please suggest. Thanks guys! P.S.: I want to know if I can generate proxy IPs of the type 123.34.21.140 port 80 on my own. I want to use those IP/port combos in my Python scripts (urllib2/set_proxy).
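
On the scripting side only, a minimal Python 3 sketch of alternating between proxies. Proxy addresses cannot be generated; each one has to be a host that actually runs proxy software and that you are allowed to use. The addresses below are documentation placeholders, and `proxy_openers` is a made-up helper name.

```python
import itertools
import urllib.request

# Placeholder addresses (reserved documentation range), not working proxies.
PROXIES = ["http://192.0.2.10:80", "http://192.0.2.11:80"]


def proxy_openers(proxies):
    """Yield (proxy, opener) pairs forever, cycling through the list."""
    for proxy in itertools.cycle(proxies):
        handler = urllib.request.ProxyHandler({"http": proxy, "https": proxy})
        yield proxy, urllib.request.build_opener(handler)


rotation = proxy_openers(PROXIES)
proxy, opener = next(rotation)
print("next request would go through", proxy)
# opener.open("http://abc.com/")  # network access, left commented out
```

Each call to `next(rotation)` hands back an opener bound to the next proxy in the list.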

How to make a post request with REQUESTS package for Python?

- by jorrebor

I am trying to use the Toggl API. I use Requests instead of urllib2 for doing my GETs and POSTs, but I am stuck.

    payload = {
        "project": {
            "name": "Another Project",
            "billable": False,
            "workspace": {
                "Name": "jorrebor's workspace",
                "id": 213272
            },
            "automatically_calculate_estimated_workhours": False
        }
    }
    url = "https://www.toggl.com/api/v6/projects.json"
    r = requests.post(url, data=json.dumps(payload), auth=HTTPBasicAuth('[email protected]', 'mypassword'))

Authentication seems to be fine, but the payload format probably isn't. A curl command with the same parameters:

    curl -v -u heremytoken:api_token -H "Content-type: application/json" -d "{\"project\":{\"billable\":true,\"workspace\":{\"id\":213272},\"name\":\"Another project\",\"automatically_calculate_estimated_workhours\":false}}" -X POST https://www.toggl.com/api/v6/projects.json

does work fine. What is wrong with my payload? The response I get is:

    ["Name can't be blank","Workspace can't be blank"]

which leads me to conclude that the authentication works but Toggl cannot read my JSON object.
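
One visible difference between the two calls is that curl sends a Content-type: application/json header while the Requests call does not. Whether that is the whole cause is a guess, but it is cheap to rule out. The sketch below only builds the body and headers; `build_json_post` is an illustrative helper, not part of Requests.

```python
import json


def build_json_post(payload):
    """Serialize the payload and pair it with the header curl was sending."""
    body = json.dumps(payload)
    headers = {"Content-Type": "application/json"}
    return body, headers


payload = {
    "project": {
        "name": "Another Project",
        "billable": False,
        "workspace": {"id": 213272},
        "automatically_calculate_estimated_workhours": False,
    }
}
body, headers = build_json_post(payload)
print(headers, body)

# With Requests installed, the call would then look like this
# (network access, left commented out):
# r = requests.post(url, data=body, headers=headers, auth=...)
```

Without the header, a server may treat the body as form data and never parse the JSON, which would be consistent with the "can't be blank" errors.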

Encoding in python with lxml - complex solution

- by Vojtech R.

Hi, I need to download and parse a web page with lxml and build UTF-8 XML output. I think a schema in pseudocode is more illustrative:

    from lxml import etree

    webfile = urllib2.urlopen(url)
    root = etree.parse(webfile.read(), parser=etree.HTMLParser(recover=True))
    txt = my_process_text(etree.tostring(root.xpath('/html/body'), encoding=utf8))
    output = etree.Element("out")
    output.text = txt
    outputfile.write(etree.tostring(output, encoding=utf8))

So webfile can be in any encoding (lxml should handle this). The output file has to be in UTF-8. I'm not sure where to use encoding/coding. Is this schema OK? (I can't find a good tutorial about lxml and encoding, but I can find many problems with this...) I need a robust, approved solution, so I am asking you seniors. Many thanks

Cookies with urllib

- by CMC

This will probably seem like a really simple question, and I am quite confused as to why this is so difficult for me. I would like to write a function that takes three inputs: [url, data, cookies] that will use urllib (not urllib2) to get the contents of the requested url. I figured it'd be simple, so I wrote the following:

    def fetch(url, data = None, cookies = None):
        if isinstance(data, dict):
            data = urllib.urlencode(data)
        if isinstance(cookies, dict):
            # TODO: find a better way to do this
            cookies = "; ".join([str(key) + "=" + str(cookies[key]) for key in cookies])
        opener = urllib.FancyURLopener()
        opener.addheader("Cookie", cookies)
        obj = opener.open(url, data)
        result = obj.read()
        obj.close()
        return result

This doesn't work, as far as I can tell (can anyone confirm that?) and I'm stumped.
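
For comparison, a sketch of the same idea in Python 3, where the old urllib and urllib2 were merged into urllib.request. It is not a drop-in fix for the Python 2 code above; `build_request` is a name chosen for this example, and it only builds the request so it can be inspected before sending.

```python
import urllib.parse
import urllib.request


def build_request(url, data=None, cookies=None):
    """Build a request carrying form data and a Cookie header."""
    if isinstance(data, dict):
        # POST bodies must be bytes in Python 3.
        data = urllib.parse.urlencode(data).encode("ascii")
    request = urllib.request.Request(url, data=data)
    if cookies:
        cookie_header = "; ".join(
            "{}={}".format(key, value) for key, value in cookies.items()
        )
        request.add_header("Cookie", cookie_header)
    return request


request = build_request("http://example.com/", {"q": "x"}, {"a": 1, "b": 2})
print(request.get_method(), request.get_header("Cookie"))
# urllib.request.urlopen(request).read()  # network access, left commented out
```

Building the request separately makes it easy to check exactly which headers will be sent.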

My python auto-login script is broken.

- by user310392

A long time ago, I wrote a little python script to automatically log me on to the wireless network at my office. Here is the code:

    #!/opt/local/bin/python
    from urllib2 import urlopen
    from ClientForm import ParseResponse

    try:
        if "Logged on as" in urlopen("https://MYWIRELESS.com/logon").read():
            print "Already logged on."
        else:
            forms = ParseResponse(urlopen("https://MYWIRELESS.com/logon"), backwards_compat=False)
            form = forms[0]
            form["username"], form["password"] = "ME", "MYPASSWD"
            urlopen(form.click())
            print "Logged on. (probably :-)";
    except IOError, e:
        print "Couldn't connect to wireless login page:\n", e

I changed computers recently, and it stopped working. Now, I get the error:

    File "login.txt", line 4, in <module>
        from ClientForm import ParseResponse
    ImportError: No module named ClientForm

which makes it look like I don't have some package (ClientForm) installed, so I installed it (sudo port install py-clientform), but I still get the same error. Does anyone have an idea what I'm doing wrong?
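
One guess is that the package was installed for a different interpreter than the one running the script. A small Python 3 diagnostic sketch (the helper name `describe_import` is invented here) reports which interpreter is running and whether it can see a given top-level module:

```python
import importlib.util
import sys


def describe_import(name):
    """Report the running interpreter and whether it can find `name`."""
    spec = importlib.util.find_spec(name)  # None when the module is missing
    return {
        "interpreter": sys.executable,
        "module": name,
        "found": spec is not None,
        "location": getattr(spec, "origin", None),
    }


print(describe_import("json"))
```

Running it with the interpreter named on the script's #! line shows whether that particular Python has the module on its path.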

Why is this the output of this python program?

- by Andrew Moffat

Someone from #python suggested that it's searching for module "herpaderp" and finding all the ones listed as it's searching. If this is the case, why doesn't it list every module on my system before raising ImportError? Can someone shed some light on what's happening here?

    import sys

    class TempLoader(object):
        def __init__(self, path_entry):
            if path_entry == 'test':
                return
            raise ImportError

        def find_module(self, fullname, path=None):
            print fullname, path
            return None

    sys.path.insert(0, 'test')
    sys.path_hooks.append(TempLoader)
    import herpaderp

output:

    16:00:55 $> python wtf.py
    herpaderp None apport None subprocess None traceback None pickle None struct None re None sre_compile None sre_parse None sre_constants None org None tempfile None random None __future__ None urllib None string None socket None _ssl None urlparse None collections None keyword None ssl None textwrap None base64 None fnmatch None glob None atexit None xml None _xmlplus None copy None org None pyexpat None problem_report None gzip None email None quopri None uu None unittest None ConfigParser None shutil None apt None apt_pkg None gettext None locale None functools None httplib None mimetools None rfc822 None urllib2 None hashlib None _hashlib None bisect None
    Traceback (most recent call last):
      File "wtf.py", line 14, in <module>
        import herpaderp
    ImportError: No module named herpaderp

Paver 0.8.1 compatibility with python 2.6

- by Bertrand

Hi, has anyone managed to bootstrap their development area using Paver with Python 2.6? I have installed Python 2.6 and installed Paver with easy_install-2.6, and everything looks fine. But when I try to launch the bootstrap method, it raises an urllib2.HTTPError (: HTTP Error 404: Not Found) while trying to download http://pypi.python.org/packages/2.6/s/setuptools/setuptools-0.6c8-py2.6.egg. I have tried adding the correct setuptools egg file (which is 0.6c9) to the support-files directory. bootstrap.py finds the egg file but doesn't seem to use it, because it still tries to download the 0.6c8 version, which is no longer available. Any ideas how to solve this issue? Thanks in advance, Bertrand

ImportError and Django driving me crazy

- by John Peebles

OK, I have the following directory structure (it's a Django project):

    - project
    -- app

and within the app folder, there is a scraper.py file which needs to reference a class defined within models.py. I'm trying to do the following:

    import urllib2
    import os
    import sys
    import time
    import datetime
    import re
    import BeautifulSoup

    sys.path.append('/home/userspace/Development/')
    os.environ['DJANGO_SETTINGS_MODULE'] = 'project.settings'
    from project.app.models import ClassName

and this code just isn't working. I get an error of:

    Traceback (most recent call last):
      File "scraper.py", line 14, in <module>
        from project.app.models import ClassName
    ImportError: No module named project.app.models

The code above used to work, but broke somewhere along the line, and I'm extremely confused as to why I'm having problems. On Snow Leopard using Python 2.5.

Python regex on list

- by Peter Nielsen

Hi there, I am trying to build a parser and save the results as an XML file, but I have problems. For instance, I get a TypeError: expected string or buffer when I try to run the code. Would you experts please have a look at my code?

    import urllib2, re
    from xml.dom.minidom import Document
    from BeautifulSoup import BeautifulSoup as bs

    osc = open('OSCTEST.html','r')
    oscread = osc.read()
    soup=bs(oscread)
    doc = Document()
    root = doc.createElement('root')
    doc.appendChild(root)
    countries = doc.createElement('countries')
    root.appendChild(countries)
    findtags1 = re.compile ('<h1 class="title metadata_title content_perceived_text(.*?)</h1>', re.DOTALL | re.IGNORECASE).findall(soup)
    findtags2 = re.compile ('<span class="content_text">(.*?)</span>', re.DOTALL | re.IGNORECASE).findall(soup)
    for header in findtags1:
        title_elem = doc.createElement('title')
        countries.appendChild(title_elem)
        header_elem = doc.createTextNode(header)
        title_elem.appendChild(header_elem)
    for item in findtags2:
        art_elem = doc.createElement('artikel')
        countries.appendChild(art_elem)
        s = item.replace('<P>','')
        t = s.replace('</P>','')
        text_elem = doc.createTextNode(t)
        art_elem.appendChild(text_elem)
    print doc.toprettyxml()
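
The TypeError is consistent with findall() being handed the parsed soup object instead of a string, since regular expressions only work on text. A minimal Python 3 sketch of the string-based variant follows; `find_articles` is a made-up name and the sample markup is hand-written.

```python
import re

ARTICLE_RE = re.compile(
    r'<span class="content_text">(.*?)</span>', re.DOTALL | re.IGNORECASE
)


def find_articles(markup):
    """Run the pattern against text; str() converts a parsed document back
    into markup (the raw string read from the file works just as well)."""
    return ARTICLE_RE.findall(str(markup))


sample = '<span class="content_text"><P>Denmark</P></span>'
print(find_articles(sample))
```

In the original script that would mean calling findall(oscread) or findall(str(soup)) rather than findall(soup).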

login through twitter not working in osqa

- by Pankaj Khurana

Hi, I have installed OSQA on a site hosted on HostGator. The login functionality works for Google, Yahoo, and Facebook, but when I click on the Twitter icon it raises an exception. I have already added the Twitter consumer key and the Twitter consumer secret through the admin interface. The exception I am getting is:

    HTTPError at /account/twitter/signin/
    HTTP Error 401: Unauthorized
    Request Method: GET
    Request URL: http://mydomain/account/twitter/signin/?validate_email=yes
    Exception Type: HTTPError
    Exception Value: HTTP Error 401: Unauthorized
    Exception Location: /usr/lib/python2.4/urllib2.py in http_error_default, line 480
    Python Executable: /usr/bin/python
    Python Version: 2.4.3

I am unable to trace the reason for this. Please help me with it. Thanks

how to send data to server using python

- by Apache

Hi experts, how can data be sent to a server? For example, I retrieve a MAC address and want to send it to the server (i.e. 211.21.24.43:8080/data?mac=00-0C-F1-56-98-AD). I found the snippet below on the internet:

    from urllib2 import Request, urlopen
    from binascii import b2a_base64

    def b64open(url, postdata):
        req = Request(url, b2a_base64(postdata), headers={'Content-Transfer-Encoding': 'base64'})
        return urlopen(req)

    conn = b64open("http://211.21.24.43:8080/data","mac=00-0C-F1-56-98-AD")

but when I run it, I get:

    File "send2.py", line 8
    SyntaxError: Non-ASCII character '\xc3' in file send2.py on line 8, but no encoding declared; see http://www.python.org/peps/pep-0263.html for details

Can anyone help me send data to the server? Thanks in advance
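
The SyntaxError itself points at a stray non-ASCII character on line 8 of the file, which often comes from copying code off a web page; retyping that line usually removes it. For the sending part, a minimal Python 3 sketch that builds the query-string URL from the question (the helper name `build_report_url` is illustrative):

```python
from urllib.parse import urlencode


def build_report_url(base_url, **params):
    """Append URL-encoded query parameters to the base address."""
    return base_url + "?" + urlencode(params)


url = build_report_url("http://211.21.24.43:8080/data", mac="00-0C-F1-56-98-AD")
print(url)

# Sending it (network access, left commented out):
# import urllib.request
# reply = urllib.request.urlopen(url).read()
```

Whether the server expects the value in the query string, as the example URL suggests, or in a POST body is something only the server side can confirm.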

urlopen error [errno 111] connection refused

- by Ui-Gyun Jeong

I am doing a Python exercise from the book 'Head First Python' and making an Android app using Python and SL4A. My code is:

    import android
    import json
    import time
    from urllib import urlencode
    from urllib2 import urlopen

    hello_msg = "Welcome to Coach Kelly's Timing App"
    list_title = 'Here is your list of athletes:'
    quit_msg = "Quitting Coach Kelly's App."
    web_server = 'http://127.0.0.1:8080'
    get_names_cgi = '/cgi-bin/generate_name.py'

    def send_to_server(url, post_data=None):
        if post_data:
            page = urlopen(url, urlencode(post_data))
        else:
            page = urlopen(url)
        return(page.read().decode("utf8"))

    app = android.Android()

    def status_update(msg, how_long=2):
        app.makeToast(msg)
        time.sleep(how_long)

    status_update(hello_msg)
    athlete_names = sorted(json.loads(send_to_server(web_server + get_names_cgi)))
    app.dialogCreateAlert(list_title)
    app.dialogSetSingleChoiceItems(athlete_names)
    app.dialogSetPositiveButtonText('Select')
    app.dialogSetNegativeButtonText('Quit')
    app.dialogShow()
    resp = app.dialogGetResponse().result
    status_update(quit_msg)

That is my code. What is the problem? I cannot figure out what the problem is...

Python: find <title>

- by Peter

I have this:

    response = urllib2.urlopen(url)
    html = response.read()
    begin = html.find('<title>')
    end = html.find('</title>',begin)
    title = html[begin+len('<title>'):end].strip()

If url = http://www.google.com, the title comes out fine as "Google", but if url = "http://www.britishcouncil.org/learning-english-gateway" then the title becomes

    "<!doctype html public "-//W3C//DTD HTML 4.0 Transitional//EN"> <HTML> <HEAD> <base href="http://www.britishcouncil.org/" /> <META http-equiv="Content-Type" Content="text/html;charset=utf-8"> <meta name="WT.sp" content="Learning;Home Page Smart View" /> <meta name="WT.cg_n" content="Learn English Gateway" /> <META NAME="DCS.dcsuri" CONTENT="/learning-english-gateway.htm">..."

What is actually happening, and why can't I get the title?
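
A likely explanation: the second page writes its tags in upper case (<HTML>, <HEAD>), so find('<title>') returns -1, and slicing from -1 + len('<title>') up to another -1 yields almost the whole document. A case-insensitive search avoids that. The sketch below is Python 3 and the name `extract_title` is chosen for this example.

```python
import re

# IGNORECASE handles <TITLE>; DOTALL lets the title span line breaks.
TITLE_RE = re.compile(r"<title[^>]*>(.*?)</title>", re.IGNORECASE | re.DOTALL)


def extract_title(html):
    """Return the stripped title text, or None when no title tag exists."""
    match = TITLE_RE.search(html)
    return match.group(1).strip() if match else None


print(extract_title("<HTML><HEAD><TITLE> Learn English </TITLE></HEAD></HTML>"))
```

Returning None for a missing tag also makes the failure visible instead of silently producing a wrong slice.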

App Engine remote_api with OpenID

- by hawkettc

Hi, I've recently tried to switch my app engine app to using openID, but I'm having an issue authenticating with remote_api. The old authentication mechanism for remote_api doesn't seem to work (which makes sense) - I'm getting a 'urllib2.HTTPError: HTTP Error 302: Found', which I assume is appengine redirecting me to the openid login page I've set up. I guess I'm missing something fairly obvious. Currently my remote_api script has the following in it:

    remote_api_stub.ConfigureRemoteDatastore(app_id=app_id, path='/remote_api', auth_func=auth_func, servername=host, secure=secure)

where auth_func is

    def auth_func():
        return raw_input('Username:'), getpass.getpass('Password:')

Any ideas what I need to supply to remote_api? I guess similar issues would be encountered with bulkloader too. Cheers, Colin

Replacing backslashes in Python strings

- by user323659

I have some code to encrypt some strings in Python. The encrypted text is used as a parameter in some URLs, but after encrypting, backslashes appear in the string and I cannot use a single backslash in urllib2.urlopen. I cannot replace a single backslash with a double one. For example:

    print cipherText
    '\t3-@\xab7+\xc7\x93H\xdc\xd1\x13G\xe1\xfb'
    print cipherText.replace('\\','\\\\')
    '\t3-@\xab7+\xc7\x93H\xdc\xd1\x13G\xe1\xfb'

Also, putting r in front of \ in the replace statement did not work. All I want to do is call this kind of URL:

    http://awebsite.me/main?param="\t3-@\xab7+\xc7\x93H\xdc\xd1\x13G\xe1\xfb"

And this URL can be called successfully:

    http://awebsite.me/main?param="\\t3-@\\xab7+\\xc7\\x93H\\xdc\\xd1\\x13G\\xe1\\xfb"

Any idea will be appreciated.
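
The string probably contains no backslashes at all: '\t' and '\xab' are how Python displays a tab and a raw byte, which is why replace() finds nothing to change. The usual way to put arbitrary bytes into a URL is percent-encoding. A minimal Python 3 sketch (in Python 2 the equivalent function is urllib.quote):

```python
from urllib.parse import quote, unquote_to_bytes

cipher_text = b"\t3-@\xab7+\xc7\x93H\xdc\xd1\x13G\xe1\xfb"

# safe="" also encodes "/" so every reserved character is escaped.
encoded = quote(cipher_text, safe="")
url = "http://awebsite.me/main?param=" + encoded
print(url)

# The receiving side can recover the exact bytes again.
assert unquote_to_bytes(encoded) == cipher_text
```

This only works if the server percent-decodes the parameter before decrypting it; if it expects the escaped text form shown in the question, both sides need to agree on that format instead.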

Generating PDF results in single page only?

- by A T

Generating a PDF from an email (Zurb Ink templated); but am always presented with a single page PDF. Runnable test-case:

    from weasyprint import HTML, CSS
    from urllib2 import urlopen

    if __name__ == '__main__':
        html = urlopen('http://zurb.com/ink/downloads/templates/basic.html').read()
        html = html.replace('<p class=\"lead\">', '{0}<p class=\"lead\">'.format(
            '<p class=\"lead\">{0}</p>'.format("foobar " * 50) * 50))
        HTML(string=html).write_pdf('foo.pdf', stylesheets=[
            CSS(string='@page { size: A4; margin: 2cm };'
                       '* { float: none !important; };'
                       '@media print { nav { display: none; } }')
        ])

How do I get a multi-page PDF?

[Python] Help me : how i can deal with web page !

- by Str1k3r

Hello everyone. I am looking for modules or functions that let me interact with a website. For example, I want to tell Python to go to hotmail.com, find something called "signup" in the page source, and then follow it, and so on. I hope you understand my idea! I am thinking of urllib2; maybe it can do that? I am just new to Python.

(Python) Extracting Text from Source Code?

- by zhuyxn

Currently have a large webpage whose source code is ~200,000 lines of almost all (if not all) HTML. More specifically, it is a webpage whose content is a few thousand blocks of paragraphs separated by line breaks (though a line break does not specifically mean there is a separation in content) My main objective is to extract text from the source code as if I were copying/pasting the webpage into a text editor. There is another parsing function I would like to use, which originally took in copied/pasted text rather than the source code. To do this, I'm currently using urllib2, and calling .get_text() in Beautiful Soup. The problem is, Beautiful Soup is leaving tremendous amounts of white space in my code, and it is difficult to pass the result into the second "text" parser. I have done quite a bit of research on parsing HTMLs, but I'm frankly not sure how to solve this problem easily. Furthermore, I'm a bit confused on how to use imports like lxml to extract text as if I were to simply copy and paste?
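
One way to tame the extra white space before handing the text to the second parser is to normalize it after extraction. The sketch below is plain Python 3 with no parser dependency; `normalize_whitespace` is an invented helper, and whether blank lines should be dropped entirely depends on what the second parser expects.

```python
import re


def normalize_whitespace(text):
    """Collapse runs of spaces and tabs, trim each line, drop empty lines."""
    lines = (
        re.sub(r"[ \t\r\f\v]+", " ", line).strip() for line in text.splitlines()
    )
    return "\n".join(line for line in lines if line)


raw = "  Hello   world \n\n\n\t Second\tparagraph  \n"
print(normalize_whitespace(raw))
```

It would be applied to the string returned by get_text(), giving output closer to what copying and pasting from a browser produces.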

PYTHON: ntlm authentication

- by Svetlana

Hello!! I'm trying to implement NTLM authentication on IIS (Windows Server 2003) from Windows 7 with Python. LAN Manager Authentication Level: Send NTLM response only. Client machine and server are in the same domain. Domain controller (AD) is on another server (also running Windows Server 2003). I receive 401.1 - Unauthorized: Access is denied due to invalid credentials. Could you please help me find out what is wrong with this code and/or show me the other possible directions to solve this problem (using NTLM or Kerberos)?

[python]
    import sys, httplib, base64, string
    import urllib2
    import win32api
    import sspi
    import pywintypes
    import socket

    class WindoewNtlmMessageGenerator:
        def __init__(self,user=None):
            import win32api,sspi
            if not user:
                user = win32api.GetUserName()
            self.sspi_client = sspi.ClientAuth("NTLM",user)

        def create_auth_req(self):
            import pywintypes
            output_buffer = None
            error_msg = None
            try:
                error_msg, output_buffer = self.sspi_client.authorize(None)
            except pywintypes.error:
                return None
            auth_req = output_buffer[0].Buffer
            auth_req = base64.encodestring(auth_req)
            auth_req = string.replace(auth_req,'\012','')
            return auth_req

        def create_challenge_response(self,challenge):
            import pywintypes
            output_buffer = None
            input_buffer = challenge
            error_msg = None
            try:
                error_msg, output_buffer = self.sspi_client.authorize(input_buffer)
            except pywintypes.error:
                return None
            response_msg = output_buffer[0].Buffer
            response_msg = base64.encodestring(response_msg)
            response_msg = string.replace(response_msg,'\012','')
            return response_msg

    fname='request.xml'
    request = file(fname).read()
    ip_host = '10.0.3.112'
    ntlm_gen = WindoewNtlmMessageGenerator()
    auth_req_msg = ntlm_gen.create_auth_req()
    auth_req_msg_dec = base64.decodestring(auth_req_msg)
    auth_req_msg = string.replace(auth_req_msg,'\012','')
    webservice = httplib.HTTPConnection(ip_host)
    webservice.putrequest("POST", "/idc/idcplg")
    webservice.putheader("Content-length", "%d" % len(request))
    webservice.putheader('Authorization', 'NTLM'+' '+auth_req_msg)
    webservice.endheaders()
    resp = webservice.getresponse()
    resp.read()
    challenge = resp.msg.get('WWW-Authenticate')
    challenge_dec = base64.decodestring(challenge.split()[1])
    msg3 = ntlm_gen.create_challenge_response(challenge_dec)
    webservice = httplib.HTTP(ip_host)
    webservice.putrequest("POST", "/idc/idcplg?IdcService=LOGIN&Auth=Intranet")
    webservice.putheader("Host", SHOD)
    webservice.putheader("Content-length", "%d" % len(request))
    webservice.putheader('Authorization', 'NTLM'+' '+msg3)
    webservice.putheader("Content-type", "text/xml; charset=\"UTF-8\"")
    webservice.putheader("SOAPAction", "\"\"")
    webservice.endheaders()
    webservice.send(request)
    statuscode, statusmessage, header = webservice.getreply()
    res = webservice.getfile().read()
    res_file = file('result.txt','wb')
    res_file.write(res)
    res_file.close()
[/python]

sspi.py is available here: http://www.koders.com/python/fidF3B0061A07CD13BA35FF263E3E45252CFABFAA3B.aspx?s=timer Thanks!

Trying to grab just absolute links from a webpage using BeautifulSoup

- by Kevin

I am reading the contents of a webpage using BeautifulSoup. What I want is to just grab the <a href> links that start with http://. I know in BeautifulSoup you can search by attributes. I guess I am just having a syntax issue. I would imagine it would go something like this:

    page = urllib2.urlopen("http://www.linkpages.com")
    soup = BeautifulSoup(page)
    for link in soup.findAll('a'):
        if link['href'].startswith('http://'):
            print links

But that returns:

    Traceback (most recent call last):
      File "<stdin>", line 2, in <module>
      File "C:\Python26\lib\BeautifulSoup.py", line 598, in __getitem__
        return self._getAttrMap()[key]
    KeyError: 'href'

Any ideas? Thanks in advance.

EDIT: This isn't for any site in particular. The script gets the URL from the user. So internal link targets would be an issue, which is also why I only want the <'a'> from the pages. If I turn it towards www.reddit.com, it parses the beginning links and it gets to this:

    <a href="http://www.reddit.com/top/">top</a>
    <a href="http://www.reddit.com/saved/">saved</a>
    Traceback (most recent call last):
      File "<stdin>", line 2, in <module>
      File "C:\Python26\lib\BeautifulSoup.py", line 598, in __getitem__
        return self._getAttrMap()[key]
    KeyError: 'href'
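
The KeyError suggests that some <a> tags on the page (named anchors, for example) have no href attribute at all, so link['href'] fails on them. In BeautifulSoup, link.get('href') returns None instead of raising. The sketch below shows the same guard using Python 3's standard html.parser rather than BeautifulSoup, so it runs without extra packages; the class and function names are invented, and accepting https:// as well as http:// is an addition to what the question asked for.

```python
from html.parser import HTMLParser


class AbsoluteLinkParser(HTMLParser):
    """Collect href values of <a> tags that start with http:// or https://."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            return
        href = dict(attrs).get("href")  # None when the tag has no href
        if href and href.startswith(("http://", "https://")):
            self.links.append(href)


def absolute_links(html):
    parser = AbsoluteLinkParser()
    parser.feed(html)
    return parser.links


sample = (
    '<a name="top"></a>'
    '<a href="/relative">rel</a>'
    '<a href="http://www.reddit.com/top/">top</a>'
)
print(absolute_links(sample))
```

Note that the loop in the question also prints `links`, a name that is never defined, where `link` was probably intended.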

Search Results

Search found 159 results on 7 pages for 'urllib2'.

Page 6/7 | < Previous Page | 2 3 4 5 6 7 | Next Page >
