Search Results

Search found 159 results on 7 pages for 'urllib2'.

Page 5/7 | < Previous Page | 1 2 3 4 5 6 7 | Next Page >

Python: using a regular expression to match one line of HTML

- by skylarking

This simple Python method I put together just checks to see if Tomcat is running on one of our servers. import urllib2 import re import sys def tomcat_check(): tomcat_status = urllib2.urlopen('http://10.1.1.20:7880') results = tomcat_status.read() pattern = re.compile('<body>Tomcat is running...</body>',re.M|re.DOTALL) q = pattern.search(results) if q == []: notify_us() else: print ("Tomcat appears to be running") sys.exit() If this line is not found : <body>Tomcat is running...</body> It calls : notify_us() Which uses SMTP to send an email message to myself and another admin that Tomcat is no longer runnning on the server... I have not used the re module in Python before...so I am assuming there is a better way to do this... I am also open to a more graceful solution with Beautiful Soup ... but haven't used that either.. Just trying to keep this as simple as possible...

Read the article
How do I modify this download function in Python?

- by TIMEX

Right now, it's iffy. Gzip, images, sometimes it doesn't work. How do I modify this download function so that it can work with anything? (Regardless of gzip or any header?) How do I automatically "Detect" if it's gzip? I don't want to always pass True/False, like I do right now. def download(source_url, g = False, correct_url = True): try: socket.setdefaulttimeout(10) agents = ['Mozilla/4.0 (compatible; MSIE 5.5; Windows NT 5.0)','Mozilla/4.0 (compatible; MSIE 7.0b; Windows NT 5.1)','Microsoft Internet Explorer/4.0b1 (Windows 95)','Opera/8.00 (Windows NT 5.1; U; en)'] ree = urllib2.Request(source_url) ree.add_header('User-Agent',random.choice(agents)) ree.add_header('Accept-encoding', 'gzip') opener = urllib2.build_opener() h = opener.open(ree).read() if g: compressedstream = StringIO(h) gzipper = gzip.GzipFile(fileobj=compressedstream) data = gzipper.read() return data else: return h except Exception, e: return ""

Read the article
Returning more than one result

- by Hairr

I'm using the following code: def recentchanges(bot=False,rclimit=20): """ @description: Gets the last 20 pages edited on the recent changes and who the user who edited it """ recent_changes_data = { 'action':'query', 'list':'recentchanges', 'rcprop':'user|title', 'rclimit':rclimit, 'format':'json' } if bot is False: recent_changes_data['rcshow'] = '!bot' else: pass data = urllib.urlencode(recent_changes_data) response = opener.open('http://runescape.wikia.com/api.php',data) content = json.load(response) pages = tuple(content['query']['recentchanges']) for title in pages: return title['title'] When I do recentchanges() I only get one result. If I print it though, all the pages are printed. Am I just misunderstanding or is this something relating to python? Also, opener is: cj = CookieJar() opener = urllib2.build_opener(urllib2.HTTPCookieProcessor(cj))

Read the article
Backup Google Calendar programmatically: https://www.google.com/calendar/exporticalzip

- by Michael

I'm struggling with writing a python script that automatically grabs the zip fail containing all my google calendars and stores it (as a backup) on my harddisk. I'm using ClientLogin to get an authentication token (and successfully can obtain the token). Unfortunately, i'm unable to retrieve the file at https://www.google.com/calendar/exporticalzip It always asks me for the login credentials again by returning a login page as html (instead of the zip). Here's the critical code: post_data = post_data = urllib.urlencode({ 'auth': token, 'continue': zip_url}) request = urllib2.Request('https://www.google.com/calendar', post_data, header) try: f = urllib2.urlopen(request) result = f.read() except: print "Error" Anyone any ideas or done that before? Or an alternative idea how to backup all my calendars (automatically!)

Read the article
Cross-site request forgery protections: Where do I put all these lines?

- by brilliant

Hello, I was looking for a python code that would be able to log in from "Google App Engine" to some of my accounts on some websites (like yahoo or eBay) and was given this code: import urllib, urllib2, cookielib url = "https://login.yahoo.com/config/login?" form_data = {'login' : 'my-login-here', 'passwd' : 'my-password-here'} jar = cookielib.CookieJar() opener = urllib2.build_opener(urllib2.HTTPCookieProcessor(jar)) form_data = urllib.urlencode(form_data) # data returned from this pages contains redirection resp = opener.open(url, form_data) # yahoo redirects to http://my.yahoo.com, so lets go there instead resp = opener.open('http://mail.yahoo.com') print resp.read() Unfortunately, this code didn't work, so I asked another question here and one supporter among other things said this: "You send MD5 hash and not plain password. Also you'd have to play along with all kinds of CSRF protections etc. that they're implementing. Look: <input type="hidden" name=".tries" value="1"> <input type="hidden" name=".src" value="ym"> <input type="hidden" name=".md5" value=""> <input type="hidden" name=".hash" value=""> <input type="hidden" name=".js" value=""> <input type="hidden" name=".last" value=""> <input type="hidden" name="promo" value=""> <input type="hidden" name=".intl" value="us"> <input type="hidden" name=".bypass" value=""> <input type="hidden" name=".partner" value=""> <input type="hidden" name=".u" value="bd5tdpd5rf2pg"> <input type="hidden" name=".v" value="0"> <input type="hidden" name=".challenge" value="5qUiIPGVFzRZ2BHhvtdGXoehfiOj"> <input type="hidden" name=".yplus" value=""> <input type="hidden" name=".emailCode" value=""> <input type="hidden" name="pkg" value=""> <input type="hidden" name="stepid" value=""> <input type="hidden" name=".ev" value=""> <input type="hidden" name="hasMsgr" value="0"> <input type="hidden" name=".chkP" value="Y"> <input type="hidden" name=".done" value="http://mail.yahoo.com"> <input type="hidden" name=".pd" value="ym_ver=0&c=&ivt=&sg="> I am not quite sure where he got all these lines from and where in my code I am supposed to add them. Do You have any idea? I know I was supposed to ask him this question first, and I did, but he never returned, so I decided to ask a separate question here.

Read the article
Python BeautifulSoup Print Info in CSV

- by Codin

I can print the information I am pulling from a site with no problem. But when I try to place the street names in one column and the zipcodes into another column into a CSV file that is when I run into problems. All I get in the CSV is the two column names and every thing in its own column across the page. Here is my code. Also I am using Python 2.7.5 and Beautiful soup 4 from bs4 import BeautifulSoup import csv import urllib2 url="http://www.conakat.com/states/ohio/cities/defiance/road_maps/" page=urllib2.urlopen(url) soup = BeautifulSoup(page.read()) f = csv.writer(open("Defiance Steets1.csv", "w")) f.writerow(["Name", "ZipCodes"]) # Write column headers as the first line links = soup.find_all(['i','a']) for link in links: names = link.contents[0] print unicode(names) f.writerow(names)

Read the article
How do I print the Images?

- by user1477539

I want to print the images of the 30 nba teams drafting in the first round. However when I tell it to print it prints out the link instead of the image. How do I get it to print out the image instead of giving me the image link. Here's my code: import urllib2 from BeautifulSoup import BeautifulSoup # or if your're using BeautifulSoup4: # from bs4 import BeautifulSoup soup = BeautifulSoup(urllib2.urlopen('http://www.cbssports.com/nba/draft/mock-draft').read()) rows = soup.findAll("table", attrs = {'class': 'data borderTop'})[0].tbody.findAll("tr")[2:] for row in rows: fields = row.findAll("td") if len(fields) >= 3: anchor = row.findAll("td")[1].find("a") if anchor: print anchor

Read the article
Correct way to protect a private API key when versioning a python application on a public git repo

- by systempuntoout

I would like to open-source a python project on Github but it contains an API key that should not be distributed. I guess there's something better than removing the key each time a "push" is committed to the repo. Imagine a simplified foomodule.py : import urllib2 API_KEY = 'XXXXXXXXX' urllib2.urlopen("http://example.com/foo?id=123%s" % API_KEY ).read() What i'm thinking is: Move the API_KEY in a second key.py module importing it on foomodule.py; i would then add key.py on .gitignore file. Same as 1 but using ConfigParser Do you know a good programmatic way to handle this scenario?

Read the article
Red Hat Yum not working out of the box?

- by Tucker

I have a server runnning Red Hat Enterprise Linux v5.6 in the cloud. My project constraints do not allow me to use another OS. When I created the cloud server, I was able to SSH into it and access the shell. I next ran the command: sudo yum update But the command failed. About a month ago I created another server with the same machine image and didn't have that error. Why is it failing now? The following is the terminal output sudo yum update Loaded plugins: security Repository rhel-server is listed more than once in the configuration Traceback (most recent call last): File "/usr/bin/yum", line 29, in ? yummain.user_main(sys.argv[1:], exit_code=True) File "/usr/share/yum-cli/yummain.py", line 309, in user_main errcode = main(args) File "/usr/share/yum-cli/yummain.py", line 178, in main result, resultmsgs = base.doCommands() File "/usr/share/yum-cli/cli.py", line 345, in doCommands self._getTs(needTsRemove) File "/usr/lib/python2.4/site-packages/yum/depsolve.py", line 101, in _getTs self._getTsInfo(remove_only) File "/usr/lib/python2.4/site-packages/yum/depsolve.py", line 112, in _getTsInfo pkgSack = self.pkgSack File "/usr/lib/python2.4/site-packages/yum/__init__.py", line 662, in <lambda> pkgSack = property(fget=lambda self: self._getSacks(), File "/usr/lib/python2.4/site-packages/yum/__init__.py", line 502, in _getSacks self.repos.populateSack(which=repos) File "/usr/lib/python2.4/site-packages/yum/repos.py", line 260, in populateSack sack.populate(repo, mdtype, callback, cacheonly) File "/usr/lib/python2.4/site-packages/yum/yumRepo.py", line 168, in populate if self._check_db_version(repo, mydbtype): File "/usr/lib/python2.4/site-packages/yum/yumRepo.py", line 226, in _check_db_version return repo._check_db_version(mdtype) File "/usr/lib/python2.4/site-packages/yum/yumRepo.py", line 1233, in _check_db_version repoXML = self.repoXML File "/usr/lib/python2.4/site-packages/yum/yumRepo.py", line 1406, in <lambda> repoXML = property(fget=lambda self: self._getRepoXML(), File "/usr/lib/python2.4/site-packages/yum/yumRepo.py", line 1398, in _getRepoXML self._loadRepoXML(text=self) File "/usr/lib/python2.4/site-packages/yum/yumRepo.py", line 1388, in _loadRepoXML return self._groupLoadRepoXML(text, ["primary"]) File "/usr/lib/python2.4/site-packages/yum/yumRepo.py", line 1372, in _groupLoadRepoXML if self._commonLoadRepoXML(text): File "/usr/lib/python2.4/site-packages/yum/yumRepo.py", line 1208, in _commonLoadRepoXML result = self._getFileRepoXML(local, text) File "/usr/lib/python2.4/site-packages/yum/yumRepo.py", line 989, in _getFileRepoXML cache=self.http_caching == 'all') File "/usr/lib/python2.4/site-packages/yum/yumRepo.py", line 826, in _getFile http_headers=headers, File "/usr/lib/python2.4/site-packages/urlgrabber/mirror.py", line 412, in urlgrab return self._mirror_try(func, url, kw) File "/usr/lib/python2.4/site-packages/urlgrabber/mirror.py", line 398, in _mirror_try return func_ref( *(fullurl,), **kwargs ) File "/usr/lib/python2.4/site-packages/urlgrabber/grabber.py", line 936, in urlgrab return self._retry(opts, retryfunc, url, filename) File "/usr/lib/python2.4/site-packages/urlgrabber/grabber.py", line 854, in _retry r = apply(func, (opts,) + args, {}) File "/usr/lib/python2.4/site-packages/urlgrabber/grabber.py", line 922, in retryfunc fo = URLGrabberFileObject(url, filename, opts) File "/usr/lib/python2.4/site-packages/urlgrabber/grabber.py", line 1010, in __init__ self._do_open() File "/usr/lib/python2.4/site-packages/urlgrabber/grabber.py", line 1093, in _do_open fo, hdr = self._make_request(req, opener) File "/usr/lib/python2.4/site-packages/urlgrabber/grabber.py", line 1202, in _make_request fo = opener.open(req) File "/usr/lib64/python2.4/urllib2.py", line 358, in open response = self._open(req, data) File "/usr/lib64/python2.4/urllib2.py", line 376, in _open '_open', req) File "/usr/lib64/python2.4/urllib2.py", line 337, in _call_chain result = func(*args) File "/usr/lib64/python2.4/site-packages/M2Crypto/m2urllib2.py", line 82, in https_open h.request(req.get_method(), req.get_selector(), req.data, headers) File "/usr/lib64/python2.4/httplib.py", line 810, in request self._send_request(method, url, body, headers) File "/usr/lib64/python2.4/httplib.py", line 833, in _send_request self.endheaders() File "/usr/lib64/python2.4/httplib.py", line 804, in endheaders self._send_output() File "/usr/lib64/python2.4/httplib.py", line 685, in _send_output self.send(msg) File "/usr/lib64/python2.4/httplib.py", line 652, in send self.connect() File "/usr/lib64/python2.4/site-packages/M2Crypto/httpslib.py", line 47, in connect self.sock.connect((self.host, self.port)) File "/usr/lib64/python2.4/site-packages/M2Crypto/SSL/Connection.py", line 174, in connect ret = self.connect_ssl() File "/usr/lib64/python2.4/site-packages/M2Crypto/SSL/Connection.py", line 167, in connect_ssl return m2.ssl_connect(self.ssl, self._timeout) M2Crypto.SSL.SSLError: certificate verify failed

Read the article
How to save POST&GET headers of a web page with "Wireshark"?

- by brilliant

Hello everybody, I've been trying to find a python code that would log in to my mail box on yahoo.com from "Google App Engine". I was given this code: import urllib, urllib2, cookielib url = "https://login.yahoo.com/config/login?" form_data = {'login' : 'my-login-here', 'passwd' : 'my-password-here'} jar = cookielib.CookieJar() opener = urllib2.build_opener(urllib2.HTTPCookieProcessor(jar)) form_data = urllib.urlencode(form_data) # data returned from this pages contains redirection resp = opener.open(url, form_data) # yahoo redirects to http://my.yahoo.com, so lets go there instead resp = opener.open('http://mail.yahoo.com') print resp.read() The author of this script looked into HTML script of yahoo log-in form and came up with this script. That log-in form contains two fields, one for users' Yahoo! ID and another one is for users' password. However, when I tried this code out (substituting mu real Yahoo login for 'my-login-here' and my real password for 'my-password-here'), it just return the log-in form back to me, which means that something didn't work right. Another supporter suggested that I should send an MD5 hash of my password, rather than a plain password. He also noted that in that log-in form there are a lot other hidden fields besides login and password fields (he called them "CSRF protections") that I would also have to deal with: <input type="hidden" name=".tries" value="1"> <input type="hidden" name=".src" value="ym"> <input type="hidden" name=".md5" value=""> <input type="hidden" name=".hash" value=""> <input type="hidden" name=".js" value=""> <input type="hidden" name=".last" value=""> <input type="hidden" name="promo" value=""> <input type="hidden" name=".intl" value="us"> <input type="hidden" name=".bypass" value=""> <input type="hidden" name=".partner" value=""> <input type="hidden" name=".u" value="bd5tdpd5rf2pg"> <input type="hidden" name=".v" value="0"> <input type="hidden" name=".challenge" value="5qUiIPGVFzRZ2BHhvtdGXoehfiOj"> <input type="hidden" name=".yplus" value=""> <input type="hidden" name=".emailCode" value=""> <input type="hidden" name="pkg" value=""> <input type="hidden" name="stepid" value=""> <input type="hidden" name=".ev" value=""> <input type="hidden" name="hasMsgr" value="0"> <input type="hidden" name=".chkP" value="Y"> <input type="hidden" name=".done" value="http://mail.yahoo.com"> He said that I should do the following: Simulate normal login and save login page that I get; Save POST&GET headers with "Wireshark"; Compare login page with those headers and see what fields I need to include with my request; I really don't know how to carry out the first two of these three steps. I have just downloaded "Wireshark" and have tried capturing some packets there. However, I don't know how to "simulate normal login and save the login page". Also, I don't how to save POST$GET headers with "Wireshark". Can anyone, please, guide me through these two steps in "Wireshark"? Or at least tell me what I should start with. Thank You.

Read the article
Python and mechanize login script

- by Perun

Hi fellow programmers! I am trying to write a script to login into my universities "food balance" page using python and the mechanize module... This is the page I am trying to log into: http://www.wcu.edu/11407.asp The website has the following form to login: <FORM method=post action=https://itapp.wcu.edu/BanAuthRedirector/Default.aspx><INPUT value=https://cf.wcu.edu/busafrs/catcard/idsearch.cfm type=hidden name=wcuirs_uri> <P><B>WCU ID Number<BR></B><INPUT maxLength=12 size=12 type=password name=id> </P> <P><B>PIN<BR></B><INPUT maxLength=20 type=password name=PIN> </P> <P></P> <P><INPUT value="Request Access" type=submit name=submit> </P></FORM> From this we know that I need to fill in the following fields: 1. name=id 2. name=PIN With the action: action=https://itapp.wcu.edu/BanAuthRedirector/Default.aspx This is the script I have written thus far: #!/usr/bin/python2 -W ignore import mechanize, cookielib from time import sleep url = 'http://www.wcu.edu/11407.asp' myId = '11111111111' myPin = '22222222222' # Browser #br = mechanize.Browser() #br = mechanize.Browser(factory=mechanize.DefaultFactory(i_want_broken_xhtml_support=True)) br = mechanize.Browser(factory=mechanize.RobustFactory()) # Use this because of bad html tags in the html... # Cookie Jar cj = cookielib.LWPCookieJar() br.set_cookiejar(cj) # Browser options br.set_handle_equiv(True) br.set_handle_gzip(True) br.set_handle_redirect(True) br.set_handle_referer(True) br.set_handle_robots(False) # Follows refresh 0 but not hangs on refresh > 0 br.set_handle_refresh(mechanize._http.HTTPRefreshProcessor(), max_time=1) # User-Agent (fake agent to google-chrome linux x86_64) br.addheaders = [('User-agent','Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/535.11 (KHTML, like Gecko) Chrome/17.0.963.56 Safari/535.11'), ('Accept', 'text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8'), ('Accept-Encoding', 'gzip,deflate,sdch'), ('Accept-Language', 'en-US,en;q=0.8'), ('Accept-Charset', 'ISO-8859-1,utf-8;q=0.7,*;q=0.3')] # The site we will navigate into br.open(url) # Go though all the forms (for debugging only) for f in br.forms(): print f # Select the first (index two) form br.select_form(nr=2) # User credentials br.form['id'] = myId br.form['PIN'] = myPin br.form.action = 'https://itapp.wcu.edu/BanAuthRedirector/Default.aspx' # Login br.submit() # Wait 10 seconds sleep(10) # Save to a file f = file('mycatpage.html', 'w') f.write(br.response().read()) f.close() Now the problem... For some odd reason the page I get back (in mycatpage.html) is the login page and not the expected page that displays my "cat cash balance" and "number of block meals" left... Does anyone have any idea why? Keep in mind that everything is correct with the header files and while the id and pass are not really 111111111 and 222222222, the correct values do work with the website (using a browser...) Thanks in advance EDIT Another script I tried: from urllib import urlopen, urlencode import urllib2 import httplib url = 'https://itapp.wcu.edu/BanAuthRedirector/Default.aspx' myId = 'xxxxxxxx' myPin = 'xxxxxxxx' data = { 'id':myId, 'PIN':myPin, 'submit':'Request Access', 'wcuirs_uri':'https://cf.wcu.edu/busafrs/catcard/idsearch.cfm' } opener = urllib2.build_opener() opener.addheaders = [('User-agent','Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/535.11 (KHTML, like Gecko) Chrome/17.0.963.56 Safari/535.11'), ('Accept', 'text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8'), ('Accept-Encoding', 'gzip,deflate,sdch'), ('Accept-Language', 'en-US,en;q=0.8'), ('Accept-Charset', 'ISO-8859-1,utf-8;q=0.7,*;q=0.3')] request = urllib2.Request(url, urlencode(data)) open("mycatpage.html", 'w').write(opener.open(request)) This has the same behavior...

Read the article
Parsing with BeautifulSoup, error message TypeError: coercing to Unicode: need string or buffer, NoneType found

- by Samsun Knight

so I'm trying to scrape an Amazon page for data, and I'm getting an error when I try to parse for where the seller is located. Here's my code: #getting the html request = urllib2.Request('http://www.amazon.com/gp/offer-listing/0393934241/') opener = urllib2.build_opener() #hiding that I'm a webscraper request.add_header('User-Agent', 'Mozilla/5 (Solaris 10) Gecko') #opening it up, putting into soup form html = opener.open(request).read() soup = BeautifulSoup(html, "html5lib") #parsing for the seller info sellers = soup.findAll('div', {'class' : 'a-row a-spacing-medium olpOffer'}) for eachseller in sellers: #parsing for price price = eachseller.find('span', {'class' : 'a-size-large a-color-price olpOfferPrice a-text-bold'}) #parsing for shipping costs shippingprice = eachseller.find('span' , {'class' : 'olpShippingPrice'}) #parsing for condition condition = eachseller.find('span', {'class' : 'a-size-medium'}) #parsing for seller name sellername = eachseller.find('b') #parsing for seller location location = eachseller.find('div', {'class' : 'olpAvailability'}) #printing it all out print "price, " + price.string + ", shipping price, " + shippingprice.string + ", condition," + condition.string + ", seller name, " + sellername.string + ", location, " + location.string I get the error message, pertaining to the 'print' command at the end, "TypeError: coercing to Unicode: need string or buffer, NoneType found" I know that it's coming from this line - location = eachseller.find('div', {'class' : 'olpAvailability'}) - because the code works fine without that line, and I know that I'm getting NoneType because the line isn't finding anything. Here's the html from the section I'm looking to parse: <*div class="olpAvailability"> In Stock. Ships from WI, United States. <*br/><*a href="/gp/aag/details/ref=olp_merch_ship_9/175-0430757-3801038?ie=UTF8&asin=0393934241&seller=A1W2IX7T37FAMZ&sshmPath=shipping-rates#aag_shipping">Domestic shipping rates</a> and <*a href="/gp/aag/details/ref=olp_merch_return_9/175-0430757-3801038?ie=UTF8&asin=0393934241&seller=A1W2IX7T37FAMZ&sshmPath=returns#aag_returns">return policy</a>. <*/div> (but without the stars - just making sure the HTML doesn't compile out of code form) I don't see what's the problem with the 'location' line of code, or why it's not pulling the data I want. Help?

Read the article
Python web scraping involving HTML tags with attributes

- by rohanbk

I'm trying to make a web scraper that will parse a web-page of publications and extract the authors. The skeletal structure of the web-page is the following: <html> <body> <div id="container"> <div id="contents"> <table> <tbody> <tr> <td class="author">####I want whatever is located here ###</td> </tr> </tbody> </table> </div> </div> </body> </html> I've been trying to use BeautifulSoup and lxml thus far to accomplish this task, but I'm not sure how to handle the two div tags and td tag because they have attributes. In addition to this, I'm not sure whether I should rely more on BeautifulSoup or lxml or a combination of both. What should I do? At the moment, my code looks like what is below: import re import urllib2,sys import lxml from lxml import etree from lxml.html.soupparser import fromstring from lxml.etree import tostring from lxml.cssselect import CSSSelector from BeautifulSoup import BeautifulSoup, NavigableString address='http://www.example.com/' html = urllib2.urlopen(address).read() soup = BeautifulSoup(html) html=soup.prettify() html=html.replace('&nbsp', '&#160') html=html.replace('&iacute','&#237') root=fromstring(html) I realize that a lot of the import statements may be redundant, but I just copied whatever I currently had in more source file. EDIT: I suppose that I didn't make this quite clear, but I have multiple tags in page that I want to scrape.

Read the article
Download-from-PyPI-and-install script

- by zubin71

Hello, I have written a script which fetches a distribution, given the URL. After downloading the distribution, it compares the md5 hashes to verify that the file has been downloaded properly. This is how I do it. def download(package_name, url): import urllib2 downloader = urllib2.urlopen(url) package = downloader.read() package_file_path = os.path.join('/tmp', package_name) package_file = open(package_file_path, "w") package_file.write(package) package_file.close() I wonder if there is any better(more pythonic) way to do what I have done using the above code snippet. Also, once the package is downloaded this is what is done: def install_package(package_name): if package_name.endswith('.tar'): import tarfile tarfile.open('/tmp/' + package_name) tarfile.extract('/tmp') import shlex import subprocess installation_cmd = 'python %ssetup.py install' %('/tmp/'+package_name) subprocess.Popen(shlex.split(installation_cmd) As there are a number of imports for the install_package method, i wonder if there is a better way to do this. I`d love to have some constructive criticism and suggestions for improvement. Also, I have only implemented the install_package method for .tar files; would there be a better manner by which I could install .tar.gz and .zip files too without having to write seperate methods for each of these?

Read the article
Python: puzzling behaviour inside httplib

- by Anna

I have added one line ( import pdb; pdb.set_trace() ) to httplib's HTTPConnection.putheader, so I can see what's going on inside. httplib.py, line 489: def putheader(self, header, value): """Send a request header line to the server. For example: h.putheader('Accept', 'text/html') """ import pdb; pdb.set_trace() if self.__state != _CS_REQ_STARTED: raise CannotSendHeader() str = '%s: %s' % (header, value) self._output(str) then ran this from the interpreter import urllib2 urllib2.urlopen('http://www.ioerror.us/ip/headers') ... and as expected PDB kicks in: > c:\python26\lib\httplib.py(858)putheader() -> if self.__state != _CS_REQ_STARTED: (Pdb) in PDB I have the luxury of evaluating expressions on the fly, so I have tried to enter self.__state: (Pdb) self.__state *** AttributeError: HTTPConnection instance has no attribute '__state' Alas, there is no __state of this instance. However when I enter step, the debugger gets past the if self.__state != _CS_REQ_STARTED: line without a problem. Why is this happening? If the self.__state doesn't exist python would have to raise an exception as it did when I entered the expression. Python version: 2.6.4 on win32

Read the article
python thread prob after build

- by Apache

hi expert, i'm having task to scan wifi at specific interval and send it to the server, i've it in python and its works fine when i run manually, then build it to package and when run there is no progress at all, i already ask this question before at http://stackoverflow.com/questions/2735410/python-scritp-problem-once-build-and-package-it, then, i re-modify my code as below, then i found that thread is not functioning once i build, #!/usr/bin/env python import subprocess,threading,... configFile = open('/opt/Jemapoh_Wifi/config.txt', 'r') url = configFile.readline().strip() intervalTime = configFile.readline().strip() status = configFile.readline().strip() print "url "+url print "intervalTime "+intervalTime print "Status "+status.strip() def getMacAddress(): proc = subprocess.Popen('ifconfig -a wlan0 | grep HWaddr | sed \'/^.*HWaddr */!d; s///;q\'', shell=True, stdout=subprocess.PIPE, ) macAddress = proc.communicate()[0].strip() return macAddress def getTimestamp(): from time import strftime timeStamp = strftime("%Y-%m-%d %H:%M:%S") return timeStamp def scanWifi(): try: print "Scanning..." proc = subprocess.Popen('iwlist scan 2>/dev/null', shell=True, stdout=subprocess.PIPE, ) stdout_str = proc.communicate()[0] stdout_list=stdout_str.split('\n') essid=[] rssi=[] preQuality=[] for line in stdout_list: line=line.strip() match=re.search('ESSID:"(\S+)"',line) if match: essid.append(match.group(1)) match=re.search('Quality=(\S+)',line) if match: preQuality.append(match.group(1)) for qualityConversion in preQuality: qualityConversion = qualityConversion.split()[0].split('/') temp = str(int(round(float(qualityConversion[0]) / float(qualityConversion[1]) * 100))).rjust(2) rssi.append(temp) dataToPost = '{"userId":"' + getMacAddress() + '","timestamp":"' + getTimestamp() + '","wifi":[' for no in range(len(essid)): dataToPost += '{"ssid":"' + essid[no] + '","rssi":"' + rssi[no] + '"}' if no+1 == len(essid): pass else: dataToPost += ',' dataToPost += ']}' query_args = {"data":dataToPost} request = urllib2.Request(url) request.add_data(urllib.urlencode(query_args)) request.add_header('Content-Type', 'application/x-www-form-urlencoded') print "Waiting for server response..." print urllib2.urlopen(request).read() print "Data Sent @ " + getTimestamp() print "------------------------------------------------------" t = threading.Timer(int(intervalTime), scanWifi).start() except Exception, e: print e t = threading.Timer(int(intervalTime), scanWifi) t.start() once build, its not reaching the thread, do can anyone help, why the thread is not working after build thanks

Read the article
BeautifulSoup can't parse a webpage?

- by JLTChiu

I am using beautiful soup for parsing webpage now, I've heard it's very famous and good, but it doesn't seems works properly. Here's what I did import urllib2 from bs4 import BeautifulSoup page = urllib2.urlopen("http://www.cnn.com/2012/10/14/us/skydiver-record-attempt/index.html?hpt=hp_t1") soup = BeautifulSoup(page) print soup.prettify() I think this is kind of straightforward. I open the webpage and pass it to the beautifulsoup. But here's what I got: Warning (from warnings module): File "C:\Python27\lib\site-packages\bs4\builder\_htmlparser.py", line 149 "Python's built-in HTMLParser cannot parse the given document. This is not a bug in Beautiful Soup. The best solution is to install an external parser (lxml or html5lib), and use Beautiful Soup with that parser. See http://www.crummy.com/software/BeautifulSoup/bs4/doc/#installing-a-parser for help.")) ... HTMLParseError: bad end tag: u'</"+"script>', at line 634, column 94 I thought CNN website should be well designed, so I am not very sure what's going on though. Does anyone has idea about this?

Read the article
Prettify working using python idle but not when using python script

- by Loclip

So when I use print soup.prettify() on idle it's prettifying my html but when I calling the script from server the prettify() not working #!/usr/bin/python import cgi, cgitb, urllib2, sys from bs4 import BeautifulSoup styles = [] i = 1 site = "www.example.com" page = urllib2.urlopen(site) soup = BeautifulSoup(page) table = soup.find('table', {'class' :'style5'}) alltd = table.findAll('td') allspans = table.findAll('span') while i<55: styles.append("style" + str(i)) i+=1 for td in alltd: if any(x in td["class"] for x in styles): td["class"] = "align" for span in allspans: span.replaceWithChildren() print "Content-type: text/html" print print "<!DOCTYPE html>" print "<html>" print "<head>" print '<meta http-equiv="content-type" content="text/html; charset=utf-8">' print '<link rel="stylesheet" href="lightbox/css/lightbox.css" type="text/css" media="screen">' print '<style type="text/css">' print "body {background-color:#b0c4de;}" print '.align {text-align: center;}' print "</style>" print "</head>" print "<body>" print table.pretiffy() print "</body>" print "</html>"

Read the article
BeautifulSoup HTMLParseError. What's wrong with this?

- by user1915496

This is my code: from bs4 import BeautifulSoup as BS import urllib2 url = "http://services.runescape.com/m=news/recruit-a-friend-for-free-membership-and-xp" res = urllib2.urlopen(url) soup = BS(res.read()) other_content = soup.find_all('div',{'class':'Content'})[0] print other_content Yet an error comes up: /Library/Python/2.7/site-packages/bs4/builder/_htmlparser.py:149: RuntimeWarning: Python's built-in HTMLParser cannot parse the given document. This is not a bug in Beautiful Soup. The best solution is to install an external parser (lxml or html5lib), and use Beautiful Soup with that parser. See http://www.crummy.com/software/BeautifulSoup/bs4/doc/#installing-a-parser for help. "Python's built-in HTMLParser cannot parse the given document. This is not a bug in Beautiful Soup. The best solution is to install an external parser (lxml or html5lib), and use Beautiful Soup with that parser. See http://www.crummy.com/software/BeautifulSoup/bs4/doc/#installing-a-parser for help.")) Traceback (most recent call last): File "web.py", line 5, in <module> soup = BS(res.read()) File "/Library/Python/2.7/site-packages/bs4/__init__.py", line 172, in __init__ self._feed() File "/Library/Python/2.7/site-packages/bs4/__init__.py", line 185, in _feed self.builder.feed(self.markup) File "/Library/Python/2.7/site-packages/bs4/builder/_htmlparser.py", line 150, in feed raise e I've let two other people use this code, and it works for them perfectly fine. Why is it not working for me? I have bs4 installed...

Read the article
Python regex group clarification

- by nkr1pt

I have 0 experience with python, very little with regex and I'm trying to figure out what this small snippet of python regex would give back from a http response header Set-Cookie entry: REGEX_COOKIE = '([A-Z]+=[^;]+;)' resp = urllib2.urlopen(req) re.search(REGEX_COOKIE, resp.info()['Set-Cookie']).group(1) Can one give a simple example of a Set-Cookie value and explain what this would match on + return? Regards

Read the article
How to get the content of "Google Dictionary" by python script

- by user313439

Thank to the script, I've logged in google successfully. But I replaced the value of "gv_home_page_url" with http:// www.google.com.tw/dictionary/wordlist?hl=zh-TW, the error occured. The message is " urllib2.HTTPError: HTTP Error 500: Internal Server Error" Any idea will be appreciated, thanks.

Read the article
HTTP basic authentication using sockets in python

- by user352958

How to connect to a server using basic http auth thru sockets in python .I don't want to use urllib/urllib2 etc as my program does some low level socket I/O operations

Read the article
A Python Wrapper for Shutterfly. Uploading an Image

- by iJames

I'm working on a Django app in which I want to order prints through Shutterfly's Open API: http://www.shutterfly.com/documentation/start.sfly So far I've been able to build the appropriate POSTs and GETs using the modules and suggested code snippets including httplib, httplib2, urllib, urllib2, mimetype, etc. But I'm stuck on the image uploading when placing an order (the ordering process is not the same process as uploading images to albums which I haven't tried.) From what I can tell, I'm supposed to basically create the multipart form data by concatenating the HTTP request body together with the binary data of the image. I take the strings: --myuniqueboundary1273149960.175.1 Content-Disposition: form-data; name="AuthenticationID" auniqueauthenticationid --myuniqueboundary1273149960.175.1 Content-Disposition: file; name="Image.Data"; filename="1_41_orig.jpg" Content-Type: image/jpeg and I put this data into it and end with the final boundary: ...\xb5|\xf88\x1dj\t@\xd9\'\x1f\xc6j\x88{\x8a\xc0\x18\x8eGaJG\x03\xe9J-\xd8\x96[\x91T\xc3\x0eTu\xf4\xaa\xa5Ty\x80\x01\x8c\x9f\xe9Z\xad\x8cg\xba# g\x18\xe2\xaa:\x829\x02\xb4["\x17Q\xe7\x801\xea?\xad7j\xfd\xa2\xdf\x81\xd2\x84D\xb6)\xa8\xcb\xc8O\\\x9a\xaf(\x1cqM\x98\x8d*\xb8\'h\xc8+\x8e:u\xaa\xf3*\x9b\x95\x05F8\xedN%\xcb\xe1B2\xa9~Tw\xedF\xc4\xfe\xe8\xfc\xa9\x983\xff\xd9... That ends up making it look like this (when I use print to debug): ... --myuniqueboundary1273149960.175.1 Content-Disposition: file; name="Image.Data"; filename="1_41_orig.jpg" Content-Type: image/jpeg ????q?ExifMM* ? ??(1?2?<??i?b?NIKON CORPORATIONNIKON D40HHQuickTime 7.62009:02:17 13:05:25Mac OS X 10.5.6%??????"?'??0220?????? ???? ? ?|_???,b???50??5 ... --myuniqueboundary1273149960.175.1-- My code for grabbing the binary data is pretty much this: filedata = open('myjpegfile.jpeg','rb').read() Which I then add to the rest of the body. I've see something like this code everywhere. I'm then using this to post the full request (with the headers too): response = urllib2.urlopen(request).read() This seems to me to be the standard way that form POSTS with files happens. Am I missing something here? At some point I might be able to make this into a library worth posting up on github, but this problem has stopped me cold in my tracks. Thanks for any insight!

Read the article
python unittest howto

- by zubin71

I`d like to know how I could unit-test the following module. def download_distribution(url, tempdir): """ Method which downloads the distribution from PyPI """ print "Attempting to download from %s" % (url,) try: url_handler = urllib2.urlopen(url) distribution_contents = url_handler.read() url_handler.close() filename = get_file_name(url) file_handler = open(os.path.join(tempdir, filename), "w") file_handler.write(distribution_contents) file_handler.close() return True except ValueError, IOError: return False

Read the article
How to bind an ip address to telnetlib in Python

- by jack

The code below binds an ip address to urllib, urllib2, etc. import socket true_socket = socket.socket def bound_socket(*a, **k): sock = true_socket(*a, **k) sock.bind((sourceIP, 0)) return sock socket.socket = bound_socket Is it also able to bind an ip address to telnetlib?

Read the article

< Previous Page | 1 2 3 4 5 6 7 | Next Page >