Search Results

Search found 159 results on 7 pages for 'urllib2'.

Page 3/7 | < Previous Page | 1 2 3 4 5 6 7  | Next Page >

  • Python GUI Scraper hanging issues.

    - by bball
    I wrote a scraper using python a while back, and it worked fine in the command line. I have made a GUI for the application now, but I am having trouble with one issue. When I attempt to update text inside the gui (e.g. 'fetching URL 12/50'), I am unable seeing as the function within the scraper is grabbing 100+ links. Also when going from one scraping function, to a function that should update the gui, to another function, the gui update function seems to be skipped over while the next scrape function is run. An example would be: scrapeLinksA() #takes 20 seconds updateInfo("LinksA done") scrapeLinksB() #takes another 20 seconds in the above example, updateInfo is never executed, unless I end the program with a KeyboardInterrupt. I'm thinking my solution is threading, but I'm not sure. What can I do to fix this? I am using: PyQt4 urllib2 BeautifulSoup

    Read the article

  • Does Google appengine cache external requests?

    - by Andy Hume
    I have a very simple application running on appengine that requests a web page every five minutes and parses for a specific piece of data. Everything works fine except that the response I get back from the external request (using urllib2) doesn't reflect the latest changes to the page. Sometimes it takes a few minutes to get the latest, sometimes over an hour. Is there a transparent layer of caching that appengine puts in place? Or is there something else I am missing here? I've looked at the caching headers of the requested page and there is no Expires or LastModified's sent. Update: Sometimes, it will get the new version of the page for a number of requests and then randomly later get an old out of date version.

    Read the article

  • My python program always brings down my internet connection after several hours running, how do I debug and fix this problem?

    - by Shane
    I'm writing a python script checking/monitoring several server/websites status(response time and similar stuff), it's a GUI program and I use separate thread to check different server/website, and the basic structure of each thread is using an infinite while loop to request that site every random time period(15 to 30 seconds), once there's changes in website/server each thread will start a new thread to do a thorough check(requesting more pages and similar stuff). The problem is, my internet connection always got blocked/jammed/messed up after several hours running of this script, the situation is, from my script side I got urlopen error timed out each time it's requesting a page, and from my FireFox browser side I cannot open any site. But the weird thing is, the moment I close my script my Internet connection got back on immediately which means now I can surf any site through my browser, so it must be the script causing all the problem. I've checked the program carefully and even use del to delete any connection once it's used, still get the same problem. I only use urllib2, urllib, mechanize to do network requests. Anybody knows why such thing happens? How do I debug this problem? Is there a tool or something to check my network status once such situation occurs? It's really bugging me for a while... By the way I'm behind a VPN, does it have something to do with this problem? Although I don't think so because my network always get back on once the script closed, and the VPN connection never drops(as it appears) during the whole process.

    Read the article

  • Help with OpenSSL request using Python

    - by Ldn
    Hi i'm creating a program that has to make a request and then obtain some info. For doing that the website had done some API that i will use. There is an how-to about these API but every example is made using PHP. But my app is done using Python so i need to convert the code. here is the how-to: The request string is sealed with OpenSSL. The steps for sealing are as follows: • Random 128-bit key is created. • Random key is used to RSA-RC4 symettrically encrypt the request string. • Random key is encrypted with the public key using OpenSSL RSA asymmetrical encryption. • The encrypted request and encrypted key are each base64 encoded and placed in the appropriate fields. In PHP a full request to our API can be accomplished like so: <?php // initial request. $request = array('object' => 'Link', 'action' => 'get', 'args' => array( 'app_id' => 303612602 ) ); // encode the request in JSON $request = json_encode($request); // when you receive your profile, you will be given a public key to seal your request in. $key_pem = "-----BEGIN PUBLIC KEY----- MFwwDQYJKoZIhvcNAQEBBQADSwAwSAJBALdu5C6d2sA1Lu71NNGBEbLD6DjwhFQO VLdFAJf2rOH63rG/L78lrQjwMLZOeHEHqjaiUwCr8NVTcVrebu6ylIECAwEAAQ== -----END PUBLIC KEY-----"; // load the public key $pkey = openssl_pkey_get_public($key_pem); // seal! $newrequest and $enc_keys are passed by reference. openssl_seal($request, $enc_request, $enc_keys, array($pkey)); // then wrap the request $wrapper = array( 'profile' => 'ProfileName', 'format' => 'RSA_RC4_Sealed', 'enc_key' => base64_encode($enc_keys[0]), 'request' => base64_encode($enc_request) ); // json encode the wrapper. urlencode it as well. $wrapper = urlencode(json_encode($wrapper)); // we can send the request wrapper via the cURL extension $ch = curl_init(); curl_setopt($ch, CURLOPT_URL, 'http://api.site.com/'); curl_setopt($ch, CURLOPT_POST, 1); curl_setopt($ch, CURLOPT_POSTFIELDS, "request=$wrapper"); curl_setopt($ch, CURLOPT_RETURNTRANSFER, true); $data = curl_exec($ch); curl_close($ch); ?> Of all of that, i was able to convert "$request" and i'v also made the JSON encode. This is my code: import urllib import urllib2 import json url = 'http://api.site.com/' array = {'app_id' : "303612602"} values = { "object" : "Link", "action": "get", "args" : array } data = urllib.urlencode(values) json_data = json.dumps(data) What stop me is the sealing with OpenSSL and the publi key (that obviously i have) Using PHP OpenSSL it's so easy, but in Python i don't really know how to use it Please, help me!

    Read the article

  • How to solve Python memory leak when using urrlib2?

    - by b_m
    Hi, I'm trying to write a simple Python script for my mobile phone to periodically load a web page using urrlib2. In fact I don't really care about the server response, I'd only like to pass some values in the URL to the PHP. The problem is that Python for S60 uses the old 2.5.4 Python core, which seems to have a memory leak in the urrlib2 module. As I read there's seems to be such problems in every type of network communications as well. This bug have been reported here a couple of years ago, while some workarounds were posted as well. I've tried everything I could find on that page, and with the help of Google, but my phone still runs out of memory after ~70 page loads. Strangely the Garbege Collector does not seem to make any difference either, except making my script much slower. It is said that, that the newer (3.1) core solves this issue, but unfortunately I can't wait a year (or more) for the S60 port to come. here's how my script looks after adding every little trick I've found: import urrlib2, httplib, gc while(true): url = "http://something.com/foo.php?parameter=" + value f = urllib2.urlopen(url) f.read(1) f.fp._sock.recv=None # hacky avoidance f.close() del f gc.collect() Any suggestions, how to make it work forever without getting the "cannot allocate memory" error? Thanks for advance, cheers, b_m update: I've managed to connect 92 times before it ran out of memory, but It's still not good enough. update2: Tried the socket method as suggested earlier, this is the second best (wrong) solution so far: class UpdateSocketThread(threading.Thread): def run(self): global data while 1: url = "/foo.php?parameter=%d"%data s = socket.socket(socket.AF_INET, socket.SOCK_STREAM) s.connect(('something.com', 80)) s.send('GET '+url+' HTTP/1.0\r\n\r\n') s.close() sleep(1) I tried the little tricks, from above too. The thread closes after ~50 uploads (the phone has 50MB of memory left, obviously the Python shell has not.) UPDATE: I think I'm getting closer to the solution! I tried sending multiple data without closing and reopening the socket. This may be the key since this method will only leave one open file descriptor. The problem is: import socket s=socket.socket(socket.AF_INET, socket.SOCK_STREAM) socket.connect(("something.com", 80)) socket.send("test") #returns 4 (sent bytes, which is cool) socket.send("test") #4 socket.send("test") #4 socket.send("GET /foo.php?parameter=bar HTTP/1.0\r\n\r\n") #returns the number of sent bytes, ok socket.send("GET /foo.php?parameter=bar HTTP/1.0\r\n\r\n") #returns 0 on the phone, error on Windows7* socket.send("GET /foo.php?parameter=bar HTTP/1.0\r\n\r\n") #returns 0 on the phone, error on Windows7* socket.send("test") #returns 0, strange... *: error message: 10053, software caused connection abort Why can't I send multiple messages??

    Read the article

  • Python: ImportError no module named urllib

    - by Yury Lifshits
    I just rented a VPS from Linode, it has python2.5 and ubuntu 8.04 When I ho to python shell python import urllib I get ImportError: No module named urllib What can be the reason? How can I add this module to python? Isn't it prepackaged with the basic version? Can it be pythonpath problem? How I can test pythonpath?

    Read the article

  • python urllib post question

    - by paul
    hello ALL im making some simple python post script but it not working well. there is 2 part to have to login. first login is using 'http://mybuddy.buddybuddy.co.kr/userinfo/UserInfo.asp' this one. and second login is using 'http://user.buddybuddy.co.kr/usercheck/UserCheckPWExec.asp' i can login first login page, but i couldn't login second page website. and return some error 'illegal access' such like . i heard this is related with some cooke but i don't know how to implement to resolve this problem. if anyone can help me much appreciated!! Thanks! import re,sys,os,mechanize,urllib,time import datetime,socket params = urllib.urlencode({'ID':'ph896011', 'PWD':'pk1089' }) rq = mechanize.Request("http://mybuddy.buddybuddy.co.kr/userinfo/UserInfo.asp", params) rs = mechanize.urlopen(rq) data = rs.read() logged_fail = r';history.back();</script>' in data if not logged_fail: print 'login success' try: params = urllib.urlencode({'PASSWORD':'pk1089'}) rq = mechanize.Request("http://user.buddybuddy.co.kr/usercheck/UserCheckPWExec.asp", params ) rs = mechanize.urlopen(rq) data = rs.read() print data except: print 'error'

    Read the article

  • Able to ping but cannot browse after several hours running of my python program

    - by Shane
    It's a GUI program I wrote in python checking website/server status running on my XP SP3, multi threads are used to check different site/server. After several hours running, the program starts to get urlopen error timed out all the time, and this always happens right after a POST request from a server(not a certain one, might be A or B or C), and it's also not the first POST request causing the problem, normally after several hours running and it happens to make a POST request at an unknown moment, all you get from then on is urlopen error timed out. I'm still able to ping but cannot browse any site, once the program closed everything's fine. It's definitely the program causing this problem, well I just don't know how to debug/check what the problem is, also don't know if it's from OS side or my program wasting too many resources/connections(are you still able to ping when too many connections used?), would anybody please help me out?

    Read the article

  • Python-Requests close http connection

    - by James Eggers
    I was wondering, how do you close a connection with Requests (python-requests.org)? With httplib it's HTTPConnection.close(), but how do I do the same with Requests? Code is below: r = requests.post("https://stream.twitter.com/1/statuses/filter.json", data={'track':toTrack}, auth=('username', 'passwd')) for line in r.iter_lines(): if line: self.mongo['db'].tweets.insert(json.loads(line)) Thanks in advance.

    Read the article

  • Troubleshooting Facebook's graph.put_object that returns error 400

    - by TimLeung
    I am using Facebook Python's SDK along with Google App Engine, and making a call to do a checkin: graph.put_object("me", "checkins", message="Hello, world", place="165039136840558", coordinates='{"latitude":"38.2454064", "longitude":"-122.0434404"}') However, this throws an error 400 Bad Request and I don't seem to be able to try catch it so I can have the important information. On a bad request, Facebook should return, an object like below which can help troubleshoot and address the issue, but I am not sure how I can retrieve this object: { "error": { "type" : "OAuthException", "message" : "An active access token must be used to query information about the current user." } }

    Read the article

  • Downloading a picture via urllib and python.

    - by Mike
    So I'm trying to make a Python script that downloads webcomics and puts them in a folder on my desktop. I've found a few similar programs on here that do something similar, but nothing quite like what I need. The one that I found most similar is right here (http://bytes.com/topic/python/answers/850927-problem-using-urllib-download-images). I tried using this code: >>> import urllib >>> image = urllib.URLopener() >>> image.retrieve("http://www.gunnerkrigg.com//comics/00000001.jpg","00000001.jpg") ('00000001.jpg', <httplib.HTTPMessage instance at 0x1457a80>) I then searched my computer for a file "00000001.jpg", but all I found was the cached picture of it. I'm not even sure it saved the file to my computer. Once I understand how to get the file downloaded, I think I know how to handle the rest. Essentially just use a for loop and split the string at the '00000000'.'jpg' and increment the '00000000' up to the largest number, which I would have to somehow determine. Any reccomendations on the best way to do this or how to download the file correctly? Thanks!

    Read the article

  • Use Twisted's getPage as urlopen?

    - by RadiantHex
    Hi folks, I would like to use Twisted non-blocking getPage method within a webapp, but it feels quite complicated to use such function compared to urlopen. This is an example of what I'm trying to achive: def web_request(request): response = urllib.urlopen('http://www.example.org') return HttpResponse(len(response.read())) Is it so hard to have something similar with getPage?

    Read the article

  • escaping query string with special characters with python

    - by that_guy
    I got some pretty messy urls that i got via scraping here, problem is that they contain spaces or other special characters in the path and query string, here is some example http://www.example.com/some path/to the/file.html http://www.example.com/some path/?file=path to/file name.png&name=name.me so, is there an easy and robust way to escape the urls so that i can pass them to urlopen? i tried urlib.quote, but it seems to escape the '?', '&', and '=' in the query string as well, and it seems to escape the protocol as well, currently, what i am trying to do is use regex to separate the protocol, path name, and query string and escape them separately, but there are cases where they arent separated properly any advice is appreciated

    Read the article

  • How to call Twiter's Streaming/Filter Feed with urllib2/httplib?

    - by Simon
    Update: I switched this back from answered as I tried the solution posed in cogent Nick's answer and switched to Google's urlfetch: logging.debug("starting urlfetch for http://%s%s" % (self.host, self.url)) result = urlfetch.fetch("http://%s%s" % (self.host, self.url), payload=self.body, method="POST", headers=self.headers, allow_truncated=True, deadline=5) logging.debug("finished urlfetch") but unfortunately finished urlfetch is never printed - I see the timeout happen in the logs (it returns 200 after 5 seconds), but execution doesn't seem tor return. Hi All- I'm attempting to play around with Twitter's Streaming (aka firehose) API with Google App Engine (I'm aware this probably isn't a great long term play as you can't keep the connection perpetually open with GAE), but so far I haven't had any luck getting my program to actually parse the results returned by Twitter. Some code: logging.debug("firing up urllib2") req = urllib2.Request(url="http://%s%s" % (self.host, self.url), data=self.body, headers=self.headers) logging.debug("called urlopen for %s %s, about to call urlopen" % (self.host, self.url)) fobj = urllib2.urlopen(req) logging.debug("called urlopen") When this executes, unfortunately, my debug output never shows the called urlopen line printed. I suspect what's happening is that Twitter keeps the connection open and urllib2 doesn't return because the server doesn't terminate the connection. Wireshark shows the request being sent properly and a response returned with results. I tried adding Connection: close to my request header, but that didn't yield a successful result. Any ideas on how to get this to work? thanks -Simon

    Read the article

  • Check if the internet cannot be accessed in Python

    - by Sridhar Ratnakumar
    I have an app that makes a HTTP GET request to a particular URL on the internet. But when the network is down (say, no public wifi - or my ISP is down, or some such thing), I get the following traceback at urllib.urlopen: 70, in get u = urllib2.urlopen(req) File "/Library/Frameworks/Python.framework/Versions/2.6/lib/python2.6/urllib2.py", line 126, in urlopen return _opener.open(url, data, timeout) File "/Library/Frameworks/Python.framework/Versions/2.6/lib/python2.6/urllib2.py", line 391, in open response = self._open(req, data) File "/Library/Frameworks/Python.framework/Versions/2.6/lib/python2.6/urllib2.py", line 409, in _open '_open', req) File "/Library/Frameworks/Python.framework/Versions/2.6/lib/python2.6/urllib2.py", line 369, in _call_chain result = func(*args) File "/Library/Frameworks/Python.framework/Versions/2.6/lib/python2.6/urllib2.py", line 1161, in http_open return self.do_open(httplib.HTTPConnection, req) File "/Library/Frameworks/Python.framework/Versions/2.6/lib/python2.6/urllib2.py", line 1136, in do_open raise URLError(err) URLError: <urlopen error [Errno 8] nodename nor servname provided, or not known> I want to print a friendly error to the user telling him that his network maybe down instead of this unfriendly "nodename nor servname provided" error message. Sure I can catch URLError, but that would catch every url error, not just the one related to network downtime. I am not a purist, so even an error message like "The server example.com cannot be reached; either the server is indeed having problems or your network connection is down" would be nice. How do I go about selectively catching such errors? (For a start, if DNS resolution fails at urllib.urlopen, that can be reasonably assumed as network inaccessibility? If so, how do I "catch" it in the except block?)

    Read the article

  • python socket problem

    - by uytg
    I write this python code: import socks import socket socks.setdefaultproxy(socks.PROXY_TYPE_SOCKS5, "64.83.219.7", 58279) socket.socket = socks.socksocket socket.setdefaulttimeout(19) import urllib2 print urllib2.urlopen('http://www.google.com').read() but when I execute it, I get this error: urllib2.URLError: <urlopen error timed out> What am I doing wrong?

    Read the article

  • can load data(google app enngine) from http://localhost:8100/remote_api ..

    - by zjm1126
    i can download data from gae (http://zjm1126.appspot.com/remote_api), this is code: appcfg.py download_data --application=zjm1126 --url=http://zjm1126.appspot.com/remote_api --filename=a.csv and it successful : D:\zjm_demo\app>appcfg.py download_data --application=zjm1126 --url=http://zjm1 126.appspot.com/remote_api --filename=a.csv Downloading data records. [INFO ] Logging to bulkloader-log-20100618.162421 [INFO ] Throttling transfers: [INFO ] Bandwidth: 250000 bytes/second [INFO ] HTTP connections: 8/second [INFO ] Entities inserted/fetched/modified: 20/second [INFO ] Batch Size: 10 [INFO ] Opening database: bulkloader-progress-20100618.162421.sql3 [INFO ] Opening database: bulkloader-results-20100618.162421.sql3 [INFO ] Connecting to zjm1126.appspot.com/remote_api Please enter login credentials for zjm1126.appspot.com Email: [email protected] Password for [email protected]: [INFO ] Downloading kinds: [u'LogText', u'Greeting', u'Forum', u'Thread'] .... [INFO ] Have 0 entities, 0 previously transferred [INFO ] 0 entities (8804 bytes) transferred in 11.3 seconds so i want to know can load data from 127.0.0.1 , this is my code : appcfg.py download_data --application=zjm1126 --url=http://localhost:8100/remote_api --filename=a.csv and the error is : D:\zjm_demo\app>appcfg.py download_data --application=zjm1126 --url=http://loca lhost:8100/remote_api --filename=a.csv Downloading data records. [INFO ] Logging to bulkloader-log-20100618.162325 [INFO ] Throttling transfers: [INFO ] Bandwidth: 250000 bytes/second [INFO ] HTTP connections: 8/second [INFO ] Entities inserted/fetched/modified: 20/second [INFO ] Batch Size: 10 [INFO ] Opening database: bulkloader-progress-20100618.162325.sql3 [INFO ] Opening database: bulkloader-results-20100618.162325.sql3 Please enter login credentials for localhost Email: [email protected] Password for [email protected]: [INFO ] Connecting to localhost:8100/remote_api [ERROR ] Exception during authentication Traceback (most recent call last): File "d:\Program Files\Google\google_appengine\google\appengine\tools\bulkload er.py", line 3169, in Run self.request_manager.Authenticate() File "d:\Program Files\Google\google_appengine\google\appengine\tools\bulkload er.py", line 1178, in Authenticate remote_api_stub.MaybeInvokeAuthentication() File "d:\Program Files\Google\google_appengine\google\appengine\ext\remote_api \remote_api_stub.py", line 542, in MaybeInvokeAuthentication datastore_stub._server.Send(datastore_stub._path, payload=None) File "d:\Program Files\Google\google_appengine\google\appengine\tools\appengin e_rpc.py", line 346, in Send f = self.opener.open(req) File "D:\Python25\lib\urllib2.py", line 387, in open response = meth(req, response) File "D:\Python25\lib\urllib2.py", line 498, in http_response 'http', request, response, code, msg, hdrs) File "D:\Python25\lib\urllib2.py", line 425, in error return self._call_chain(*args) File "D:\Python25\lib\urllib2.py", line 360, in _call_chain result = func(*args) File "D:\Python25\lib\urllib2.py", line 506, in http_error_default raise HTTPError(req.get_full_url(), code, msg, hdrs, fp) HTTPError: HTTP Error 404: Not Found [INFO ] Authentication Failed so what should i do , thanks

    Read the article

  • Loading url with cyrillic symbols

    - by Ockonal
    Hi guys, I have to load some url with cyrillic symbols. My script should work with this: http://wincode.org/%D0%BF%D1%80%D0%BE%D0%B3%D1%80%D0%B0%D0%BC%D0%BC%D0%B8%D1%80%D0%BE%D0%B2%D0%B0%D0%BD%D0%B8%D0%B5/ If I'll use this in browser it would replaced into normal symbols, but urllib code fails with 404 error. How to decode correctly this url? When I'm using that url directly in code, like address = 'that address', it works perfect. But I used parsing page for getting this url. I have a list of urls which contents cyrillic. Maybe they have uncorrect encoding? Here is more code: requestData = urllib2.Request( %SOME_ADDRESS%, None, {"User-Agent": user_agent}) requestHandler = pageHandler.open(requestData) pageData = requestHandler.read().decode('utf-8') soupHandler = BeautifulSoup(pageData) topicLinks = [] for postBlock in soupHandler.findAll('a', href=re.compile('%SOME_REGEXP%')): topicLinks.append(postBlock['href']) postAddress = choice(topicLinks) postRequestData = urllib2.Request(postAddress, None, {"User-Agent": user_agent}) postHandler = pageHandler.open(postRequestData) postData = postHandler.read() File "/usr/lib/python2.6/urllib2.py", line 518, in http_error_default raise HTTPError(req.get_full_url(), code, msg, hdrs, fp) urllib2.HTTPError: HTTP Error 404: Not Found

    Read the article

  • Python urllib3 and how to handle cookie support?

    - by bigredbob
    So I'm looking into urllib3 because it has connection pooling and is thread safe (so performance is better, especially for crawling), but the documentation is... minimal to say the least. urllib2 has build_opener so something like: #!/usr/bin/python import cookielib, urllib2 cj = cookielib.CookieJar() opener = urllib2.build_opener(urllib2.HTTPCookieProcessor(cj)) r = opener.open("http://example.com/") But urllib3 has no build_opener method, so the only way I have figured out so far is to manually put it in the header: #!/usr/bin/python import urllib3 http_pool = urllib3.connection_from_url("http://example.com") myheaders = {'Cookie':'some cookie data'} r = http_pool.get_url("http://example.org/", headers=myheaders) But I am hoping there is a better way and that one of you can tell me what it is. Also can someone tag this with "urllib3" please.

    Read the article

  • Unable to retrieve search results from server side : Facebook Graph API usig Python

    - by DjangoRocks
    Hi all, I'm doing some simple Python + FB Graph training on my own, and I faced a weird problem: import time import sys import urllib2 import urllib from json import loads base_url = "https://graph.facebook.com/search?q=" post_id = None post_type = None user_id = None message = None created_time = None def doit(hour): page = 1 search_term = "\"Plastic Planet\"" encoded_search_term = urllib.quote(search_term) print encoded_search_term type="&type=post" url = "%s%s%s" % (base_url,encoded_search_term,type) print url while(1): try: response = urllib2.urlopen(url) except urllib2.HTTPError, e: print e finally: pass content = response.read() content = loads(content) print "==================================" for c in content["data"]: print c print "****************************************" try: content["paging"] print "current URL" print url print "next page!------------" url = content["paging"]["next"] print url except: pass finally: pass """ print "new URL is =======================" print url print "==================================" """ print url What I'm trying to do here is to automatically page through the search results, but trying for content["paging"]["next"] But the weird thing is that no data is returned; i received the following: {"data":[]} Even in the very first loop. But when i copied the URL into a browser, a lot of results were returned. I've also tried a version with my access token and th same thing happens. Can anyone enlighten me? +++++++++++++++++++EDITED and SIMPLIFIED++++++++++++++++++ ok thanks to TryPyPy, here's the simplified and edited version of my previous question: Why is that: import urllib2 url = "https://graph.facebook.com/searchq=%22Plastic+Planet%22&type=post&limit=25&until=2010-12-29T19%3A54%3A56%2B0000" response = urllib2.urlopen(url) print response.read() result in {"data":[]} ? But the same url produces a lot of data in a browser? Anyone? Best Regards.

    Read the article

  • Problem with re.findall (duplicates)

    - by user559385
    Hello, I tried to fetch source of 4chan site, and get links to threads. I have problem with regexp (isn't working). Source: import urllib2, re req = urllib2.Request('http://boards.4chan.org/wg/') resp = urllib2.urlopen(req) html = resp.read() print re.findall("res/[0-9]+", html) #print re.findall("^res/[0-9]+$", html) The problem is that: print re.findall("res/[0-9]+", html) is giving duplicates. I can't use: print re.findall("^res/[0-9]+$", html) I have read python docs but they didn't help.

    Read the article

< Previous Page | 1 2 3 4 5 6 7  | Next Page >