Loading url with cyrillic symbols

Posted by Ockonal on Stack Overflow See other posts from Stack Overflow or by Ockonal
Published on 2010-05-14T15:19:09Z Indexed on 2010/05/14 15:44 UTC
Read the original article Hit count: 363

Filed under:

urllib

|

python

|

url

|

cyryllic

Hi guys, I have to load some url with cyrillic symbols. My script should work with this:

http://wincode.org/%D0%BF%D1%80%D0%BE%D0%B3%D1%80%D0%B0%D0%BC%D0%BC%D0%B8%D1%80%D0%BE%D0%B2%D0%B0%D0%BD%D0%B8%D0%B5/

If I'll use this in browser it would replaced into normal symbols, but urllib code fails with 404 error. How to decode correctly this url?

When I'm using that url directly in code, like address = 'that address', it works perfect. But I used parsing page for getting this url. I have a list of urls which contents cyrillic. Maybe they have uncorrect encoding? Here is more code:

requestData = urllib2.Request( %SOME_ADDRESS%, None,  {"User-Agent": user_agent})
requestHandler = pageHandler.open(requestData)

pageData = requestHandler.read().decode('utf-8')
soupHandler = BeautifulSoup(pageData)

topicLinks = []
for postBlock in soupHandler.findAll('a', href=re.compile('%SOME_REGEXP%')):
    topicLinks.append(postBlock['href'])

postAddress = choice(topicLinks)

postRequestData = urllib2.Request(postAddress, None,  {"User-Agent": user_agent})
postHandler = pageHandler.open(postRequestData)
postData = postHandler.read()

  File "/usr/lib/python2.6/urllib2.py", line 518, in http_error_default
    raise HTTPError(req.get_full_url(), code, msg, hdrs, fp)
urllib2.HTTPError: HTTP Error 404: Not Found

© Stack Overflow or respective owner

Related posts about urllib

Equivalent Javascript Functions for Python's urllib.quote() and urllib.unquote()

as seen on Stack Overflow - Search for 'Stack Overflow'
Hello, Are there any equivalent Javascript functions for Python's urllib.quote() and urllib.unquote()? The closest I've come across are escape(), encodeURI(), and encodeURIComponent() (and their corresponding un-encoding functions), but they don't encode/decode the same set of special characters… >>> More
Python: ImportError no module named urllib

as seen on Stack Overflow - Search for 'Stack Overflow'
I just rented a VPS from Linode, it has python2.5 and ubuntu 8.04 When I ho to python shell python import urllib I get ImportError: No module named urllib What can be the reason? How can I add this module to python? Isn't it prepackaged with the basic version? Can it be pythonpath problem… >>> More
Python urllib.urlopen IOError

as seen on Stack Overflow - Search for 'Stack Overflow'
So I have the following lines of code in a function sock = urllib.urlopen(url) html = sock.read() sock.close() and they work fine when I call the function by hand. However, when I call the function in a loop (using the same urls as earlier) I get the following error: > Traceback (most recent… >>> More
Python: fetching SVG file using urllib is returning binary when I need ASCII

as seen on Stack Overflow - Search for 'Stack Overflow'
I'm using urllib (in Python) to fetch an SVG file: import urllib urllib.urlopen('http://alpha.vectors.cloudmade.com/BC9A493B41014CAABB98F0471D759707/-122.2487,37.87588,-122.265823,37.868054?styleid=1&viewport=400x231').read() which produces output of the sort: xb6\xf6\x00\xb3\xfb2\xff\xda\xc5\xf2\xc2\x14\xef\xcd\x82\x0b\xdbU\xb0\x81\xcaF\xd8\x1a\xf6\xdf[i)\xba\xcf\x80\xab\xd6\x8c\xe3l_\xe7\n\xed2… >>> More
making urllib request in Python from the client side

as seen on Stack Overflow - Search for 'Stack Overflow'
Hi Guys, I've written a Python application that makes web requests using the urllib2 library after which it scrapes the data. I could deploy this as a web application which means all urllib2 requests go through my web-server. This leads to the danger of the server's IP being banned due to the high… >>> More

Related posts about python

unmet dependencies in Ubuntu 12.04

as seen on Ask Ubuntu - Search for 'Ask Ubuntu'
I tried today to install a dvb-card on my Ubuntu 12.04 (Linux blauhai-linux 3.2.0-25-generic #40-Ubuntu SMP Wed May 23 20:30:51 UTC 2012 x86_64 x86_64 x86_64 GNU/Linux ). The installation failed with an error. After that, i tried to install python (it was already installed but i got this error): linux:~$… >>> More
How can I get sikuli-ide to work?

as seen on Ask Ubuntu - Search for 'Ask Ubuntu'
I installed sikuli-ide with sudo apt-get install sikuli-ide Everything was fine until I tried to start it from the terminal. I typed sikuli-ide But the only response I got was [info] locale: en_US The application was not started, furthermore there is no desktop file and sikuli-ide does not… >>> More
Getting PATH right for python after MacPorts install

as seen on Super User - Search for 'Super User'
I can't import some python libraries (PIL, psycopg2) that I just installed with MacPorts. I looked through these forums, and tried to adjust my PATH variable in $HOME/.bash_profile in order to fix this but it did not work. I added the location of PIL and psycopg2 to PATH. I know that Terminal is… >>> More
call python with system() in R to run a python script emulating the python console

as seen on Stack Overflow - Search for 'Stack Overflow'
I want to pass a chunk of Python code to Python in R with something like system('python ...'), and I'm wondering if there is an easy way to emulate the python console in this case. For example, suppose the code is "print 'hello world'", how can I get the output like this in R? >>> print… >>> More
Python - Calling a non python program from python?

as seen on Stack Overflow - Search for 'Stack Overflow'
Hi, I am currently struggling to call a non python program from a python script. I have a ~1000 files that when passed through this C++ program will generate ~1000 outputs. Each output file must have a distinct name. The command I wish to run is of the form: program_name -input -output -o1 -o2… >>> More