What's the fastest way to strip or replace high Unicode characters in a document using Python?

Posted by Rhubarb on Stack Overflow
Published on 2010-05-18T02:29:38Z Indexed on 2010/05/18 2:50 UTC


I want to replace all high Unicode characters in a large document, such as accented Es and left and right quotes, with "normal" counterparts in the low (ASCII) range, such as a regular 'E' and straight quotes. I need to do this on very large documents fairly often. There is an example of this, in what I think might be Perl, here: http://www.designmeme.com/mtplugins/lowdown.txt

Is there a fast way to do this in Python without chaining s.replace(...).replace(...).replace(...)...? I tried that approach with just a few characters to replace, and processing the document became really slow.
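For illustration, here is a minimal sketch of one way this kind of replacement could be done in a single pass, instead of with chained replace calls. It assumes Python 3 and text that is already decoded to str. The PUNCTUATION table and the to_ascii name are hypothetical and are not taken from the linked script. The idea is to use str.translate for characters that have no ASCII decomposition (such as curly quotes), then unicodedata.normalize("NFKD", ...) to split accented letters into a base letter plus a combining mark, and finally to drop whatever is still outside ASCII.

```python
import unicodedata

# Hypothetical mapping (an assumption, not from the original post): extend it
# with whatever punctuation actually occurs in the documents.
PUNCTUATION = {
    0x2018: "'",    # left single quotation mark
    0x2019: "'",    # right single quotation mark
    0x201C: '"',    # left double quotation mark
    0x201D: '"',    # right double quotation mark
    0x2013: "-",    # en dash
    0x2014: "-",    # em dash
    0x2026: "...",  # horizontal ellipsis
}


def to_ascii(text):
    """Return an ASCII-only approximation of text."""
    # One pass over the string replaces every mapped code point.
    text = text.translate(PUNCTUATION)
    # NFKD decomposes accented letters into base letter + combining mark.
    text = unicodedata.normalize("NFKD", text)
    # Anything still outside ASCII (combining marks, unmapped symbols) is dropped.
    return text.encode("ascii", "ignore").decode("ascii")


print(to_ascii("\u00c9cole \u201cquoted\u201d caf\u00e9"))  # prints: Ecole "quoted" cafe
```

Note that the last step silently discards any character that is neither in the table nor decomposable, so the table would need to cover every symbol that should survive in some form. Whether this is fast enough for a given document size would have to be measured.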

© Stack Overflow or respective owner
