What's the fastest way to strip or replace high Unicode characters in a document using Python?
Posted by Rhubarb on Stack Overflow
Published on 2010-05-18T02:29:38Z
I want to replace all high Unicode characters in a large document, such as accented E's and left and right quotation marks, with "normal" counterparts in the low range, such as a plain 'E' and straight quotes. I need to do this on a very large document fairly often. There is an example of this, in what I think might be Perl, here: http://www.designmeme.com/mtplugins/lowdown.txt
Is there a fast way to do this in Python without chaining s.replace(...).replace(...).replace(...)...? I tried that with just a few characters to replace, and stripping the document became really slow.
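For illustration, here is a minimal sketch of one way to avoid chained replace() calls, assuming Python 3 and a document already decoded to str. It does a single str.translate pass for punctuation, then uses unicodedata.normalize to split accented letters into a base letter plus combining marks so the marks can be dropped. The names PUNCTUATION_MAP and to_ascii are hypothetical, and the mapping is a small sample, not the full table from the linked script.

```python
import unicodedata

# Hypothetical sample mapping: extend it with whatever characters the document uses.
PUNCTUATION_MAP = {
    "\u2018": "'",    # left single quotation mark
    "\u2019": "'",    # right single quotation mark
    "\u201c": '"',    # left double quotation mark
    "\u201d": '"',    # right double quotation mark
    "\u2013": "-",    # en dash
    "\u2014": "-",    # em dash
    "\u2026": "...",  # horizontal ellipsis
}

# Build the translation table once and reuse it for every document.
PUNCTUATION_TABLE = str.maketrans(PUNCTUATION_MAP)


def to_ascii(text):
    """Return an ASCII-only approximation of text (assumed to be a str)."""
    # One pass over the text replaces every mapped punctuation character.
    text = text.translate(PUNCTUATION_TABLE)
    # NFKD splits an accented letter into its base letter plus combining marks.
    decomposed = unicodedata.normalize("NFKD", text)
    # Encoding with errors="ignore" drops the marks and anything else non-ASCII.
    return decomposed.encode("ascii", "ignore").decode("ascii")


if __name__ == "__main__":
    print(to_ascii("\u201cCaf\u00c9\u201d \u2018na\u00efve\u2019\u2026"))
```

Note that characters with no decomposition and no entry in the mapping (for example 'ø' or 'ß') are silently dropped by the final step, so anything that should survive needs its own entry. Whether this is fast enough for a given document is something to measure rather than assume.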
© Stack Overflow or respective owner