What is the best way to remove accents from a Python Unicode string?
Posted by MiniQuark on Stack Overflow
Published on 2009-02-05T21:10:40Z
I have a Unicode string in Python, and I would like to remove all the accents (diacritics).
I found on the Web an elegant way to do this in Java:
- convert the Unicode string to its decomposed normalized form (such as NFD), in which each base letter and each diacritic is a separate character
- remove all the characters whose Unicode category is a combining mark (that is, a diacritic).
Do I need to install a library such as PyICU, or is this possible with just the Python standard library? And what about Python 3.0?
Important note: I would like to avoid code with an explicit mapping from accented characters to their non-accented counterparts.
Thanks for your help.
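For illustration, the two steps described in the question can probably be sketched with the standard library's `unicodedata` module alone, with no explicit mapping table. This is a minimal sketch, assuming Python 3 (where every `str` is Unicode); the function name `remove_accents` is made up for the example, and the final NFC recomposition is an optional extra step, not part of the Java recipe.

```python
import unicodedata


def remove_accents(text):
    """Return text with combining marks (accents) removed.

    Hypothetical helper for illustration, assuming Python 3 str input.
    """
    # Step 1: decompose, so that "é" becomes "e" followed by U+0301.
    decomposed = unicodedata.normalize("NFD", text)
    # Step 2: drop combining characters; combining() returns 0 for
    # characters that are not combining marks.
    stripped = "".join(
        ch for ch in decomposed if not unicodedata.combining(ch)
    )
    # Optional: recompose whatever is left into canonical composed form.
    return unicodedata.normalize("NFC", stripped)


if __name__ == "__main__":
    print(remove_accents("café crème brûlée"))
```

Note that letters with no canonical decomposition, such as "ø" or "ł", pass through unchanged, because this approach only removes marks that normalization separates from their base letter.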
© Stack Overflow or respective owner