What is the best way to remove accents in a Python Unicode string?

Posted by MiniQuark on Stack Overflow
Published on 2009-02-05T21:10:40Z Indexed on 2010/04/13 21:23 UTC

I have a Unicode string in Python, and I would like to remove all the accents (diacritics).

I found on the Web an elegant way to do this in Java:

  1. convert the Unicode string to its long normalized form (with separate characters for letters and diacritics)
  2. remove all the characters whose Unicode type is "diacritic".

Do I need to install a library such as PyICU, or is this possible with just the Python standard library? And what about Python 3.0?

Important note: I would like to avoid code with an explicit mapping from accented characters to their non-accented counterparts.
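
To make the idea concrete, here is a minimal sketch of the two steps above in Python 3. It assumes that the "long normalized form" corresponds to NFD (canonical decomposition) and that "diacritic" corresponds to the Unicode category `Mn` (nonspacing mark); the function name `remove_accents` is just a placeholder of mine. It uses only the standard-library `unicodedata` module and no explicit mapping table, but I am not sure it is the best way.

```python
import unicodedata


def remove_accents(text):
    """Strip diacritics from text (hypothetical helper, not a library function)."""
    # Step 1: decompose each character into a base character followed by
    # its combining marks (assumed here to be normalization form NFD).
    decomposed = unicodedata.normalize("NFD", text)
    # Step 2: drop every character in category "Mn" (nonspacing mark),
    # which is assumed to be what "diacritic" means in the Java recipe.
    return "".join(ch for ch in decomposed if unicodedata.category(ch) != "Mn")


print(remove_accents("Málaga, François, naïve"))  # prints: Malaga, Francois, naive
```

Letters that have no canonical decomposition, such as "ø" or "ł", pass through this sketch unchanged, so it only handles accents that Unicode represents as separate combining marks.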

Thanks for your help.

© Stack Overflow or respective owner
