How can a Python script automatically detect the character encoding of a plain-text file?

Posted by Haidon on Stack Overflow
Published on 2010-06-14T02:27:15Z

I've set up a script that basically does a large-scale find-and-replace on a plain text document.

At the moment it works fine with documents encoded as ASCII, UTF-8, or UTF-16 (and possibly others, but I've only tested these three), as long as the encoding is specified inside the script (the example code below specifies UTF-16).

Is there a way to make the script automatically detect which of these character encodings the input file uses, and then write the output file in that same encoding?

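    findreplace = [
        ('term1', 'term2'),
    ]

    # infile and outFile are file paths defined elsewhere in the script.
    # The encoding is hard-coded here; this is the part I'd like to automate.
    with open(infile, 'rb') as inF:
        s = inF.read().decode('utf-16')

    # Apply each (old, new) pair in order.
    for old, new in findreplace:
        s = s.replace(old, new)

    with open(outFile, 'wb') as outF:
        outF.write(s.encode('utf-16'))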
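
For illustration, here is a minimal sketch of one way the detection could work, assuming every input file either starts with a byte order mark (BOM) or is valid UTF-8 (which also covers plain ASCII). The helper names guess_encoding and replace_in_bytes are hypothetical, and the Latin-1 fallback is an assumption rather than something the script requires. UTF-16 without a BOM would not be recognised by this approach, and a heuristic detector such as the third-party chardet package may be a better fit for such files.

```python
import codecs

# BOMs are checked longest-first so that a UTF-32 LE BOM (FF FE 00 00)
# is not mistaken for the UTF-16 LE BOM (FF FE) it begins with.
_BOMS = [
    (codecs.BOM_UTF32_LE, 'utf-32'),
    (codecs.BOM_UTF32_BE, 'utf-32'),
    (codecs.BOM_UTF8, 'utf-8-sig'),
    (codecs.BOM_UTF16_LE, 'utf-16'),
    (codecs.BOM_UTF16_BE, 'utf-16'),
]


def guess_encoding(raw):
    """Guess a codec name for the bytes in raw (hypothetical helper)."""
    for bom, name in _BOMS:
        if raw.startswith(bom):
            return name
    try:
        raw.decode('utf-8')  # pure ASCII also decodes as UTF-8
        return 'utf-8'
    except UnicodeDecodeError:
        # Assumed last resort: Latin-1 maps every byte to a character,
        # so decoding cannot fail, but the result may be wrong.
        return 'latin-1'


def replace_in_bytes(raw, findreplace):
    """Decode raw, apply the replacements, re-encode with the same codec."""
    encoding = guess_encoding(raw)
    text = raw.decode(encoding)
    for old, new in findreplace:
        text = text.replace(old, new)
    return text.encode(encoding), encoding
```

In the script above, the bytes read from infile would be passed to replace_in_bytes and the returned bytes written to outFile. One caveat: the 'utf-16' and 'utf-32' codecs write a BOM in the machine's native byte order, so a big-endian input may come back little-endian, although it remains valid for any reader that honours the BOM.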

Thanks!

