How can a script automatically detect the character encoding of a plain-text file in Python?
Posted by Haidon on Stack Overflow, published on 2010-06-14T02:27:15Z
I've set up a script that basically does a large-scale find-and-replace on a plain text document.
At the moment it works fine with ASCII-, UTF-8-, and UTF-16-encoded documents (and possibly others, but I've only tested these three), so long as the encoding is specified inside the script (the example code below specifies UTF-16).
Is there a way to make the script automatically detect which of these character encodings the input file uses, and then write the output file in that same encoding?
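# infile and outFile are assumed to be defined elsewhere in the script.
findreplace = [
    ('term1', 'term2'),
]

# Read the raw bytes and decode them (the encoding is hard-coded here).
inF = open(infile, 'rb')
s = unicode(inF.read(), 'utf-16')
inF.close()

# Apply each (old, new) replacement in turn.
for old, new in findreplace:
    s = s.replace(old, new)

# Encode the result with the same hard-coded encoding and write it out.
outF = open(outFile, 'wb')
outF.write(s.encode('utf-16'))
outF.close()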
Thanks!
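One possible approach, sketched here under some assumptions rather than offered as a definitive answer, is to look for a byte order mark (BOM) at the start of the file. This assumes that UTF-16 input files begin with a BOM (common but not guaranteed) and that anything without a BOM can be treated as UTF-8, which also covers plain ASCII. The names `BOMS`, `sniff_encoding`, and `replace_in_file` are hypothetical helpers invented for this illustration; only the `codecs.BOM_*` constants come from the standard library. Unlike the script above, the sketch avoids `unicode()` and writes the original BOM back unchanged so the output keeps the input's byte order.

```python
import codecs

# (BOM bytes, codec name) pairs. UTF-32 comes first because the UTF-32 LE BOM
# begins with the same two bytes as the UTF-16 LE BOM.
BOMS = [
    (codecs.BOM_UTF32_LE, 'utf-32-le'),
    (codecs.BOM_UTF32_BE, 'utf-32-be'),
    (codecs.BOM_UTF8, 'utf-8'),
    (codecs.BOM_UTF16_LE, 'utf-16-le'),
    (codecs.BOM_UTF16_BE, 'utf-16-be'),
]


def sniff_encoding(raw, fallback='utf-8'):
    """Guess (bom, codec) from the leading bytes of raw (hypothetical helper)."""
    for bom, codec in BOMS:
        if raw.startswith(bom):
            return bom, codec
    # No BOM found: assume the fallback (ASCII is a subset of UTF-8).
    return b'', fallback


def replace_in_file(in_path, out_path, pairs):
    """Apply (old, new) replacements, keeping the input's BOM and encoding."""
    with open(in_path, 'rb') as f:
        raw = f.read()
    bom, codec = sniff_encoding(raw)
    # Strip the BOM before decoding so it does not end up in the text.
    text = raw[len(bom):].decode(codec)
    for old, new in pairs:
        text = text.replace(old, new)
    with open(out_path, 'wb') as f:
        # Write the original BOM back so the output matches the input.
        f.write(bom + text.encode(codec))
```

With these helpers, a call such as `replace_in_file(infile, outFile, findreplace)` would stand in for the read, replace, and write steps of the script. A BOM check cannot recognize UTF-16 files saved without a BOM, nor legacy 8-bit encodings; for those, a statistical guesser such as the third-party `chardet` package may help, though its result is only a guess.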
© Stack Overflow or respective owner