Set a script to automatically detect character encoding in a plain-text-file in Python?
- by Haidon
I've set up a script that basically does a large-scale find-and-replace on a plain text document.
At the moment it works fine with ASCII, UTF-8, and UTF-16 (and possibly others, but I've only tested these three) encoded documents so long as the encoding is specified inside the script (the example code below specifies UTF-16).
Is there a way to make the script automatically detect which of these character encodings is being used in the input file and automatically set the character encoding of the output file the same as the encoding used on the input file?
findreplace = [
('term1', 'term2'),
]
inF = open(infile,'rb')
s=unicode(inF.read(),'utf-16')
inF.close()
for couple in findreplace:
outtext=s.replace(couple[0],couple[1])
s=outtext
outF = open(outFile,'wb')
outF.write(outtext.encode('utf-16'))
outF.close()
Thanks!