I've got over 10,000 files that don't have extensions from older versions of the Mac OS. They're extremely nested, and they also have all sorts of strange formatting and characters. They don't have file types or creator codes attached to them any longer. A great deal of these files have text in the file that will let me determine extensions (for example Word.Document.8 is in every file created by that version of Word, and Excel.Sheet.8 in every file created with that version of Excel).
I found a script that looks like it would work for one of these file types at a time, but it erases parts of filenames after nefarious characters, which is not good.
find . -type f -not -name "." -print0 |\
xargs -0 file |\
grep 'Word.Document.8' |\
sed 's/:.*//' |\
xargs -I % echo mv % %.doc
So, two questions from that:
One is, should I clean the characters in the filenames first, or programmatically deal with those in the script in order to leave them the same? As long as I lose no information from the filenames, I don't see a problem cleaning out slashes and other problem characters. Also, if I clean the filenames, there are likely to be duplicates, so any cleaning script would have to add something like "-1" before the extension to make sure nothing gets lost.
2nd question is how do I change the script so that it will look for more than one file type at the same time and give each the proper extension?
I'm not tied to this script, but it is understandable, which is a pro. Mac OS X 10.6 is installed on this file server, but I've got access to any recent versions of OS X.
Thanks, Ian