How to improve this bash shell script for turning hardlinks into symlinks?
- by MountainX
This shell script is mostly the work of other people. It has gone through several iterations, and I have tweaked it slightly while also trying to fully understand how it works. I think I understand it now, but I am not confident enough to alter it significantly on my own and risk losing data when I run the altered version. So I would appreciate some expert guidance on how to improve this script.
The changes I am seeking are:
1. Make it even more robust to any strange file names, if possible. It currently handles spaces in file names, but not newlines. I can live with that (because I try to find any file names with newlines and get rid of them).
2. Make it more intelligent about which file gets retained as the actual inode content and which file(s) become symlinks. I would like to be able to choose to retain the file that either a) has the shortest path, b) has the longest path or c) has the filename with the most alpha characters (which will probably be the most descriptive name).
3. Allow it to read the directories to process either from parameters passed in or from a file.
4. Optionally, write a log of all changes and/or all files not processed.
Of all of these, #2 is the most important for me right now. I need to process some files with it and I need to improve the way it chooses which files to turn into symlinks. (I tried using things like the find option -depth without success.)
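One way #2 might be approached, as a minimal sketch rather than a tested solution: compute a numeric rank for each path and sort on it, so that the preferred path comes first within each inode group and the loop in the script below can keep treating the first path it sees as the one to retain. The names keep_rank and order_by_preference and the mode words shortest, longest and alpha are made up for this illustration. The alpha mode counts letters in the file name only and relies on tr, which in GNU coreutils works on bytes, so non-ASCII letters are probably not counted.

```bash
# Hypothetical helper: print a numeric rank for a path; the LOWEST rank wins.
# mode is one of: shortest, longest, alpha (names invented for this sketch).
keep_rank() {
    local mode="$1" path="$2" name alpha
    case $mode in
        shortest) printf '%d\n' "${#path}" ;;
        longest)  printf '%d\n' "$(( -${#path} ))" ;;
        alpha)
            # Count alphabetic characters in the file name (last path component).
            name=${path##*/}
            alpha=$(printf '%s' "$name" | tr -cd '[:alpha:]')
            printf '%d\n' "$(( -${#alpha} ))" ;;
        *)  printf 'keep_rank: unknown mode: %s\n' "$mode" >&2
            return 2 ;;
    esac
}

# Read "inode:path" lines on stdin and print them grouped by inode, with the
# preferred path first in each group. Paths containing newlines are assumed
# to have been excluded already, as in the original find command.
order_by_preference() {
    local mode="$1" line inode path
    # Reject an unknown mode before reading any input.
    keep_rank "$mode" '' >/dev/null || return
    while IFS= read -r line; do
        inode=${line%%:*}
        path=${line#*:}
        # Emit inode:rank:path so sort can order on the first two fields.
        printf '%s:%s:%s\n' "$inode" "$(keep_rank "$mode" "$path")" "$path"
    done |
        sort -t: -k1,1n -k2,2n -k3 |
        # Drop the rank field again, leaving "inode:path".
        sed 's/^\([0-9]*\):-\{0,1\}[0-9]*:/\1:/'
}
```

Under these assumptions, the find output would be piped through order_by_preference in place of the plain sort, for example `... -printf '%i:%p\n' | order_by_preference shortest`. Ranking runs a command substitution per path, so it is likely to be slow on very large trees.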
Here's the current script:
#!/bin/bash
# clean up known problematic files first.
## find /home -type f -wholename '*Icon*
## *' -exec rm '{}' \;
# Configure script environment
# ~~~~~~~~~~~~~~~~~~~~~~~~~~~~
set -o nounset
dir='/SOME/PATH/HERE/'
# For each path which has multiple links
# ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
# (except ones containing newline)
last_inode=
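while IFS= read -r path_info
do
    #echo "DEBUG: path_info: '$path_info'"
    inode=${path_info%%:*}
    path=${path_info#*:}
    # Quote the right-hand side so it is compared literally, not as a pattern.
    if [[ $last_inode != "$inode" ]]; then
        # First path seen for this inode: keep it as the real file.
        last_inode=$inode
        path_to_keep=$path
    else
        # Pass the paths as printf arguments so that '%' or '\' in a name
        # is not interpreted as part of the format string.
        printf "ln -s\t'%s'\t'%s'\n" "$path_to_keep" "$path"
        # Only create the symlink if the hard link was actually removed.
        rm -- "$path" && ln -s -- "$path_to_keep" "$path"
    fi
# Sort on the inode field so that links to the same inode are adjacent.
done < <( find "$dir" -type f -links +1 ! -wholename '*
*' -printf '%i:%p\n' | sort --field-separator=: --key=1,1n )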
# Warn about any excluded files
# ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
buf=$( find "$dir" -type f -links +1 -path '*
*' )
if [[ -n $buf ]]; then
    echo 'Some files not processed because their paths contained newline(s):'$'\n'"$buf"
fi
exit 0
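For the third and fourth changes in the list (taking directories from parameters or a file, and keeping a log), a rough sketch of two helper functions follows. It is only an illustration under stated assumptions: the function names list_dirs and log_action, the -f option and the LOG_FILE variable are all invented here, and the list file is assumed to hold one directory per line with '#' comment lines allowed.

```bash
# Hypothetical helper: print one directory per line, taken either from a
# list file ("-f FILE") or from the command-line arguments themselves.
list_dirs() {
    if [ "${1-}" = "-f" ]; then
        [ -r "${2-}" ] || {
            printf 'list_dirs: cannot read list file: %s\n' "${2-}" >&2
            return 1
        }
        # Skip blank lines and lines starting with '#'.
        grep -v -e '^[[:space:]]*$' -e '^#' -- "$2"
    elif [ "$#" -gt 0 ]; then
        printf '%s\n' "$@"
    fi
}

# Hypothetical helper: append a timestamped line to the file named by
# LOG_FILE (an assumed variable), or write to stderr when it is unset.
log_action() {
    printf '%s\t%s\n' "$(date '+%Y-%m-%d %H:%M:%S')" "$*" >> "${LOG_FILE:-/dev/stderr}"
}
```

With these in place, the hard-coded dir assignment could be replaced by a loop such as `list_dirs "$@" | while IFS= read -r dir; do ...; done`, and the printf in the main loop could become a call like `log_action "ln -s" "$path_to_keep" "$path"`. Directory names containing newlines would not survive the one-per-line format.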