What are the pitfalls of hardlinked files on my desktop PC?
- by MountainX
All the identical-content files on my PC are now hardlinked. (My data is completely de-duplicated. It is a consequence of the way I copied my data from my old computer.)
What pitfalls do I need to be aware of now that certain actions on one file could silently affect a number of other files?
I know that deleting the file I'm working on is not a problem (assuming I deleted it on purpose). It doesn't affect any of the other hardlinked files and I don't see that the delete action would lead to unexpected side effects.
Moving or renaming the file is not a problem. I don't see any unexpected consequences.
I don't think copying hardlinked files is a problem, but I'm not as confident about any unexpected consequences in this regard. What I have seen is that making a copy (to the same disk) of a hardlinked file with cp keeps the copy hardlinked (i.e., inode number doesn't change in the copy). Copying to another filesystem obviously breaks the hardlink. (I guess one pitfall is forgetting this fact, given that my PC has 3 hard disks.)
Changing permissions does affect all linked files. So far this has proven handy. (I made a large number of the hardlinked files read-only.)
None of the operations above seem to produce any major unexpected consequences.
However, as was pointed out to me by Daniel Beck in a comment, editing or modifying a file can sometimes be a problem. It depends on the tool and maybe the type of edit. (For example, editing small text files using sed seems to always break the link while using nano doesn't.) This introduces the chance that editing one file could affect all the hardlinked files (i.e., alter the original inode).
My proposed solution to this is to make all hardlinked files read-only (and that is already mostly the case). If I can't do that for some files, I will unlink those particular files. Is there any problem with this read-only approach?
I'm assuming that if I go to edit a file and find it to be read-only, I'll remember to unlink that filename while making it writable. So one pitfall might be forgetting this rule. In that case, I'll have to rely on my backups.
Am I correct in the above statements? And what else do I need to know?
BTW, I'm running Kubuntu 12.04. I'm also using btrfs. (I have 2 SSD's and 1 HDD in the PC. I will also be adding an external USB HDD. I'm also connected to a network and I mount some NFS shares. I don't assume any of these last bits are relevant to the question, but I'm adding them just in case.)
BTW, since I have more than one drive (with separate file systems), to unlink any file all I have to do is copy it to another drive, then move it back. However, using sed also works (in my testing). Here's my script:
sed -i 's/\(.\)/\1/' file1
Surprisingly, this even unlinks zero byte files. In my testing it also appears to work on non-text files without any special options. (But I understand that the --binary option might be needed on Windows, MS-DOS and Cygwin.) However, copying to another disk and moving back may be the best way to unlink. For my use-case, unlink command doesn't really "unlink", rather it "removes".