Tool or script to detect moved or renamed files on Linux prior to a backup
- by Pharaun
Basically I am searching to see if there exists a tool or script that can detect moved or renamed files so that I can get a list of renamed/moved files and apply the same operation on the other end of the network to conserve on bandwidth.
Basically disk storage is cheap but bandwidth isn't, and the problem is that the files often will be reorganized or moved around into a better directory structure thus when you use rsync to do the backup, rsync won't notice that its a renamed or moved file and re-transmission it over the network all over again despite having the same file on the other end.
So I am wondering if there exists a script or tool that can record where all the files are and their names, then just prior to a backup, it would rescan and detect moved or renamed files, then I can take that list and re-apply the move/rename operation on the other side.
Here's a list of the "general" features of the files:
Large unchanging files
They can be renamed or moved around
[Edit:] These all are good answers, and what I end up doing in the end was looking at all of the answers and will be writing some code to deal with this. Basically what I am thinking/working on now is:
Using something like AIDE for the "initial" scan and enable me to keep checksums on the files because they are supposed to never change, so it would aid on detecting corruption.
Creating an inotify daemon that would monitor these files/directory and recording any changes relating to renames & moving the files around to a log file.
There are some edge cases where inotify might fail to record that something happened to the file system, thus there is a final step of using find to search the file system for files that has a change time latter than the last backup.
This has several benefits:
Checksums/etc from AIDE to be able to check/make sure that some media did not get corrupt
Inotify keeps resource usage low and no need to re-scan the filesystem over and over
No need to patch rsync; If I have to patch things I can, but I would prefer to avoid patching things to keep the burden lower, (IE don't need to re-patch everytime there is an update).
I've used Unison before and its really nice, however I could've sworn that Unison does keep copies around on the filesystem and that its "archive" files can grow to be rather large?