How can I keep directories in sync?
- by Guillaume Boudreau
I have a directory, dirA, that users can work in: they can create, modify, rename and delete files & sub-directories in dirA.
I want to keep another directory, dirB, in sync with dirA.
What I'd like is a discussion on finding a working algorithm that achieves the above, within the limitations listed below.
Requirements:
1. Something asynchronous - I don't want to stop file operations in dirA while I work in dirB.
2. I can't assume that I can just blindly rsync dirA to dirB at regular intervals - dirA could contain millions of files & directories, and terabytes of data. Completely walking the dirA tree could take hours.
Those two requirements make this really difficult.
Having it asynchronous means that by the time I start working on a specific file from dirA, it may have been moved or renamed many times since it first appeared.
And the second limitation means that I really need to watch dirA, and act on the individual (atomic) file operations that I notice.
Current (broken) implementation:
1. Log all file & directory operations in dirA.
2. Using a separate process, read that log, and 'repeat' all the logged operations in dirB.
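To make the discussion concrete, here is a minimal Python sketch of what such a naive 'log reader' might look like. The log format (tuples like ("write", path) or ("rename", old, new), with paths relative to the directory root) and the names replay and demo are assumptions made for illustration, not the actual implementation.

```python
import os
import shutil
import tempfile


def replay(log, src_root, dst_root):
    """Naively repeat each logged operation in dst_root.

    Returns one result string per log entry: "OK" or "failed: <reason>".
    The tuple-based log format is a hypothetical one chosen for this sketch.
    """
    results = []
    for op, *paths in log:
        try:
            if op == "write":
                (rel,) = paths
                # Copies whatever is at this path in the source *now*,
                # not what was there when the entry was logged.
                shutil.copy2(os.path.join(src_root, rel),
                             os.path.join(dst_root, rel))
            elif op == "rename":
                old, new = paths
                os.replace(os.path.join(dst_root, old),
                           os.path.join(dst_root, new))
            elif op == "delete":
                (rel,) = paths
                os.remove(os.path.join(dst_root, rel))
            else:
                raise ValueError(f"unknown operation: {op}")
            results.append("OK")
        except OSError as exc:
            results.append(f"failed: {exc.strerror}")
    return results


def demo():
    """Write one file in a temporary dirA, replay the log, report the outcome."""
    with tempfile.TemporaryDirectory() as dir_a, \
            tempfile.TemporaryDirectory() as dir_b:
        with open(os.path.join(dir_a, "file1"), "w") as fh:
            fh.write("1\n")
        results = replay([("write", "file1")], dir_a, dir_b)
        return results, sorted(os.listdir(dir_b))
```

The weak point is the "write" branch: it reads the source path at replay time, so any rename or delete that happened in dirA in the meantime makes the entry stale.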
Why is it broken:
echo 1 > dirA/file1
# Allow the 'log reader' process to create dirB/file1:
log = "write file1"; action = cp dirA/file1 dirB/file1; result = OK
echo 1 > dirA/file2
mv dirA/file1 dirA/file3
mv dirA/file2 dirA/file1
rm dirA/file3
# End result in dirA: only file1 remains, and it contains '1'
# 'log reader' process starts working on the 4 above file operations:
log = "write file2"; action = cp dirA/file2 dirB/file2; result = failed: there is no dirA/file2
log = "rename file1 file3"; action = mv dirB/file1 dirB/file3; result = OK
log = "rename file2 file1"; action = mv dirB/file2 dirB/file1; result = failed: there is no dirB/file2
log = "delete file3"; action = rm dirB/file3; result = OK
# End result in dirB: no more files!
Another broken example:
echo 1 > dirA/dir1/file1
mv dirA/dir1 dirA/dir2
# 'log reader' process starts working on the 2 above file operations:
log = "write file1"; action = cp dirA/dir1/file1 dirB/dir1/file1; result = failed: there is no dirA/dir1/file1
log = "rename dir1 dir2"; action = mv dirB/dir1 dirB/dir2; result = failed: there is no dirB/dir1
# End result in dirB: nothing!
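Both failures above can be reproduced without touching a real disk. The following is a small Python simulation, assuming a deliberately simplified model where each directory is a dict mapping relative paths to file contents (empty directories are not modeled). The function names and the log tuple format are hypothetical and exist only for this sketch.

```python
def rename(fs, old, new):
    """Rename a file, or every file under a directory prefix, in the dict model."""
    moved = {p: c for p, c in fs.items()
             if p == old or p.startswith(old + "/")}
    if not moved:
        raise FileNotFoundError(old)
    for path, content in moved.items():
        del fs[path]
        fs[new + path[len(old):]] = content


def replay(log, dir_a, dir_b):
    """Apply logged operations to dir_b, reading file data from dir_a as it is now."""
    results = []
    for op, *args in log:
        try:
            if op == "write":
                (path,) = args
                if path not in dir_a:
                    raise FileNotFoundError(path)
                dir_b[path] = dir_a[path]
            elif op == "rename":
                rename(dir_b, *args)
            elif op == "delete":
                (path,) = args
                if path not in dir_b:
                    raise FileNotFoundError(path)
                del dir_b[path]
            else:
                raise ValueError(f"unknown operation: {op}")
            results.append("OK")
        except FileNotFoundError:
            results.append("failed")
    return results


def scenario_files():
    """First example: file1 is synced, then four more operations hit dirA."""
    dir_a, dir_b = {}, {}
    dir_a["file1"] = "1"
    replay([("write", "file1")], dir_a, dir_b)  # reader catches up once
    dir_a["file2"] = "1"
    rename(dir_a, "file1", "file3")
    rename(dir_a, "file2", "file1")
    del dir_a["file3"]
    log = [("write", "file2"), ("rename", "file1", "file3"),
           ("rename", "file2", "file1"), ("delete", "file3")]
    return replay(log, dir_a, dir_b), dir_a, dir_b


def scenario_directory():
    """Second example: a file is written, then its parent directory is renamed."""
    dir_a, dir_b = {"dir1/file1": "1"}, {}
    rename(dir_a, "dir1", "dir2")
    log = [("write", "dir1/file1"), ("rename", "dir1", "dir2")]
    return replay(log, dir_a, dir_b), dir_a, dir_b


if __name__ == "__main__":
    for scenario in (scenario_files, scenario_directory):
        results, dir_a, dir_b = scenario()
        print(scenario.__name__, results, "dirA:", dir_a, "dirB:", dir_b)
```

In this model dirB ends up empty in both scenarios, while dirA holds file1 and dir2/file1 respectively. The common cause is that a "write" entry names a path, and by replay time that path no longer refers to the same file in dirA.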