synchronization of file locations between two machines
Posted
by intuited
on Server Fault
Published on 2010-03-14T22:10:58Z
Although similar questions have been asked on this site and its siblings before, I haven't managed to glean an answer to this persistent question. Any help is much appreciated.
The situation: I've got two laptops; both contain a ton of music. Sometimes I move these music files to different locations, or change the metadata in them, or convert them to a different format. I might do any of these things on either machine. I rarely do all of them at once, i.e. it's unlikely that I'll convert a file's format and move it to a different location all in one go. I'd like to be able to synchronize these changes without having to sift through everything that was renamed or moved.
I'm familiar with rsync but I find it inadequate, because
- although it can compute checksums, it has no way to store them. So if a file differs between the two sides, it can't figure out which side it changed on. This also means that it can't attempt to match a missing file to a new one with the same checksum (i.e. a move).
- if the file size and date are the same, it [...], so it takes an epoch to do a sync on a large repository. I would like to only check the checksum if the files [...]
- even if you turn on checksumming, it still doesn't use it intelligently: IIRC, it checksums files even if the sizes differ.
- it's not able to use file metadata as a means of file comparison. This is sort of a wishlist item, but it seems doable.
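To make the size-before-checksum wish above concrete, here is a minimal sketch in Python. It is not how rsync works; `file_digest` and `same_content` are hypothetical helper names, and SHA-256 is an arbitrary choice of hash. The idea is simply that a size mismatch settles the question without reading either file.

```python
import hashlib
import os


def file_digest(path, chunk_size=1 << 20):
    """Return the SHA-256 hex digest of a file, reading it in chunks."""
    h = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()


def same_content(path_a, path_b):
    """Compare two files, hashing only when their sizes match."""
    if os.path.getsize(path_a) != os.path.getsize(path_b):
        # Different sizes: the contents cannot be equal, so skip hashing.
        return False
    return file_digest(path_a) == file_digest(path_b)
```

Across two machines the digests would have to be computed on each side and exchanged, which is exactly where a stored checksum would save time.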
I've also looked into rsnapshot, but its requirement to create a full backup is impractical in this situation. I don't need a backup; I just need a record of which file, with which hash, was where, and when. Unison seems like it might be able to do something vaguely along these lines, but I'm loath to spend hours wading through its details only to discover that it's sadly lacking. Plus, it's fun asking questions on here.
What I'd like is a tool that does something along these lines:
- keeps track of file checksums or of actual renames, possibly using inotify to greatly reduce resource consumption/latency
- stores a database containing this info, along with other pertinencies like the file format and metadata, the actual inode, the filename history, etc.
- uses this info to provide more-intelligent synchronization with a counterpart on the other side.
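As a rough illustration of the first two items, here is a minimal Python sketch of a checksum index that can spot moves. Everything in it is an assumption made for the example: the `files` table layout, the `open_index` and `scan` names, SHA-256 as the hash, and a full directory walk standing in for inotify. It reuses a stored checksum whenever a file's size and modification time are unchanged, and reports a move only when exactly one vanished path and one new path share a digest.

```python
import hashlib
import os
import sqlite3


def file_digest(path, chunk_size=1 << 20):
    """Return the SHA-256 hex digest of a file, reading it in chunks."""
    h = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()


def open_index(db_path):
    """Open (or create) the index database; the schema is hypothetical."""
    conn = sqlite3.connect(db_path)
    conn.execute(
        "CREATE TABLE IF NOT EXISTS files ("
        "path TEXT PRIMARY KEY, size INTEGER, mtime_ns INTEGER, digest TEXT)"
    )
    return conn


def scan(conn, root):
    """Refresh the index for root and return a sorted list of (old, new) moves."""
    known = {
        path: (size, mtime_ns, digest)
        for path, size, mtime_ns, digest in conn.execute(
            "SELECT path, size, mtime_ns, digest FROM files"
        )
    }
    current = {}
    for dirpath, _dirs, names in os.walk(root):
        for name in names:
            full = os.path.join(dirpath, name)
            rel = os.path.relpath(full, root)
            st = os.stat(full)
            old = known.get(rel)
            if old and old[0] == st.st_size and old[1] == st.st_mtime_ns:
                digest = old[2]  # size and mtime unchanged: reuse stored checksum
            else:
                digest = file_digest(full)
            current[rel] = (st.st_size, st.st_mtime_ns, digest)

    # Group vanished and newly appeared paths by digest.
    missing, added = {}, {}
    for path in known.keys() - current.keys():
        missing.setdefault(known[path][2], []).append(path)
    for path in current.keys() - known.keys():
        added.setdefault(current[path][2], []).append(path)

    # Only report unambiguous one-to-one matches as moves.
    moves = sorted(
        (olds[0], added[digest][0])
        for digest, olds in missing.items()
        if len(olds) == 1 and len(added.get(digest, [])) == 1
    )

    with conn:  # replace the stored snapshot in a single transaction
        conn.execute("DELETE FROM files")
        conn.executemany(
            "INSERT INTO files VALUES (?, ?, ?, ?)",
            [(p, s, m, d) for p, (s, m, d) in current.items()],
        )
    return moves
```

Running `scan(open_index("music.db"), "/path/to/music")` before and after a reorganization would list the renames. The third item, exchanging this information with the other machine, is left out of the sketch entirely.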
So for example: if a file has been converted from flac to ogg, but kept the same base filename, or the same metadata, it should be able to send the new version over, and the other side should delete the original. Probably it should actually sequester it somewhere in case they or you screwed up, but that's a detail. And then when the transaction is done, the state is logged so that the next time the two interact they can work out their differences.
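The "same base filename" half of that example could be sketched like this in Python. `find_conversions` is a hypothetical helper; it takes the paths that disappeared and the paths that appeared since the last sync, and pairs them when only the extension differs. Matching on tags instead would need a metadata reader, which this sketch does not attempt.

```python
import os


def find_conversions(vanished_paths, new_paths):
    """Pair a vanished path with a new one when only the extension differs."""
    by_stem = {}
    for path in new_paths:
        stem, _ext = os.path.splitext(path)
        by_stem.setdefault(stem, []).append(path)

    pairs = []
    for old in vanished_paths:
        stem, _ext = os.path.splitext(old)
        candidates = [p for p in by_stem.get(stem, []) if p != old]
        if len(candidates) == 1:  # skip ambiguous matches
            pairs.append((old, candidates[0]))
    return sorted(pairs)
```

For each pair, the receiving side would fetch the new file and move the old one to a holding directory rather than deleting it outright.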
Maybe all this metadata stuff is a fancy pipe dream. I would actually be pretty happy if there were something out there that could just use checksums in an intelligent way. This would be sort of like having the intelligence of something like git, minus the need to duplicate data in an index/backup/etc. (and minus branching, checkouts, and all the other great stuff that RCSs do; basically, fast-forward commit pushes are all I want, with maybe the option to roll back).
So is there something out there that can do this? If not, can someone suggest a good way to start making it?
© Server Fault or respective owner