We use rsync to update a mirror of our primary file server to an off-site colocated backup server. One of the issues we currently have is that our file server holds 1 TB of mostly smaller files (in the 10-100 KB range), and when we're transferring this much data, the connection often ends up being dropped several hours into the transfer. Rsync doesn't have a resume/retry feature that simply reconnects to the server to pick up where it left off -- you need to go through the full file comparison process, which ends up being very lengthy with the number of files we have.
The solution that's usually recommended to get around this is to split the large rsync transfer into a series of smaller transfers. I've figured the best way to do this is by the first letter of the top-level directory names, which doesn't give us a perfectly even distribution, but is good enough.
I'd like to confirm whether my methodology for doing this is sane, or whether there's a simpler way to accomplish the goal.
To do this, I iterate through A-Z, a-z, 0-9 to pick a one-character $prefix. Initially I was thinking of just running
rsync -av --delete --delete-excluded --exclude "*.mp3" src/"$prefix"* dest/
(--exclude "*.mp3" is just an example, as we have a more lengthy exclude list for removing things like temporary files)
The problem with this is that any top-level directories in dest/ that are no longer present on src/ will not get picked up by --delete. To get around this, I'm instead trying the following:
rsync \
--filter "S /$prefix*" \
--filter "R /$prefix*" \
--filter 'H /*' \
--filter 'P /*' \
-av --delete --delete-excluded --exclude "*.mp3" src/ dest/
I'm using show and hide rather than include and exclude, because otherwise --delete-excluded would delete anything that doesn't match $prefix. (The filter rules with $prefix need double quotes so the shell expands the variable before rsync sees it.)
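To sanity-check the filter interplay before letting it loose on real data, I've been doing dry runs one prefix at a time, something like this (single-prefix example):

# Dry run (-n) one prefix first: non-matching top-level entries should be
# neither transferred (H hides them) nor deleted (P protects them), while
# matching ones are transferred (S) and remain eligible for deletion (R).
prefix=A
rsync -avn --delete --delete-excluded --exclude "*.mp3" \
    --filter "S /$prefix*" --filter "R /$prefix*" \
    --filter 'H /*' --filter 'P /*' \
    src/ dest/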
Is this the most effective way of splitting the rsync into smaller chunks? Is there a more effective tool, or a flag that I've missed, that might make this simpler?