Backing up data stored on Amazon S3
- by Fiver
I have an EC2 instance running a web server that stores users' uploaded files in S3. The files are written once and never change, but are retrieved occasionally by the users. We will likely accumulate somewhere around 200-500 GB of data per year. We would like to ensure this data is safe, particularly against accidental deletion, and to be able to restore deleted files regardless of the reason.
I have read about the versioning feature for S3 buckets, but I cannot tell whether recovery is possible for files that have no modification history. See the AWS docs on versioning:
http://docs.aws.amazon.com/AmazonS3/latest/dev/ObjectVersioning.html
Those examples don't show the scenario where data is uploaded, never modified, and then deleted. Are files deleted in this scenario recoverable?
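To make the question concrete, here is roughly what I was hoping would work, based on my reading of the docs (a boto3 sketch with placeholder bucket and key names). My understanding is that a DELETE on a versioning-enabled bucket only adds a delete marker, and that removing the marker should bring the original, never-modified object back, but I would like confirmation:

    # Sketch of what I was hoping would work (bucket/key names are placeholders).
    import boto3

    s3 = boto3.client("s3")
    BUCKET = "my-uploads-bucket"   # placeholder
    KEY = "uploads/example.pdf"    # placeholder

    # Turn on versioning for the bucket (one-time setup).
    s3.put_bucket_versioning(
        Bucket=BUCKET,
        VersioningConfiguration={"Status": "Enabled"},
    )

    # After an accidental DELETE, list all versions of the key. My understanding
    # is that the delete only adds a "delete marker" and the original version
    # (written once, never modified) should still be listed here.
    versions = s3.list_object_versions(Bucket=BUCKET, Prefix=KEY)
    for marker in versions.get("DeleteMarkers", []):
        if marker["Key"] == KEY and marker["IsLatest"]:
            # Removing the delete marker should make the original object
            # visible again -- this is the part I'd like to confirm.
            s3.delete_object(Bucket=BUCKET, Key=KEY, VersionId=marker["VersionId"])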
Then we thought we might just back up the S3 files to Glacier using object lifecycle management:
http://docs.aws.amazon.com/AmazonS3/latest/dev/object-lifecycle-mgmt.html
But it seems this will not work for us, as the object is not copied to Glacier but moved to Glacier (more accurately, it seems the object's storage class attribute is changed, but anyway...).
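For completeness, this is the kind of lifecycle rule I was looking at (the rule ID and day count are arbitrary, and the bucket name is a placeholder). As far as I can tell, it only changes the storage class of the existing object rather than creating a second, independent copy:

    # The kind of lifecycle rule I experimented with (ID/days are arbitrary).
    import boto3

    s3 = boto3.client("s3")

    s3.put_bucket_lifecycle_configuration(
        Bucket="my-uploads-bucket",  # placeholder
        LifecycleConfiguration={
            "Rules": [
                {
                    "ID": "archive-to-glacier",
                    "Status": "Enabled",
                    "Filter": {"Prefix": ""},  # apply to the whole bucket
                    # This *transitions* the object to the GLACIER storage class;
                    # it does not leave a second, independent copy behind.
                    "Transitions": [{"Days": 30, "StorageClass": "GLACIER"}],
                }
            ]
        },
    )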
So it seems there is no direct way to back up S3 data, and transferring the data from S3 to local servers would be time-consuming and could incur significant transfer costs over time.
Finally, we thought we would create a new bucket every month to serve as a monthly full backup, and copy the original bucket's data to the new one on Day 1. Then, using something like duplicity (http://duplicity.nongnu.org/), we would synchronize the backup bucket every night. At the end of the month we would move the backup bucket's contents into Glacier storage and create a new backup bucket from a fresh, current copy of the original bucket, then repeat the process. This seems like it would work and would minimize storage and transfer costs, but I'm not sure whether duplicity allows bucket-to-bucket transfers directly, without bringing the data down to the controlling client first.
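To illustrate what I mean by a direct bucket-to-bucket transfer, here is a rough boto3 sketch (placeholder bucket names, no change detection). My understanding is that CopyObject is a server-side operation, so the object data would not have to pass through the machine running the script, but I don't know whether duplicity can work this way:

    # Rough sketch of the nightly bucket-to-bucket sync I have in mind
    # (bucket names are placeholders, and there is no change detection here).
    import boto3

    s3 = boto3.client("s3")
    SOURCE = "my-uploads-bucket"          # placeholder
    BACKUP = "my-uploads-backup-2014-06"  # placeholder monthly backup bucket

    paginator = s3.get_paginator("list_objects_v2")
    for page in paginator.paginate(Bucket=SOURCE):
        for obj in page.get("Contents", []):
            # CopyObject is executed by S3 itself, so the object's data should
            # not pass through the machine running this script. (Objects over
            # 5 GB would need a multipart copy instead.)
            s3.copy_object(
                Bucket=BACKUP,
                Key=obj["Key"],
                CopySource={"Bucket": SOURCE, "Key": obj["Key"]},
            )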
So, I guess there are a couple of questions here. First, does S3 versioning allow recovery of files that were never modified? Is there some way to "copy" files from S3 to Glacier that I have missed? Can duplicity or any other tool transfer files between S3 buckets directly, to avoid transfer costs? Finally, am I way off the mark in my approach to backing up S3 data?
Thanks in advance for any insight you could provide!