Using Amazon S3 for multiple remote data site uploads, securely
- by Aitch
I've been playing about with Amazon S3 a little for the first time and like what I see for various reasons relating to my potential use case.
We have multiple (online) remote server boxes harvesting sensor data that is regularly uploaded every hour or so (rsync'ed) to a VPS server. The number of remote server boxes is growing regularly and forecast to keep growing (hundreds). The servers are geographically dispersed. The servers are also automatically built, therefore generic with standard tools and not bespoke per location. The data is many hundreds of files per day.
I want to avoid a situation where I need to provision more VPS storage, or additional servers every time we hit the VPS capacity limit, after every N server deployments, whatever N might be.
The remote servers can never be considered fully secure due to us not knowing what might happen to them when we are not looking. Our current solution is a bit naive and simply restricts inbound rsync only over ssh to known mac address directories and a known public key. There are plenty of holes to pick in this, I know.
Let's say I write or use a script like s3cmd/s3sync to potentially push up the files.
Would I need to manage hundreds of access keys and have each server customized to include this (do-able, but key management becomes nightmarish?)
Could I restrict inbound connections somehow (eg by mac address), or just allow write-only to any client that was running the script? ( i could deal with a flood of data if someone got into a system? )
having a bucket per remote machine does not seem feasible due to bucket limits?
I don't think I want to use a single common key as if one machine is breached then potentially, a malicious hack could get access to the filestore key and start deleting for ll clients, correct?
I hope my inexperience has not blinded me to some other solution that might be suggested!
I've read lots of examples of people using S3 for backup, but can't really find anything about this sort of data collection, unless my google terminology is wrong...
I've written more than I should here, perhaps it can be summarised thus: In a perfect world I just want to have one of our techs install a new remote server into a location and it automagically starts sending files home with little or no intervention, and minimises risk? Pipedream or feasible?
TIA, Aitch