S3 Backup Memory Usage in Python
- by danpalmer
I currently use WebFaction for my hosting with the basic package that gives us 80MB of RAM. This is more than adequate for our needs at the moment, apart from our backups. We do our own backups to S3 once a day.
The backup process is this: dump the database, tar.gz all the files into one backup named with the correct date of the backup, upload to S3 using the python library provided by Amazon.
Unfortunately, it appears (although I don't know this for certain) that either my code for reading the file or the S3 code is loading the entire file in to memory. As the file is approximately 320MB (for today's backup) it is using about 320MB just for the backup. This causes WebFaction to quit all our processes meaning the backup doesn't happen and our site goes down.
So this is the question: Is there any way to not load the whole file in to memory, or are there any other python S3 libraries that are much better with RAM usage. Ideally it needs to be about 60MB at the most! If this can't be done, how can I split the file and upload separate parts?
Thanks for your help.
This is the section of code (in my backup script) that caused the processes to be quit:
filedata = open(filename, 'rb').read()
content_type = mimetypes.guess_type(filename)[0]
if not content_type:
content_type = 'text/plain'
print 'Uploading to S3...'
response = connection.put(BUCKET_NAME, 'daily/%s' % filename, S3.S3Object(filedata), {'x-amz-acl': 'public-read', 'Content-Type': content_type})