Database Compression in Python
- by user551832
I have hourly logs like
user1:joined
user2:log out
user1:added pic
user1:added comment
user3:joined
I want to compress all the flat files down to one file. There are around 30 million users in the logs and I just want the latest user log for all the logs.
My end result is I want to have a log look like
user1:added comment
user2:log out
user3:joined
Now my first attempt on a small scale was to just do a dict like
log['user1'] = "added comment"
Will doing a dict of 30 million key/val pairs have a giant memory footprint.. Or should I use something like sqllite to store them.. then just put the contents of the sqllite table back into a file?