How can I create multiple identical AWS EC2 server instances with large amounts of persistent data?
- by mojones
I have a CPU-intensive data-processing application that I want to run across many (~100,000) input files. The application needs a large (~20GB) data file in order to run. What I would like to do is:

1. create an EC2 machine image (AMI) that has my application and associated data files installed
2. boot up a large number (e.g. 100) of instances of this image (see the sketch after this list)
3. split my input files up into 100 batches and send one batch to be processed on each instance
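For step 2, something like this boto3 sketch is what I have in mind (the AMI ID, instance type, and key pair name are all placeholders):

```python
import boto3

# Hypothetical IDs -- substitute the real AMI and key pair.
AMI_ID = "ami-0123456789abcdef0"

ec2 = boto3.resource("ec2")

# Launch 100 identical instances from the pre-built image.
instances = ec2.create_instances(
    ImageId=AMI_ID,
    InstanceType="c5.xlarge",
    MinCount=100,
    MaxCount=100,
    KeyName="my-key-pair",
)
for instance in instances:
    print(instance.id)
```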
I am having trouble figuring out the best way to ensure that each instance has access to the large data file. The data file is too big to fit on the root filesystem of an AMI. I could use Elastic Block Store (EBS), but a given EBS volume can only be attached to a single instance at a time, so I would need 100 clones of the volume.
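If cloning volumes is the way to go, I imagine it would mean snapshotting the data volume once and then creating and attaching one volume per instance, roughly like this (the snapshot ID and device name are assumptions):

```python
import boto3

ec2 = boto3.resource("ec2")

# Hypothetical snapshot of the 20GB data volume.
SNAPSHOT_ID = "snap-0123456789abcdef0"

# Create a fresh volume from the snapshot for each running instance
# and attach it; the volume must live in the instance's availability zone.
for instance in ec2.instances.filter(
    Filters=[{"Name": "instance-state-name", "Values": ["running"]}]
):
    volume = ec2.create_volume(
        SnapshotId=SNAPSHOT_ID,
        AvailabilityZone=instance.placement["AvailabilityZone"],
    )
    volume.wait_until_available()
    instance.attach_volume(VolumeId=volume.id, Device="/dev/sdf")
```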
Is there some way to create a custom image that has more space on the root filesystem so that I can include my large data file? Or is there a better way to tackle this problem?
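If a bigger root filesystem is possible, my guess is that it would involve overriding the block device mapping at launch, something like the sketch below (the root device name varies by AMI, and the IDs are placeholders), but I don't know whether this is the right approach:

```python
import boto3

ec2 = boto3.resource("ec2")

# Launch with an enlarged root volume via a block device mapping override.
# "/dev/sda1" is the root device on many Linux AMIs, but it depends on the image.
instances = ec2.create_instances(
    ImageId="ami-0123456789abcdef0",  # hypothetical AMI
    InstanceType="c5.xlarge",         # placeholder instance type
    MinCount=1,
    MaxCount=1,
    BlockDeviceMappings=[
        {
            "DeviceName": "/dev/sda1",
            "Ebs": {"VolumeSize": 40},  # root volume size in GiB
        }
    ],
)
```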