Deploying multiple identical copies of a virtual machine for compute tasks
- by Reid
I have a compute task with a large number of library dependencies. I would like to deploy it on some of my company's large Linux clusters, where I do not have root access. I could probably track down, compile, and install the right versions of all the libraries myself, but that looks quite tedious, and I would have to repeat it every time I deployed somewhere else.
On the other hand, the task is pretty easy to install on a current Ubuntu release. This led me to wonder about a virtual machine approach. Could I put together a virtual machine that boots, runs the computation (taking parameters from the host and writing results back to it), and then shuts down? In other words, I'd like a command like this that I could run on the host:
$ ./run-vm --ram N --task /path/on/host/foo.sh --results /another/host/dir/
This would boot the VM, run foo.sh, and put the (relatively small) results of the computation in /another/host/dir/.
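To make the desired interface concrete, here is a rough sketch of what such a run-vm wrapper might look like. It rests on several assumptions that are not established anywhere in this question: that QEMU is available on the cluster nodes and usable without root, that --ram is given in MiB, that a base image named ubuntu-base.qcow2 exists (a hypothetical name), and that the guest has been set up to mount the 9p shares tagged "task" and "results", run the script, and power off. The sketch only builds the command line; it does not launch anything.

```python
#!/usr/bin/env python3
"""Hypothetical run-vm wrapper: builds (but does not run) a QEMU command line."""
import argparse
import os


def build_qemu_command(ram_mb, task, results, image="ubuntu-base.qcow2"):
    """Return the argument list for one throwaway VM instance.

    Assumption: the guest image is configured to mount the "task" and
    "results" shares and run the task script at boot. How the guest learns
    the script's file name is left out of this sketch.
    """
    task_dir = os.path.dirname(os.path.abspath(task))
    results_dir = os.path.abspath(results)
    return [
        "qemu-system-x86_64",
        "-m", str(ram_mb),       # guest RAM, interpreted by QEMU as MiB
        "-nographic",            # no display; serial console on stdio
        "-nic", "none",          # no network devices in the guest
        "-snapshot",             # disk writes go to a temporary file, not the image
        "-drive", f"file={image},format=qcow2",
        # Share host directories with the guest over 9p.
        "-virtfs", f"local,path={task_dir},mount_tag=task,security_model=none",
        "-virtfs", f"local,path={results_dir},mount_tag=results,security_model=none",
    ]


def parse_args(argv=None):
    """Parse the options shown in the example invocation."""
    parser = argparse.ArgumentParser(prog="run-vm")
    parser.add_argument("--ram", type=int, required=True, help="guest RAM in MiB")
    parser.add_argument("--task", required=True, help="script on the host to run")
    parser.add_argument("--results", required=True, help="host directory for results")
    return parser.parse_args(argv)


def main(argv=None):
    """Print the command that would be run for the given options."""
    args = parse_args(argv)
    print(" ".join(build_qemu_command(args.ram, args.task, args.results)))
```

Calling main(["--ram", "2048", "--task", "/path/on/host/foo.sh", "--results", "/another/host/dir/"]) prints the command instead of executing it, which keeps the sketch safe to try on a machine without QEMU. Whether hardware acceleration (for example KVM) is usable without root depends on how the cluster nodes are configured, so it is deliberately not enabled here.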
It's important to be able to start many instances of the VM simultaneously, both on a single node and across multiple nodes of the cluster, so it would be nice if I didn't have to make a full copy of the VM's virtual disk and metadata for each instance.
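One way the "no full copies" requirement might be met, assuming the disk is in QEMU's qcow2 format, is to give each instance a small copy-on-write overlay backed by a single shared base image. The sketch below only generates the qemu-img commands; the file names and the instance-NNNN naming scheme are hypothetical.

```python
import os


def overlay_commands(base_image, work_dir, count):
    """Return (overlay_path, qemu-img argv) pairs, one per VM instance.

    Each overlay records only the blocks its instance changes; unchanged
    blocks are read from the shared, unmodified base image.
    """
    base = os.path.abspath(base_image)
    plans = []
    for i in range(count):
        overlay = os.path.join(work_dir, f"instance-{i:04d}.qcow2")
        argv = [
            "qemu-img", "create",
            "-f", "qcow2",   # format of the new overlay
            "-b", base,      # backing file shared by all instances
            "-F", "qcow2",   # format of the backing file
            overlay,
        ]
        plans.append((overlay, argv))
    return plans
```

Each returned argv could be handed to subprocess.run before the matching VM is started, and the overlay deleted once the results have been collected.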
As the task instances are completely independent, the VMs would not need any network support once deployed, or any outside communications beyond reading and writing the host filesystem.
Is this possible, and if so, how might I go about doing it? Are there assumptions I've made above which are bogus?