Background
I am trying to work out the best structure for an Azure application. Each of my worker roles will spin up multiple long-running jobs. Over time I can transfer jobs from one instance to another by switching them to a readonly mode on the source instance, spinning them up on the target instance, and then spinning the original down on the source instance.
If I have too many jobs then I can tell Azure to spin up extra role instance, and use them for new jobs. Conversely if my load drops (e.g. during the night) then I can consolidate outstanding jobs to a few machines and tell Azure to give me fewer instances.
The trouble is that (as I understand it) Azure provides no mechanism to allow me to decide which instance to stop. Thus I cannot know which servers to consolidate onto, and some of my jobs will die when their instance stops, causing delays for users while I restart those jobs on surviving instances.
Idea 1: I decide which instance to stop, and return from its Run(). I then tell Azure to reduce my instance count by one, and hope it concludes that the broken instance is a good candidate. Has anyone tried anything like this?
Idea 2: I predefine a whole bunch of different worker roles, with identical contents. I can individually stop and start them by switching their instance count from zero to one, and back again. I think this idea would work, but I don't like it because it seems to go against the natural Azure way of doing things, and because it involves me in a lot of extra bookkeeping to manage the extra worker roles.
Idea 3: Live with it.
Any better ideas?