NServiceBus pipeline with Distributors
- by David
I'm building a processing pipeline with NServiceBus, but I'm having trouble configuring the distributors so that each step in the process is scalable. Here's some info:
The pipeline will have a master process that says "OK, time to start" for a WorkItem, which will then start a process like a flowchart.
Each step in the flowchart may be computationally expensive, so I want the ability to scale out each step. This tells me that each step needs a Distributor.
I want to be able to hook additional activities onto events later. This tells me I need to Publish() messages when a step is done, not Send() them.
A process may need to branch based on a condition. This tells me that a process must be able to publish more than one type of message.
A process may need to join forks. I imagine I should use Sagas for this.
Hopefully these assumptions are good; otherwise I'm in more trouble than I thought.
For the sake of simplicity, let's forget about forking or joining and consider a simple pipeline, with Step A followed by Step B, and ending with Step C. Each step gets its own distributor and can have many nodes processing messages.
NodeA workers contain an IHandleMessages processor, and publish EventA
NodeB workers contain an IHandleMessages<EventA> processor, and publish EventB
NodeC workers contain an IHandleMessages<EventB> processor, and then the pipeline is complete.
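As a rough sketch of the flow I'm after, here is a toy in-memory model in Python. It is not NServiceBus code: `ToyBus`, the `StartWork` message name, and the lambda handlers are all made up for illustration. Each step handles the previous step's event and publishes its own:

```python
from collections import defaultdict


class ToyBus:
    """Hypothetical in-memory stand-in for a bus; not the NServiceBus API."""

    def __init__(self):
        self.subscribers = defaultdict(list)  # message type name -> handlers
        self.log = []  # every message seen, in order

    def subscribe(self, message_type, handler):
        self.subscribers[message_type].append(handler)

    def publish(self, message_type, work_item):
        self.log.append((message_type, work_item))
        for handler in list(self.subscribers[message_type]):
            handler(work_item)


def build_pipeline(bus):
    # Step A: handles the (hypothetical) start message, publishes EventA.
    bus.subscribe("StartWork", lambda item: bus.publish("EventA", item))
    # Step B: handles EventA, publishes EventB.
    bus.subscribe("EventA", lambda item: bus.publish("EventB", item))
    # Step C: handles EventB; the pipeline is complete.
    bus.subscribe("EventB", lambda item: bus.log.append(("Done", item)))


bus = ToyBus()
build_pipeline(bus)
bus.publish("StartWork", "WorkItem-1")
print(bus.log)
```

Because every step publishes rather than sends, a later activity could subscribe to EventA or EventB without touching the existing steps, which is the extensibility I want to keep.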
Here are the relevant parts of the config files, where # denotes the number of the worker (e.g. there are input queues NodeA.1 and NodeA.2):
NodeA:
<MsmqTransportConfig InputQueue="NodeA.#" ErrorQueue="error" NumberOfWorkerThreads="1" MaxRetries="5" />
<UnicastBusConfig DistributorControlAddress="NodeA.Distrib.Control" DistributorDataAddress="NodeA.Distrib.Data" >
<MessageEndpointMappings>
</MessageEndpointMappings>
</UnicastBusConfig>
NodeB:
<MsmqTransportConfig InputQueue="NodeB.#" ErrorQueue="error" NumberOfWorkerThreads="1" MaxRetries="5" />
<UnicastBusConfig DistributorControlAddress="NodeB.Distrib.Control" DistributorDataAddress="NodeB.Distrib.Data" >
<MessageEndpointMappings>
<add Messages="Messages.EventA, Messages" Endpoint="NodeA.Distrib.Data" />
</MessageEndpointMappings>
</UnicastBusConfig>
NodeC:
<MsmqTransportConfig InputQueue="NodeC.#" ErrorQueue="error" NumberOfWorkerThreads="1" MaxRetries="5" />
<UnicastBusConfig DistributorControlAddress="NodeC.Distrib.Control" DistributorDataAddress="NodeC.Distrib.Data" >
<MessageEndpointMappings>
<add Messages="Messages.EventB, Messages" Endpoint="NodeB.Distrib.Data" />
</MessageEndpointMappings>
</UnicastBusConfig>
And here are the relevant parts of the distributor configs:
Distributor A:
<add key="DataInputQueue" value="NodeA.Distrib.Data"/>
<add key="ControlInputQueue" value="NodeA.Distrib.Control"/>
<add key="StorageQueue" value="NodeA.Distrib.Storage"/>
Distributor B:
<add key="DataInputQueue" value="NodeB.Distrib.Data"/>
<add key="ControlInputQueue" value="NodeB.Distrib.Control"/>
<add key="StorageQueue" value="NodeB.Distrib.Storage"/>
Distributor C:
<add key="DataInputQueue" value="NodeC.Distrib.Data"/>
<add key="ControlInputQueue" value="NodeC.Distrib.Control"/>
<add key="StorageQueue" value="NodeC.Distrib.Storage"/>
I'm testing using 2 instances of each node, and the problem seems to come up in the middle at Node B. There are basically 2 things that might happen:
Both instances of Node B report that they are subscribing to EventA, and also that NodeC.Distrib.Data@MYCOMPUTER is subscribing to the EventB that Node B publishes. In this case, everything works great.
Both instances of Node B report that they are subscribing to EventA; however, one worker says NodeC.Distrib.Data@MYCOMPUTER is subscribing TWICE, while the other worker does not mention it at all.
In the second case, which seems to be controlled only by how the distributor routes the subscription messages, if the "overachiever" node processes an EventA, all is well. If the "underachiever" processes an EventA, the publish of EventB has no subscribers and the workflow dies.
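To make my suspicion concrete, here is a toy model in Python (again, not NServiceBus code). It assumes two things I have not confirmed: that each Node C worker sends its own subscription request for EventB, and that each Node B worker keeps its own subscription list rather than sharing one. The `Worker` class and the `routing` argument are hypothetical:

```python
class Worker:
    """Hypothetical Node B worker with its own private subscription list."""

    def __init__(self, name):
        self.name = name
        self.subscriptions = []  # assumed to be per-worker, not shared

    def handle_subscription(self, subscriber_address):
        self.subscriptions.append(subscriber_address)


def run(routing):
    """routing[i] is the index of the worker that receives request i."""
    workers = [Worker("NodeB.1"), Worker("NodeB.2")]
    # One subscription request per Node C worker, both naming the
    # Node C distributor's data queue as the subscriber address.
    requests = ["NodeC.Distrib.Data", "NodeC.Distrib.Data"]
    for request, index in zip(requests, routing):
        workers[index].handle_subscription(request)
    return {w.name: w.subscriptions for w in workers}


good = run([0, 1])  # the distributor happens to alternate workers
bad = run([0, 0])   # the distributor hands both requests to one worker

print(good)
print(bad)
```

With the second routing, NodeB.1 holds the subscription twice and NodeB.2 holds none, which matches the "overachiever" and "underachiever" behavior I'm seeing. If the model is right, the fix would be to make the subscription list shared between the workers, but I don't know whether that is the intended setup.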
So, my questions:
Is this kind of setup possible?
Is the configuration correct? It's hard to find any examples of configuration with distributors beyond a simple one-level publisher/2-worker setup.
Would it make more sense to have one central broker process that does all the non-computationally-intensive traffic cop operations, and only sends messages to processes behind distributors when the task is long-running and must be load balanced?
Then the load-balanced nodes could simply reply back to the central broker, which seems easier.
On the other hand, that seems at odds with the decentralization that is NServiceBus's strength.
And if this is the answer, and the long-running process's "done" event is a reply, how do you keep the Publish that lets additional activities be hooked on later?