Many problems can be decomposed using a Data Decomposition approach, but often this is not appropriate. Frequently, decomposing the problem into the distinct tasks that must be performed is a more natural abstraction.
However, as I mentioned in Part 1, Task Decomposition tends to be a bit more difficult than Data Decomposition, and can require a bit more effort. Before we begin parallelizing our algorithm based on the tasks being performed, we need to decompose our problem, taking special care of certain considerations such as the ordering and grouping of tasks.
Up to this point in this series, I’ve focused on parallelization techniques that are most appropriate when a problem space can be decomposed by data. Using PLINQ and the Parallel class, I’ve shown how problems involving a collection of data, where each element needs to be processed, can potentially be parallelized.
However, there are many other routines where this is not appropriate. Often, instead of working on a collection of data, there is a single piece of data which must be processed using an algorithm or series of algorithms. Here, there is no collection of data, but there may still be opportunities for parallelism.
As I mentioned before, in cases like this, the approach is to look at your overall routine and decompose your problem space based on tasks. The idea here is to look for discrete “tasks,” individual pieces of work, each of which can be thought of conceptually as a single operation.
Let’s revisit the example I used in Part 1, an application startup path. Say we want our program, at startup, to do a bunch of individual actions, or “tasks”. The following is our list of duties we must perform right at startup:
Display a splash screen
Request a license from our license manager
Check for an update to the software from our web server
If an update is available, download it
Set up our menu structure based on our current license
Open and display our main welcome window
Hide the splash screen
The first step in Task Decomposition is breaking up the problem space into discrete tasks.
This, naturally, can be abstracted as seven discrete tasks. In the serial version of our program, if we were to diagram this, the general process would appear as:
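If we were to express that serial flow in code rather than a diagram, it might look something like the sketch below. The method names and the console/sleep stubs are hypothetical placeholders I’m using for illustration – they stand in for the real splash screen, licensing, update, and menu logic.

using System;
using System.Threading;

class StartupSerialSketch
{
    // Hypothetical placeholder stubs; a real application would show actual UI,
    // talk to its license manager, and contact its update server here.
    static void ShowSplashScreen()  { Console.WriteLine("Splash screen shown"); }
    static string RequestLicense()  { Thread.Sleep(200); return "standard-license"; }
    static bool CheckForUpdate()    { Thread.Sleep(300); return true; }
    static void DownloadUpdate()    { Thread.Sleep(500); }
    static void SetupMenus(string license) { Console.WriteLine("Menus set up for: " + license); }
    static void ShowWelcomeWindow() { Console.WriteLine("Welcome window shown"); }
    static void HideSplashScreen()  { Console.WriteLine("Splash screen hidden"); }

    static void Main()
    {
        // Serial version: each task starts only after the previous one finishes.
        ShowSplashScreen();

        string license = RequestLicense();

        if (CheckForUpdate())
            DownloadUpdate();

        SetupMenus(license);
        ShowWelcomeWindow();
        HideSplashScreen();
    }
}

Notice that every step waits for the previous one, even though several of these steps have nothing to do with one another.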
These tasks, obviously, provide some opportunities for parallelism. Before we can parallelize this routine, we need to analyze these tasks and find any dependencies between them. In this case, our dependencies include:
The splash screen must be displayed first, and as quickly as possible.
We can’t download an update before we see whether one exists.
Our menu structure depends on our license, so we must check for the license before setting up the menus.
Since our welcome screen will notify the user of an update, we can’t show it until we’ve downloaded the update.
Since our welcome screen includes menus that are customized based on our license, we can’t display it until we’ve received a license.
We can’t hide the splash until our welcome screen is displayed.
By listing our dependencies, we start to see the natural ordering that must occur for the tasks to be processed correctly.
The second step in Task Decomposition is determining the dependencies between tasks, and ordering tasks based on their dependencies.
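One simple way to capture this second step is to write the dependencies down explicitly, mapping each task to the tasks it must wait on. The sketch below is purely illustrative – the task names are labels I’m inventing here, and it assumes (as described above) that every other task waits for the splash screen to be shown first, and that the welcome window simply waits on the end of the update chain.

using System;
using System.Collections.Generic;

class DependencySketch
{
    static void Main()
    {
        // For each task, list the tasks it must wait on. The task names are
        // illustrative labels only; for simplicity, the update check and the
        // download are treated as a single chain.
        var dependsOn = new Dictionary<string, string[]>
        {
            { "ShowSplash",     new string[0] },
            { "CheckForUpdate", new[] { "ShowSplash" } },
            { "DownloadUpdate", new[] { "CheckForUpdate" } },
            { "RequestLicense", new[] { "ShowSplash" } },
            { "SetupMenus",     new[] { "RequestLicense" } },
            { "ShowWelcome",    new[] { "DownloadUpdate", "SetupMenus" } },
            { "HideSplash",     new[] { "ShowWelcome" } }
        };

        foreach (var task in dependsOn)
        {
            Console.WriteLine(task.Key + " waits on: " + string.Join(", ", task.Value));
        }
    }
}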
Looking at these tasks and all of their dependencies, we quickly see that even a simple decomposition such as this one can get quite complicated. To simplify the problem of defining the dependencies, it’s often useful to group our tasks into larger, discrete groups. The goal when grouping tasks is to give each task “group” as few dependencies as possible on other tasks or groups, and then work out the dependencies within that group. Typically, this works best when any external dependency is on the “last” task within the group once it’s ordered, although that is not a firm requirement. This process is often called Grouping Tasks. In our case, we can easily group tasks together, effectively turning this into four discrete task groups:
1. Show our splash screen –
This needs to be left as its own task. Multiple tasks depend on it, and we want it to start before any other action, as quickly as possible.
2. Check for an Update, and Download the Update if it Exists –
These two tasks logically group together: we only download an update if one exists, so the second naturally follows the first. This group has one dependency as an input, and other tasks rely only on the final task within this group.
3. Request a License, and then Set Up the Menus –
Here, we can group these two tasks together. Although we mentioned that our welcome screen depends on the license returned, it also depends on setting up the menus, which is the final task here. Setting up our menus cannot happen until after our license is requested. By grouping these together, we further reduce our problem space.
4. Display the Welcome Window and Hide the Splash Screen –
Finally, we can display our welcome window and hide our splash screen. This task group depends on all three previous task groups – it cannot happen until all three of the previous groups have completed.
By grouping the tasks together, we reduce our problem space, and can naturally see a pattern for how this process can be parallelized. The diagram below shows one approach:
The orange boxes show each task group, with each task represented within. We can now take these tasks and run a large portion of this process in parallel, including the portions which may be the most time consuming. We’ve created two parallel paths for our process execution to follow, hopefully speeding up application startup time dramatically.
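To make that concrete, here is one possible way the two parallel paths could be expressed using the Task Parallel Library. This is only a sketch built on the same hypothetical stubs as the serial version above: each path is started with Task.Factory.StartNew, and Task.WaitAll joins them before the final group runs. A real application would also need to marshal the window and menu work back onto the UI thread, which I’m ignoring here.

using System;
using System.Threading;
using System.Threading.Tasks;

class StartupParallelSketch
{
    // Hypothetical placeholder stubs, as in the serial sketch above.
    static void ShowSplashScreen()  { Console.WriteLine("Splash screen shown"); }
    static string RequestLicense()  { Thread.Sleep(200); return "standard-license"; }
    static bool CheckForUpdate()    { Thread.Sleep(300); return true; }
    static void DownloadUpdate()    { Thread.Sleep(500); }
    static void SetupMenus(string license) { Console.WriteLine("Menus set up for: " + license); }
    static void ShowWelcomeWindow() { Console.WriteLine("Welcome window shown"); }
    static void HideSplashScreen()  { Console.WriteLine("Splash screen hidden"); }

    static void Main()
    {
        // Task group 1: show the splash screen first, as quickly as possible.
        ShowSplashScreen();

        // Task group 2: check for an update, and download it if one exists.
        Task updatePath = Task.Factory.StartNew(() =>
        {
            if (CheckForUpdate())
            {
                DownloadUpdate();
            }
        });

        // Task group 3: request a license, then set up the menus based on it.
        Task licensePath = Task.Factory.StartNew(() =>
        {
            string license = RequestLicense();
            SetupMenus(license);
        });

        // Task group 4: once both parallel paths complete, show the welcome
        // window and hide the splash screen.
        Task.WaitAll(updatePath, licensePath);
        ShowWelcomeWindow();
        HideSplashScreen();
    }
}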
The main point to remember here is that, when decomposing your problem space by tasks, you need to:
Define each discrete action as an individual Task
Discover dependencies between your tasks
Group tasks based on their dependencies
Order the tasks and groups of tasks