Parallelism in .NET – Part 6, Declarative Data Parallelism

Posted by Reed Copsey
Published on Wed, 27 Jan 2010 01:26:01 +0000

When working with a problem that can be decomposed by data, we have a collection, and some operation being performed upon the collection.  I’ve demonstrated how this can be parallelized using the Task Parallel Library, using imperative data parallelism via the Parallel class.  While this provides a huge step forward in terms of power and capabilities, in many cases, special care must still be taken for relatively common scenarios.

C# 3.0 and Visual Basic 9.0 introduced a new, declarative programming model to .NET via the LINQ Project.  When working with collections, we can now write software that describes what we want to occur without having to explicitly state how the program should accomplish the task.  By taking advantage of LINQ, many operations become much shorter, more elegant, and easier to understand and maintain.  Version 4.0 of the .NET Framework extends this concept into the parallel computation space by introducing Parallel LINQ.

Before we delve into PLINQ, let’s begin with a short discussion of LINQ.  LINQ, the extensions to the .NET Framework which implement language integrated query, set, and transform operations, is implemented in many flavors.  For our purposes, we are interested in LINQ to Objects.  When parallelizing a routine, we are typically dealing with in-memory data.  More data-access oriented LINQ variants, such as LINQ to SQL and LINQ to Entities in the Entity Framework, fall outside of our concern, since the parallelism there is the concern of the database engine processing the query itself.

LINQ (LINQ to Objects in particular) works by implementing a series of extension methods, most of which work on IEnumerable<T>.  The language enhancements use these extension methods to create a very concise, readable alternative to a traditional foreach statement.  For example, let’s revisit the minimum aggregation routine we wrote in Part 4:

double min = double.MaxValue;
foreach(var item in collection)
{
    double value = item.PerformComputation();
    min = System.Math.Min(min, value);
}

Here, we’re doing a very simple computation, but writing this in an imperative style.  This can be loosely translated to English as:

Create a very large number, and save it in min
Loop through each item in the collection.
For every item:
    Perform some computation, and save the result
    If the result is less than min, set min to the result

Although this is fairly easy to follow, it’s quite a few lines of code, and it requires us to read through the code, step by step, line by line, in order to understand the intention of the developer.
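
If you’d like to follow along and run these snippets yourself, here is a minimal, self-contained sketch of the kind of type they assume.  WorkItem, its seed field, and the body of PerformComputation are hypothetical stand-ins; the real type and computation are built up in Part 4.

using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical stand-in for the item type used throughout this series.
class WorkItem
{
    private readonly double seed;
    public WorkItem(double seed) { this.seed = seed; }

    // Simulates a non-trivial, CPU-bound computation.
    public double PerformComputation()
    {
        double result = seed + 1.0;
        for (int i = 1; i < 10000; i++)
            result = System.Math.Sqrt(result * i + 1.0);
        return result;
    }
}

class Program
{
    static void Main()
    {
        // A sample collection to run the snippets against.
        List<WorkItem> collection =
            Enumerable.Range(0, 1000).Select(i => new WorkItem(i)).ToList();

        // The imperative loop from above.
        double min = double.MaxValue;
        foreach (var item in collection)
        {
            double value = item.PerformComputation();
            min = System.Math.Min(min, value);
        }
        System.Console.WriteLine(min);
    }
}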

We can rework this same statement, using LINQ:

double min = collection.Min(item => item.PerformComputation());

Here, we’re after the same information.  However, this is written using a declarative programming style.  When we see this code, we’d naturally translate this to English as:

Save the Min value of collection, determined by calling item.PerformComputation()

That’s it – instead of multiple logical steps, we have one single, declarative request.  This makes the developer’s intentions very clear, and very easy to follow.  The system is free to implement this using whatever method it chooses.
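
As a quick sketch using the same hypothetical collection, other aggregations read just as directly; each is a single declarative statement rather than another hand-written loop:

// Each of these is one declarative request over the same collection.
double minValue = collection.Min(item => item.PerformComputation());
double maxValue = collection.Max(item => item.PerformComputation());
double average  = collection.Average(item => item.PerformComputation());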

Parallel LINQ (PLINQ) extends LINQ to Objects to support parallel operations.  This is a perfect fit in many cases when you have a problem that can be decomposed by data.  To show this, let’s again refer to our minimum aggregation routine from Part 4, but this time, let’s review our final, parallelized version:

// Safe, and fast!
double min = double.MaxValue;
// Make a "lock" object
object syncObject = new object();
Parallel.ForEach(
    collection,
    // First, we provide a local state initialization delegate.
    () => double.MaxValue,
    // Next, we supply the body, which takes the original item, loop state,
    // and local state, and returns a new local state
    (item, loopState, localState) =>
    {
        double value = item.PerformComputation();
        return System.Math.Min(localState, value);
    },
    // Finally, we provide an Action<TLocal>, to "merge" results together
    localState =>
    {
        // This requires locking, but it's only once per used thread
        lock(syncObject)
            min = System.Math.Min(min, localState);
    }
);

Here, we’re doing the same computation as above, but fully parallelized.  Describing this in English becomes quite a feat:

Create a very large number, and save it in min
Create a temporary object we can use for locking
Call Parallel.ForEach, specifying three delegates
    For the first delegate:
        Initialize a local variable to hold the local state to a very large number
    For the second delegate:
        For each item in the collection, perform some computation, save the result
        If the result is less than our local state, save the result in local state
    For the final delegate:
        Take a lock on our temporary object to protect our min variable
        Save the min of our min and local state variables

Although this solves our problem, and does it in a very efficient way, we’ve created a set of code that is quite a bit more difficult to understand and maintain.

PLINQ provides us with a very nice alternative.  In order to use PLINQ, we need to learn one new extension method that works on IEnumerable<T>: ParallelEnumerable.AsParallel().

That’s all we need to learn in order to use PLINQ: one single method.  We can write our minimum aggregation in PLINQ very simply:

double min = collection.AsParallel().Min(item => item.PerformComputation());

By simply adding “.AsParallel()” to our LINQ to Objects query, we converted this to use PLINQ and run this computation in parallel!

This can be loosely translated into English easily, as well:

Process the collection in parallel
Get the Minimum value, determined by calling PerformComputation on each item 

Here, our intention is very clear and easy to understand.  We just want to perform the same operation we did in serial, but run it “as parallel”.  PLINQ completely extends LINQ to Objects: the entire functionality of LINQ to Objects is available.  By simply adding a call to AsParallel(), we can specify that a collection should be processed in parallel.  This is simple, safe, and incredibly useful.
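
As one last sketch (again using the hypothetical collection from above), ordinary LINQ operators compose with AsParallel() just as they do in serial LINQ to Objects:

// Select and Where are ordinary LINQ operators; after AsParallel(),
// they run as part of the parallel query.
double smallestValue = collection
    .AsParallel()
    .Select(item => item.PerformComputation())
    .Where(value => value > 0.0)
    .Min();

The only change from the serial version is the call to AsParallel(); everything after it is plain LINQ.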

© Reed Copsey
