Parallel LINQ - PLINQ
- by nmarun
It turns out that with .NET 4.0 we can run a query much like a multi-threaded application. Say you want to query a collection of objects and return only those that meet certain conditions. Until now, we basically had one ‘control’ (think: thread) that iterated over all the objects in the collection, checked the condition on each object and returned it if it passed. Clearly, if we can ‘break’ this task into smaller ones, assign each task to a different ‘control’ and ask all the controls to do their jobs in parallel, the time taken to finish the entire task should be much lower. Welcome to PLINQ.
Let’s take some examples. I have the following method that uses our good ol’ LINQ.
// needs: using System; using System.Diagnostics; using System.Linq;
private static void Linq(int lowerLimit, int upperLimit)
{
    // build a sequence of int values from lowerLimit to upperLimit (inclusive);
    // the second argument of Enumerable.Range is a count, not an upper bound
    var source = Enumerable.Range(lowerLimit, upperLimit - lowerLimit + 1);

    // start a timer
    Stopwatch stopwatch = new Stopwatch();
    stopwatch.Start();

    // set the expectation => define the query (it runs only when enumerated)
    var evenNumbers = from num in source
                      where IsDivisibleBy(num, 2)
                      select num;

    // iterate over and print the returned items
    foreach (var number in evenNumbers)
    {
        Console.WriteLine("** {0}", number);
    }

    stopwatch.Stop();

    // check the metrics
    Console.WriteLine("Elapsed {0}ms", stopwatch.ElapsedMilliseconds);
}
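One detail of the method above is easy to trip over: the second argument of Enumerable.Range is a count, not an upper bound. The sketch below shows the difference; RangeDemo and Inclusive are hypothetical names made up for illustration, not part of the original project.

```csharp
using System;
using System.Linq;

static class RangeDemo
{
    // Hypothetical helper: builds the inclusive sequence lower..upper.
    public static int[] Inclusive(int lower, int upper)
    {
        // Enumerable.Range(start, count): convert the upper bound to a count.
        return Enumerable.Range(lower, upper - lower + 1).ToArray();
    }

    static void Main()
    {
        // Range(5, 10) yields 10 values starting at 5, i.e. 5 through 14.
        Console.WriteLine(string.Join(",", Enumerable.Range(5, 10)));

        // Inclusive(5, 10) yields 5 through 10.
        Console.WriteLine(string.Join(",", Inclusive(5, 10)));
    }
}
```

With a lower limit of 1 the two forms happen to produce the same numbers, which is why the mix-up is easy to miss.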
I’ve added comments for the major steps, but the only thing I want to talk about here is the IsDivisibleBy() method. I know I could have just included the logic directly in the where clause. I called a method to add ‘delay’ to the execution of the query - to simulate a loooooooooong operation (will be easier to compare the results).
private static bool IsDivisibleBy(int number, int divisor)
{
    // run a database query a few times
    // to add time to the execution of this method;
    // TableB has around 10 records
    for (int i = 0; i < 10; i++)
    {
        // dispose the data context once each pass is done
        using (DataClasses1DataContext dataContext = new DataClasses1DataContext())
        {
            var query = from b in dataContext.TableBs select b;

            foreach (var row in query)
            {
                // Do NOTHING (wish my job was like this)
            }
        }
    }

    return number % divisor == 0;
}
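If you want to try this without a database, a stand-in that simply sleeps gives a similar ‘slow predicate’ effect. This is only a sketch under that assumption: SlowCheck, the delayMs parameter and its 50 ms default are hypothetical and not part of the original project.

```csharp
using System;
using System.Threading;

static class SlowCheck
{
    // Hypothetical stand-in for the database loop:
    // sleeps for delayMs to simulate a slow operation, then does the real check.
    public static bool IsDivisibleBy(int number, int divisor, int delayMs = 50)
    {
        Thread.Sleep(delayMs);
        return number % divisor == 0;
    }

    static void Main()
    {
        Console.WriteLine(IsDivisibleBy(10, 2)); // True
        Console.WriteLine(IsDivisibleBy(7, 2));  // False
    }
}
```

Sleeping is a blocking wait rather than real work, so the timings will not match the database version, but the relative difference between the two query styles should still show up.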
Now, let’s look at how to modify this to PLINQ.
1: private static void Plinq(int lowerLimit, int upperLimit)
2: {
3:     // build the sequence lowerLimit..upperLimit (Enumerable.Range takes a start and a count)
4:     var source = Enumerable.Range(lowerLimit, upperLimit - lowerLimit + 1);
5:
6:     // start a timer
7:     Stopwatch stopwatch = new Stopwatch();
8:     stopwatch.Start();
9:
10:     // set the expectation => define the query (it runs only when enumerated)
11:     var evenNumbers = from num in source.AsParallel()
12:                       where IsDivisibleBy(num, 2)
13:                       select num;
14:
15:     // iterate over and print the returned items
16:     foreach (var number in evenNumbers)
17:     {
18:         Console.WriteLine("** {0}", number);
19:     }
20:
21:     stopwatch.Stop();
22:
23:     // check the metrics
24:     Console.WriteLine("Elapsed {0}ms", stopwatch.ElapsedMilliseconds);
25: }
That’s it, this is now in PLINQ format. Oh, and if you haven’t found the difference, look at line 11 a little more closely. You’ll see an extension method ‘AsParallel()’ added to the ‘source’ variable. Couldn’t be simpler, right? This should improve the performance for us. Let’s test it.
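For those who prefer method syntax, the same one-call switch can be sketched as below. This is a minimal sketch, not the code from my project: PlinqSketch, IsEven, Sequential and InParallel are hypothetical names, and a Thread.Sleep stands in for the database delay.

```csharp
using System;
using System.Diagnostics;
using System.Linq;
using System.Threading;

static class PlinqSketch
{
    // Hypothetical slow predicate: Thread.Sleep stands in for the database loop.
    static bool IsEven(int n)
    {
        Thread.Sleep(20);
        return n % 2 == 0;
    }

    // Plain LINQ: one thread checks every item in turn.
    public static int[] Sequential(int lower, int count)
    {
        return Enumerable.Range(lower, count).Where(n => IsEven(n)).ToArray();
    }

    // PLINQ: AsParallel() hands the rest of the query to the parallel operators.
    public static int[] InParallel(int lower, int count)
    {
        return Enumerable.Range(lower, count).AsParallel().Where(n => IsEven(n)).ToArray();
    }

    static void Main()
    {
        var sw = Stopwatch.StartNew();
        var seq = Sequential(1, 20);
        Console.WriteLine("LINQ:  {0} items, {1}ms", seq.Length, sw.ElapsedMilliseconds);

        sw.Restart();
        var par = InParallel(1, 20);
        Console.WriteLine("PLINQ: {0} items, {1}ms", par.Length, sw.ElapsedMilliseconds);
    }
}
```

Both methods return the same set of numbers; only the order of the parallel result and the elapsed time may differ from run to run.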
So in my Main method of the Console application that I’m working on, I make a call to both.
static void Main(string[] args)
{
    // set lower and upper limits
    int lowerLimit = 1;
    int upperLimit = 20;

    // call the methods
    Console.WriteLine("Calling Linq() method");
    Linq(lowerLimit, upperLimit);

    Console.WriteLine();
    Console.WriteLine("Calling Plinq() method");
    Plinq(lowerLimit, upperLimit);

    Console.ReadLine(); // just so I get enough time to read the output
}
YMMV, but here are the results that I got:
It’s quite obvious from the above results that the Plinq() method takes considerably less time than the Linq() version. I’m sure you’ve already noticed that the output of the Plinq() method is not in order. That’s because each of the ‘controls’ we sent to fetch the results reported its values as and when it obtained them. This is something to remember about parallel LINQ: by default, the results are not guaranteed to come back in the order of the source. This could be counted as a negative about PLINQ (emphasis on ‘could’).
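PLINQ does have a built-in switch for this: AsOrdered() asks the query to preserve the order of the source. A minimal sketch follows; OrderedDemo and EvensInSourceOrder are hypothetical names used only for illustration.

```csharp
using System;
using System.Linq;

static class OrderedDemo
{
    public static int[] EvensInSourceOrder(int lower, int count)
    {
        return Enumerable.Range(lower, count)
            .AsParallel()
            .AsOrdered()              // keep results in source order
            .Where(n => n % 2 == 0)
            .ToArray();
    }

    static void Main()
    {
        // prints "2, 4, 6, 8, 10, 12, 14, 16, 18, 20"
        Console.WriteLine(string.Join(", ", EvensInSourceOrder(1, 20)));
    }
}
```

Preserving order means PLINQ has to buffer and reassemble results, so it may give back some of the speed-up; it is worth using only when the order actually matters.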
Nevertheless, if we want the results sorted, we can add an orderby clause to the query, load the results into a SortedSet (.NET 4.0) or build our own custom ‘sorter’. Whichever way we go, there’s a good chance we’ll still end up with better performance using PLINQ.
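Here is roughly what the SortedSet route could look like. It is a sketch only: SortedDemo and SortedEvens are hypothetical names, and the trivial predicate stands in for the slow IsDivisibleBy() call.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

static class SortedDemo
{
    public static SortedSet<int> SortedEvens(int lower, int count)
    {
        // The parallel query may yield its items in any order...
        var evens = Enumerable.Range(lower, count)
            .AsParallel()
            .Where(n => n % 2 == 0);

        // ...but SortedSet<T> always enumerates them in ascending order.
        // Note that a set also drops duplicate values.
        return new SortedSet<int>(evens);
    }

    static void Main()
    {
        // prints "2, 4, 6, 8, 10, 12, 14, 16, 18, 20"
        Console.WriteLine(string.Join(", ", SortedEvens(1, 20)));
    }
}
```

The sorting happens after the parallel part has finished, so the expensive checks still run in parallel.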
And there’s another negative of PLINQ (depending on how you see it). This one concerns CPU cycles. See the usage for the Linq() method (measured with Resource Monitor):
I have dual CPUs; note the height of the peak in the bottom two blocks, and now compare that to what happens when I run the Plinq() method.
The difference is obvious: higher usage, but for a shorter duration (the width of the peak). Both observations make sense. Linq() runs for a longer time but uses fewer resources, whereas Plinq() runs for a shorter time and consumes more.
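If the extra CPU load is a concern, PLINQ lets you cap how many tasks a query uses at once via WithDegreeOfParallelism(). A minimal sketch, with ThrottleDemo, EvensWithLimit and maxWorkers as hypothetical names:

```csharp
using System;
using System.Linq;

static class ThrottleDemo
{
    public static int[] EvensWithLimit(int lower, int count, int maxWorkers)
    {
        return Enumerable.Range(lower, count)
            .AsParallel()
            .WithDegreeOfParallelism(maxWorkers) // upper bound on concurrent tasks; must be at least 1
            .Where(n => n % 2 == 0)
            .ToArray();
    }

    static void Main()
    {
        // Allow at most one worker: the query still runs through PLINQ, but without fanning out.
        var evens = EvensWithLimit(1, 20, 1);
        Console.WriteLine("{0} even numbers found", evens.Length);
    }
}
```

A lower limit trades some of the speed-up for a gentler CPU profile, which can matter when the machine has other work to do.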
Even knowing all this, I’m still inclined towards PLINQ.
PLINQ rocks! (no hard feelings LINQ)