C#/.NET Little Wonders: Interlocked CompareExchange()
- by James Michael Hare
Once again, in this series of posts I look at the parts of the .NET Framework that may seem trivial, but can help improve your code by making it easier to write and maintain. The index of all my past little wonders posts can be found here. Two posts ago, I discussed the Interlocked Add(), Increment(), and Decrement() methods (here) for adding and subtracting values in a thread-safe, lightweight manner. Then, last post I talked about the Interlocked Read() and Exchange() methods (here) for safely and efficiently reading and setting 32 or 64 bit values (or references). This week, we’ll round out the discussion by talking about the Interlocked CompareExchange() method and how it can be put to use to exchange a value if the current value is what you expected it to be. Dirty reads can lead to bad results Many of the uses of Interlocked that we’ve explored so far have centered around either reading, setting, or adding values. But what happens if you want to do something more complex such as setting a value based on the previous value in some manner? Perhaps you were creating an application that reads a current balance, applies a deposit, and then saves the new modified balance, where of course you’d want that to happen atomically. If you read the balance, then go to save the new balance and between that time the previous balance has already changed, you’ll have an issue! Think about it, if we read the current balance as $400, and we are applying a new deposit of $50.75, but meanwhile someone else deposits $200 and sets the total to $600, but then we write a total of $450.75 we’ve lost $200! Now, certainly for int and long values we can use Interlocked.Add() to handles these cases, and it works well for that. But what if we want to work with doubles, for example? Let’s say we wanted to add the numbers from 0 to 99,999 in parallel. We could do this by spawning several parallel tasks to continuously add to a total: 1: double total = 0;
2:
3: Parallel.For(0, 10000, next =>
4: {
5: total += next;
6: });
Were this run on one thread using a standard for loop, we’d expect an answer of 4,999,950,000 (the sum of all numbers from 0 to 99,999).
But when we run this in parallel as written above, we’ll likely get something far off. The result of one of my runs, for example, was 1,281,880,740. That is way off! If this were banking software we’d be in big trouble with our clients.
So what happened? The += operator is not atomic, it will read in the current value, add the result, then store it back into the total. At any point in all of this another thread could read a “dirty” current total and accidentally “skip” our add.
So, to clean this up, we could use a lock to guarantee concurrency:
1: double total = 0.0;
2: object locker = new object();
3:
4: Parallel.For(0, count, next =>
5: {
6: lock (locker)
7: {
8: total += next;
9: }
10: });
Which will give us the correct result of 4,999,950,000. One thing to note is that locking can be heavy, especially if the operation being locked over is trivial, or the life of the lock is a high percentage of the work being performed concurrently. In the case above, the lock consumes pretty much all of the time of each parallel task – and the task being locked on is relatively trivial.
Now, let me put in a disclaimer here before we go further:
For most uses, lock is more than sufficient for your needs, and is often the simplest solution!
So, if lock is sufficient for most needs, why would we ever consider another solution? The problem with locking is that it can suspend execution of your thread while it waits for the signal that the lock is free. Moreover, if the operation being locked over is trivial, the lock can add a very high level of overhead. This is why things like Interlocked.Increment() perform so well, instead of locking just to perform an increment, we perform the increment with an atomic, lockless method.
As with all things performance related, it’s important to profile before jumping to the conclusion that you should optimize everything in your path. If your profiling shows that locking is causing a high level of waiting in your application, then it’s time to consider lighter alternatives such as Interlocked.
CompareExchange() – Exchange existing value if equal some value
So let’s look at how we could use CompareExchange() to solve our problem above. The general syntax of CompareExchange() is:
T CompareExchange<T>(ref T location, T newValue, T expectedValue)
If the value in location == expectedValue, then newValue is exchanged. Either way, the value in location (before exchange) is returned.
Actually, CompareExchange() is not one method, but a family of overloaded methods that can take int, long, float, double, pointers, or references. It cannot take other value types (that is, can’t CompareExchange() two DateTime instances directly). Also keep in mind that the version that takes any reference type (the generic overload) only checks for reference equality, it does not call any overridden Equals().
So how does this help us? Well, we can grab the current total, and exchange the new value if total hasn’t changed. This would look like this:
1: // grab the snapshot
2: double current = total;
3:
4: // if the total hasn’t changed since I grabbed the snapshot, then
5: // set it to the new total
6: Interlocked.CompareExchange(ref total, current + next, current);
So what the code above says is: if the amount in total (1st arg) is the same as the amount in current (3rd arg), then set total to current + next (2nd arg). This check and exchange pair is atomic (and thus thread-safe).
This works if total is the same as our snapshot in current, but the problem, is what happens if they aren’t the same? Well, we know that in either case we will get the previous value of total (before the exchange), back as a result. Thus, we can test this against our snapshot to see if it was the value we expected:
1: // if the value returned is != current, then our snapshot must be out of date
2: // which means we didn't (and shouldn't) apply current + next
3: if (Interlocked.CompareExchange(ref total, current + next, current) != current)
4: {
5: // ooops, total was not equal to our snapshot in current, what should we do???
6: }
So what do we do if we fail? That’s up to you and the problem you are trying to solve. It’s possible you would decide to abort the whole transaction, or perhaps do a lightweight spin and try again. Let’s try that:
1: double current = total;
2:
3: // make first attempt...
4: if (Interlocked.CompareExchange(ref total, current + i, current) != current)
5: {
6: // if we fail, go into a spin wait, spin, and try again until succeed
7: var spinner = new SpinWait();
8:
9: do
10: {
11: spinner.SpinOnce();
12: current = total;
13: }
14: while (Interlocked.CompareExchange(ref total, current + i, current) != current);
15: }
16:
This is not trivial code, but it illustrates a possible use of CompareExchange(). What we are doing is first checking to see if we succeed on the first try, and if so great! If not, we create a SpinWait and then repeat the process of SpinOnce(), grab a fresh snapshot, and repeat until CompareExchnage() succeeds. You may wonder why not a simple do-while here, and the reason it’s more efficient to only create the SpinWait until we absolutely know we need one, for optimal efficiency.
Though not as simple (or maintainable) as a simple lock, this will perform better in many situations. Comparing an unlocked (and wrong) version, a version using lock, and the Interlocked of the code, we get the following average times for multiple iterations of adding the sum of 100,000 numbers:
1: Unlocked money average time: 2.1 ms
2: Locked money average time: 5.1 ms
3: Interlocked money average time: 3 ms
So the Interlocked.CompareExchange(), while heavier to code, came in lighter than the lock, offering a good compromise of safety and performance when we need to reduce contention.
CompareExchange() - it’s not just for adding stuff…
So that was one simple use of CompareExchange() in the context of adding double values -- which meant we couldn’t have used the simpler Interlocked.Add() -- but it has other uses as well.
If you think about it, this really works anytime you want to create something new based on a current value without using a full lock. For example, you could use it to create a simple lazy instantiation implementation. In this case, we want to set the lazy instance only if the previous value was null:
1: public static class Lazy<T> where T : class, new()
2: {
3: private static T _instance;
4:
5: public static T Instance
6: {
7: get
8: {
9: // if current is null, we need to create new instance
10: if (_instance == null)
11: {
12: // attempt create, it will only set if previous was null
13: Interlocked.CompareExchange(ref _instance, new T(), (T)null);
14: }
15:
16: return _instance;
17: }
18: }
19: }
So, if _instance == null, this will create a new T() and attempt to exchange it with _instance. If _instance is not null, then it does nothing and we discard the new T() we created.
This is a way to create lazy instances of a type where we are more concerned about locking overhead than creating an accidental duplicate which is not used. In fact, the BCL implementation of Lazy<T> offers a similar thread-safety choice for Publication thread safety, where it will not guarantee only one instance was created, but it will guarantee that all readers get the same instance.
Another possible use would be in concurrent collections. Let’s say, for example, that you are creating your own brand new super stack that uses a linked list paradigm and is “lock free”. We could use Interlocked.CompareExchange() to be able to do a lockless Push() which could be more efficient in multi-threaded applications where several threads are pushing and popping on the stack concurrently.
Yes, there are already concurrent collections in the BCL (in .NET 4.0 as part of the TPL), but it’s a fun exercise! So let’s assume we have a node like this:
1: public sealed class Node<T>
2: {
3: // the data for this node
4: public T Data { get; set; }
5:
6: // the link to the next instance
7: internal Node<T> Next { get; set; }
8: }
Then, perhaps, our stack’s Push() operation might look something like:
1: public sealed class SuperStack<T>
2: {
3: private volatile T _head;
4:
5: public void Push(T value)
6: {
7: var newNode = new Node<int> { Data = value, Next = _head };
8:
9: if (Interlocked.CompareExchange(ref _head, newNode, newNode.Next) != newNode.Next)
10: {
11: var spinner = new SpinWait();
12:
13: do
14: {
15: spinner.SpinOnce();
16: newNode.Next = _head;
17: }
18: while (Interlocked.CompareExchange(ref _head, newNode, newNode.Next) != newNode.Next);
19: }
20: }
21:
22: // ...
23: }
Notice a similar paradigm here as with adding our doubles before. What we are doing is creating the new Node with the data to push, and with a Next value being the original node referenced by _head. This will create our stack behavior (LIFO – Last In, First Out). Now, we have to set _head to now refer to the newNode, but we must first make sure it hasn’t changed!
So we check to see if _head has the same value we saved in our snapshot as newNode.Next, and if so, we set _head to newNode. This is all done atomically, and the result is _head’s original value, as long as the original value was what we assumed it was with newNode.Next, then we are good and we set it without a lock! If not, we SpinWait and try again.
Once again, this is much lighter than locking in highly parallelized code with lots of contention. If I compare the method above with a similar class using lock, I get the following results for pushing 100,000 items:
1: Locked SuperStack average time: 6 ms
2: Interlocked SuperStack average time: 4.5 ms
So, once again, we can get more efficient than a lock, though there is the cost of added code complexity. Fortunately for you, most of the concurrent collection you’d ever need are already created for you in the System.Collections.Concurrent (here) namespace – for more information, see my Little Wonders – The Concurent Collections Part 1 (here), Part 2 (here), and Part 3 (here).
Summary
We’ve seen before how the Interlocked class can be used to safely and efficiently add, increment, decrement, read, and exchange values in a multi-threaded environment. In addition to these, Interlocked CompareExchange() can be used to perform more complex logic without the need of a lock when lock contention is a concern.
The added efficiency, though, comes at the cost of more complex code. As such, the standard lock is often sufficient for most thread-safety needs. But if profiling indicates you spend a lot of time waiting for locks, or if you just need a lock for something simple such as an increment, decrement, read, exchange, etc., then consider using the Interlocked class’s methods to reduce wait.
Technorati Tags: C#,CSharp,.NET,Little Wonders,Interlocked,CompareExchange,threading,concurrency