Inside the Concurrent Collections: ConcurrentDictionary

Posted by Simon Cooper on Simple Talk
Published on Wed, 22 Feb 2012 18:08:00 GMT

Using locks to implement a thread-safe collection is rather like using a sledgehammer - unsubtle, easy to understand, and tends to make any other tool redundant. Unlike the previous two collections I looked at, ConcurrentStack and ConcurrentQueue, ConcurrentDictionary uses locks quite heavily. However, it is careful to wield locks only where necessary to ensure that concurrency is maximised.

This will, by necessity, be a higher-level look than my other posts in this series, as there is quite a lot of code and logic in ConcurrentDictionary. Therefore, I do recommend that you have ConcurrentDictionary open in a decompiler to have a look at all the details that I skip over.

The problem with locks

There are several things to bear in mind when using locks, as encapsulated by the lock keyword in C# and the System.Threading.Monitor class in .NET (if you're unsure as to what lock does in C#, I briefly covered it in my first post in the series):

  1. Locks block threads
    The most obvious problem is that a thread waiting on a lock can't do any work at all. No preparatory work, no 'optimistic' work like in ConcurrentQueue and ConcurrentStack, nothing. It just sits there, waiting to be unblocked. This is bad if you're trying to maximise concurrency.
  2. Locks are slow
    Whereas most of the methods on the Interlocked class can be compiled down to a single CPU instruction, ensuring atomicity at the hardware level, taking out a lock requires some heavy lifting by the CLR and the operating system. There's quite a bit of work required to take out a lock, block other threads, and wake them up again. If locks are used heavily, this impacts performance.
  3. Deadlocks
    When using locks there's always the possibility of a deadlock - two threads, each holding a lock, each trying to acquire the other's lock. Fortunately, this can be avoided with careful programming and structured lock-taking, as we'll see.

So, it's important to minimise where locks are used to maximise the concurrency and performance of the collection.
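To make the contrast between points 1 and 2 concrete, here is a minimal sketch of the same shared counter incremented two ways. The CounterDemo class and its method names are my own inventions for illustration; only lock, Interlocked.Increment and Parallel.For are real framework features.

```csharp
using System.Threading;
using System.Threading.Tasks;

public static class CounterDemo {
    private static readonly object s_lock = new object();

    // Every increment takes the lock; threads that lose the race block
    // until the holder releases it.
    public static int CountWithLock(int iterations) {
        int counter = 0;
        Parallel.For(0, iterations, i => {
            lock (s_lock) { counter++; }
        });
        return counter;
    }

    // Every increment is a single atomic operation; no thread ever blocks.
    public static int CountWithInterlocked(int iterations) {
        int counter = 0;
        Parallel.For(0, iterations, i => Interlocked.Increment(ref counter));
        return counter;
    }
}
```

Both methods return the same total; the difference is in how much waiting goes on to get there.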

Implementation

As you might expect, ConcurrentDictionary is similar in basic implementation to the non-concurrent Dictionary, which I studied in a previous post. I'll be using some concepts introduced there, so I recommend you have a quick read of it.

So, if you were implementing a thread-safe dictionary, what would you do? The naive implementation is to simply have a single lock around all methods accessing the dictionary. This would work, but doesn't allow much concurrency.
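As a rough sketch of that naive approach, the wrapper below puts one lock around every operation on an ordinary Dictionary. CoarseLockedDictionary and its members are hypothetical names of mine, not anything in the framework.

```csharp
using System.Collections.Generic;

// Hypothetical wrapper: a single lock guards every operation, so only
// one thread can be inside the dictionary at any time.
public class CoarseLockedDictionary<TKey, TValue> {
    private readonly Dictionary<TKey, TValue> m_inner = new Dictionary<TKey, TValue>();
    private readonly object m_lock = new object();

    public bool TryAdd(TKey key, TValue value) {
        lock (m_lock) {
            if (m_inner.ContainsKey(key)) return false;
            m_inner.Add(key, value);
            return true;
        }
    }

    public bool TryGetValue(TKey key, out TValue value) {
        lock (m_lock) {
            return m_inner.TryGetValue(key, out value);
        }
    }

    public bool TryRemove(TKey key) {
        lock (m_lock) {
            return m_inner.Remove(key);
        }
    }
}
```

It is correct, but two threads working on completely unrelated keys still queue up behind each other.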

Fortunately, the bucketing used by Dictionary allows a simple but effective improvement to this - one lock per bucket. This allows different threads modifying different buckets to do so in parallel. Any thread making changes to the contents of a bucket takes the lock for that bucket, ensuring those changes are thread-safe. The method that maps each bucket to a lock is the GetBucketAndLockNo method:

private void GetBucketAndLockNo(
        int hashcode, out int bucketNo, out int lockNo, int bucketCount) {
        
    // the bucket number is the hashcode (without the initial sign bit)
    // modulo the number of buckets
    bucketNo = (hashcode & 0x7fffffff) % bucketCount;
	  
    // and the lock number is the bucket number modulo the number of locks
    lockNo = bucketNo % m_locks.Length;
}
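To see the arithmetic with real numbers, here is a standalone copy of the same calculation. The BucketMath class and the extra lockCount parameter are assumptions of mine, added so that the sketch doesn't need the m_locks field.

```csharp
public static class BucketMath {
    // Same arithmetic as GetBucketAndLockNo above, but with the number of
    // locks passed in explicitly instead of read from m_locks.Length.
    public static void GetBucketAndLockNo(
            int hashcode, out int bucketNo, out int lockNo,
            int bucketCount, int lockCount) {
        // clear the sign bit so the modulo result is never negative
        bucketNo = (hashcode & 0x7fffffff) % bucketCount;
        lockNo = bucketNo % lockCount;
    }
}
```

With 31 buckets and 4 locks, a hashcode of 100 lands in bucket 7 (100 % 31) and so uses lock 3 (7 % 4). A hashcode of -17 is first masked to 2147483631, which lands in bucket 16 and uses lock 0. Buckets 3, 7, 11 and so on all share lock 3, so two threads can only collide on a lock if their bucket numbers are equal modulo the number of locks.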

However, this does require some changes to how the buckets are implemented. The 'implicit' linked list within a single backing array used by the non-concurrent Dictionary adds a dependency between separate buckets, as every bucket uses the same backing array. Instead, ConcurrentDictionary uses a strict linked list on each bucket: each slot in the buckets array references the head node of its own list, and each node references the next node in the same bucket.

This ensures that each bucket is entirely separate from all other buckets; adding or removing an item from a bucket is independent of any changes to other buckets.
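A per-bucket list node might look something like the sketch below. This is a simplified stand-in rather than the real class; the generic Node<TKey, TValue> shape and the public fields are assumptions of mine, so do check the decompiled source for the actual definition.

```csharp
// Hypothetical, simplified node. Each bucket owns its own chain of these,
// so no two buckets ever share a node or a backing array slot.
public class Node<TKey, TValue> {
    public readonly TKey m_key;
    public readonly TValue m_value;
    public volatile Node<TKey, TValue> m_next;   // next node in the SAME bucket

    public Node(TKey key, TValue value, Node<TKey, TValue> next) {
        m_key = key;
        m_value = value;
        m_next = next;
    }
}
```

The buckets array is then just a Node<TKey, TValue>[] in which each element is either null (an empty bucket) or the head node of that bucket's list. Pushing a new entry onto bucket 0 is buckets[0] = new Node<string, int>("a", 1, buckets[0]);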

Modifying the dictionary

All the operations on the dictionary follow the same basic pattern:

  void AlterBucket(TKey key, ...) {
      int bucketNo, lockNo;
1:    GetBucketAndLockNo(
          key.GetHashCode(), out bucketNo, out lockNo, m_buckets.Length);

2:    lock (m_locks[lockNo]) {
3:        Node headNode = m_buckets[bucketNo];
        
4:        Mutate the node linked list as appropriate
      }
  }

For example, when adding another entry to the dictionary, you would iterate through the linked list to check whether the key exists already, and add the new entry as the head node. When removing items, you would find the entry to remove (if it exists), and remove the node from the linked list. Adding, updating, and removing items all follow this pattern.
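Here is one way the add and remove cases could look when written out in full. It's a cut-down sketch under a couple of loud assumptions: StripedMap is a hypothetical class of mine, and the bucket count is fixed at construction, so it ignores resizing entirely.

```csharp
using System.Collections.Generic;

// Hypothetical fixed-size striped map; it never resizes its buckets.
public class StripedMap<TKey, TValue> {
    private sealed class Node {
        public readonly TKey Key;
        public readonly TValue Value;
        public volatile Node Next;
        public Node(TKey key, TValue value, Node next) {
            Key = key; Value = value; Next = next;
        }
    }

    private readonly Node[] m_buckets;
    private readonly object[] m_locks;
    private readonly IEqualityComparer<TKey> m_comparer = EqualityComparer<TKey>.Default;

    public StripedMap(int bucketCount, int lockCount) {
        m_buckets = new Node[bucketCount];
        m_locks = new object[lockCount];
        for (int i = 0; i < lockCount; i++) m_locks[i] = new object();
    }

    private void GetBucketAndLockNo(TKey key, out int bucketNo, out int lockNo) {
        bucketNo = (m_comparer.GetHashCode(key) & 0x7fffffff) % m_buckets.Length;
        lockNo = bucketNo % m_locks.Length;
    }

    public bool TryAdd(TKey key, TValue value) {
        int bucketNo, lockNo;
        GetBucketAndLockNo(key, out bucketNo, out lockNo);
        lock (m_locks[lockNo]) {
            // walk the bucket's list to check whether the key already exists
            for (Node n = m_buckets[bucketNo]; n != null; n = n.Next)
                if (m_comparer.Equals(n.Key, key)) return false;
            // not found: the new entry becomes the head node
            m_buckets[bucketNo] = new Node(key, value, m_buckets[bucketNo]);
            return true;
        }
    }

    public bool TryRemove(TKey key) {
        int bucketNo, lockNo;
        GetBucketAndLockNo(key, out bucketNo, out lockNo);
        lock (m_locks[lockNo]) {
            Node prev = null;
            for (Node n = m_buckets[bucketNo]; n != null; prev = n, n = n.Next) {
                if (!m_comparer.Equals(n.Key, key)) continue;
                // unlink the node by pointing its predecessor past it
                if (prev == null) m_buckets[bucketNo] = n.Next;
                else prev.Next = n.Next;
                return true;
            }
            return false;
        }
    }
}
```

Note that only one of the locks is ever held at a time, so threads working on buckets guarded by different locks never wait for each other.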

Performance issues

There is a problem we have to address at this point. If the number of buckets in the dictionary is fixed in the constructor, then the performance will degrade from O(1) to O(n) when a large number of items are added to the dictionary. As more and more items get added to the linked lists in each bucket, the lookup operations will spend most of their time traversing a linear linked list.

To fix this, the buckets array has to be resized once the number of items in each bucket has gone over a certain limit. (In ConcurrentDictionary this limit is when the size of the largest bucket is greater than the number of buckets for each lock. This check is done at the end of the TryAddInternal method.)

Resizing the bucket array and re-hashing everything affects every bucket in the collection. Therefore, this operation needs to take out every lock in the collection. Taking out multiple locks at once inevitably summons the spectre of the deadlock; two threads each hold a lock, and each tries to acquire the other's lock.

How can we eliminate this? Simple - ensure that threads never try to 'swap' locks in this fashion. When taking out multiple locks, always take them out in the same order, and always take out all the locks you need before starting to release them. In ConcurrentDictionary, this is controlled by the AcquireLocks, AcquireAllLocks and ReleaseLocks methods. Locks are always taken out and released in the order they are in the m_locks array, and locks are all released right at the end of the method in a finally block.
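The ordering rule can be sketched as a small helper. This is my own illustration of the idea, not the code of AcquireLocks or ReleaseLocks; the LockOrdering class and WithAllLocks method are hypothetical names.

```csharp
using System;
using System.Threading;

public static class LockOrdering {
    // Takes every lock in array order, runs the action, then releases
    // whatever was acquired in a finally block.
    public static void WithAllLocks(object[] locks, Action action) {
        int locksAcquired = 0;
        try {
            for (int i = 0; i < locks.Length; i++) {
                bool lockTaken = false;
                try {
                    Monitor.Enter(locks[i], ref lockTaken);
                }
                finally {
                    // only count locks we definitely hold
                    if (lockTaken) locksAcquired++;
                }
            }
            action();
        }
        finally {
            for (int i = 0; i < locksAcquired; i++) Monitor.Exit(locks[i]);
        }
    }
}
```

Because every caller starts with locks[0], two threads that both want all the locks contend on that first lock. The loser blocks there while holding nothing, so neither can end up holding a lock the other needs.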

At this point, it's worth pointing out that the locks array is never re-assigned, even when the buckets array is increased in size. The number of locks is fixed in the constructor by the concurrencyLevel parameter. This simplifies programming the locks; you don't have to check if the locks array has changed or been re-assigned before taking out a lock object. And you can be sure that when a thread takes out a lock, another thread isn't going to re-assign the lock array. This would create a new series of lock objects, thus allowing another thread to ignore the existing locks (and any threads controlling them), breaking thread-safety.
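From the caller's side, the number of locks is chosen through the constructor overload that takes a concurrencyLevel and an initial capacity. A minimal usage sketch follows; ConcurrencyLevelDemo is a hypothetical wrapper of mine, and the values 4 and 31 are arbitrary numbers picked for illustration.

```csharp
using System.Collections.Concurrent;

public static class ConcurrencyLevelDemo {
    public static ConcurrentDictionary<string, int> Create() {
        // concurrencyLevel: how many threads are expected to update the
        // dictionary at once; capacity: the initial number of buckets.
        return new ConcurrentDictionary<string, int>(concurrencyLevel: 4, capacity: 31);
    }
}
```

A higher concurrencyLevel means fewer buckets share each lock, at the cost of more locks to take whenever an operation needs all of them.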

Consequences of growing the array

Just because we're using locks doesn't mean that race conditions aren't a problem. We can see this by looking at the GrowTable method. The operation of this method can be boiled down to:

  private void GrowTable(Node[] buckets) {
      try {
1:        Acquire first lock in the locks array
          // this causes any other thread trying to take out
          // all the locks to block because the first lock in the array
          // is always the one taken out first
        
          // check if another thread has already resized the buckets array
          // while we were waiting to acquire the first lock   
2:        if (buckets != m_buckets) return;
        
3:        Calculate the new size of the backing array

4:        Node[] array = new Node[size];

5:        Acquire all the remaining locks

6:        Re-hash the contents of the existing buckets into array

7:        m_buckets = array;
      }
      finally {
8:        Release all locks
      }
  }

As you can see, there's already a check for a race condition at step 2, for the case when the GrowTable method is called twice in quick succession on two separate threads. One will successfully resize the buckets array (blocking the second in the meantime); when the second thread is unblocked, it'll see that the array has already been resized and exit without doing anything.

There is another case we need to consider; looking back at the AlterBucket method above, consider the following situation:

  1. Thread 1 calls AlterBucket; step 1 is executed to get the bucket and lock numbers.
  2. Thread 2 calls GrowTable and executes steps 1-5; thread 1 is blocked when it tries to take out the lock in step 2.
  3. Thread 2 re-hashes everything, re-assigns the buckets array, and releases all the locks (steps 6-8).
  4. Thread 1 is unblocked and continues executing, but the calculated bucket and lock numbers are no longer valid.

Between calculating the correct bucket and lock number and taking out the lock, another thread has changed where everything is. Not exactly thread-safe. Well, a similar problem was solved in ConcurrentStack and ConcurrentQueue by storing a local copy of the state, doing the necessary calculations, then checking if that state is still valid. We can use a similar idea here:

void AlterBucket(TKey key, ...) {
    while (true) {
        Node[] buckets = m_buckets;
        
        int bucketNo, lockNo;
        GetBucketAndLockNo(
            key.GetHashCode(), out bucketNo, out lockNo, buckets.Length);

        lock (m_locks[lockNo]) {
            // if the state has changed, go back to the start
            if (buckets != m_buckets) continue;
            
            Node headNode = m_buckets[bucketNo];
            
            Mutate the node linked list as appropriate
        }
        break;
    }
}
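Putting the retry loop and GrowTable together gives something like the runnable sketch below. It is heavily simplified and makes several assumptions of its own: GrowableStripedSet is a hypothetical class, it stores bare int keys, it has exactly two locks, and it grows when the item count exceeds twice the bucket count, which is NOT the rule ConcurrentDictionary uses.

```csharp
using System.Threading;

// Hypothetical set of ints that resizes itself; illustration only.
public class GrowableStripedSet {
    private sealed class Node {
        public readonly int Key;
        public readonly Node Next;
        public Node(int key, Node next) { Key = key; Next = next; }
    }

    private volatile Node[] m_buckets = new Node[4];
    private readonly object[] m_locks = { new object(), new object() };
    private int m_count;

    public int BucketCount { get { return m_buckets.Length; } }

    public bool TryAdd(int key) {
        while (true) {
            Node[] buckets = m_buckets;                 // local copy of the state
            int bucketNo = (key & 0x7fffffff) % buckets.Length;
            int lockNo = bucketNo % m_locks.Length;
            bool grow = false;
            lock (m_locks[lockNo]) {
                // if the array was replaced while we waited, start again
                if (buckets != m_buckets) continue;
                for (Node n = buckets[bucketNo]; n != null; n = n.Next)
                    if (n.Key == key) return false;
                Volatile.Write(ref buckets[bucketNo], new Node(key, buckets[bucketNo]));
                // simplified growth rule (an assumption of this sketch)
                grow = Interlocked.Increment(ref m_count) > buckets.Length * 2;
            }
            if (grow) GrowTable(buckets);
            return true;
        }
    }

    private void GrowTable(Node[] buckets) {
        int taken = 0;
        try {
            Monitor.Enter(m_locks[0]);
            taken = 1;
            // another thread may have resized while we waited for lock 0
            if (buckets != m_buckets) return;
            for (int i = 1; i < m_locks.Length; i++) {
                Monitor.Enter(m_locks[i]);
                taken++;
            }
            Node[] array = new Node[buckets.Length * 2 + 1];
            foreach (Node head in buckets)
                for (Node n = head; n != null; n = n.Next) {
                    int b = (n.Key & 0x7fffffff) % array.Length;
                    array[b] = new Node(n.Key, array[b]);
                }
            m_buckets = array;                          // one atomic publish
        }
        finally {
            for (int i = 0; i < taken; i++) Monitor.Exit(m_locks[i]);
        }
    }

    // Lockless read over a snapshot of the buckets array.
    public bool Contains(int key) {
        Node[] buckets = m_buckets;
        int bucketNo = (key & 0x7fffffff) % buckets.Length;
        for (Node n = Volatile.Read(ref buckets[bucketNo]); n != null; n = n.Next)
            if (n.Key == key) return true;
        return false;
    }
}
```

The important line is the buckets != m_buckets check inside the lock: while a thread holds any one of the locks, GrowTable cannot finish acquiring all of them, so once the check passes the bucket and lock numbers stay valid until the lock is released.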

TryGetValue and GetEnumerator

And so, finally, we get onto TryGetValue and GetEnumerator. I've left these to the end because, well, they don't actually use any locks.

How can this be? Whenever you change a bucket, you need to take out the corresponding lock, yes? Indeed you do. However, it is important to note that TryGetValue and GetEnumerator don't actually change anything. Just as immutable objects are, by definition, thread-safe, read-only operations don't need to take out a lock because they don't change anything. All lockless methods can happily iterate through the buckets and linked lists without worrying about locking anything.

However, this does put restrictions on how the other methods operate. Because there could be another thread in the middle of reading the dictionary at any time (even if a lock is taken out), the dictionary has to be in a valid state at all times. Every change to state has to be made visible to other threads in a single atomic operation (all relevant variables are marked volatile to help with this).

This restriction ensures that whatever the reading threads are doing, they never read the dictionary in an invalid state (e.g. items that should be in the collection temporarily removed from the linked list, or reading a node that has had its key and value removed before the node itself has been removed from the linked list).

Fortunately, all the operations needed to change the dictionary can be done in that way. Bucket resizes are made visible when the new array is assigned back to the m_buckets variable. Any additions or modifications to a node are done by creating a new node, then splicing it into the existing list using a single variable assignment. Node removals are simply done by re-assigning the m_next pointer of the preceding node (or the bucket's head reference, if the node is first in the list) so that it skips over the removed node.
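As an illustration of the 'new node, single assignment' idea, here is a small sketch of updating a value in one bucket's list. The AtomicSplice class, its Node type and both methods are hypothetical names of mine, and the caller is assumed to already hold the lock for the bucket.

```csharp
using System.Threading;

public static class AtomicSplice {
    public sealed class Node {
        public readonly string Key;
        public readonly int Value;
        public volatile Node Next;
        public Node(string key, int value, Node next) {
            Key = key; Value = value; Next = next;
        }
    }

    // Replaces the value stored against key. The existing node is never
    // edited; a fully-built replacement is published with one assignment.
    // ASSUMPTION: the caller holds the lock guarding this bucket.
    public static bool TryUpdate(ref Node head, string key, int newValue) {
        Node prev = null;
        for (Node n = head; n != null; prev = n, n = n.Next) {
            if (n.Key != key) continue;
            Node replacement = new Node(key, newValue, n.Next);
            if (prev == null) Volatile.Write(ref head, replacement);
            else prev.Next = replacement;    // volatile field write
            return true;
        }
        return false;
    }

    // Lockless read: it sees either the old node or the new one,
    // never a half-updated node.
    public static bool TryGetValue(Node head, string key, out int value) {
        for (Node n = head; n != null; n = n.Next) {
            if (n.Key == key) { value = n.Value; return true; }
        }
        value = 0;
        return false;
    }
}
```

A reader that is already past the splice point when the assignment happens simply carries on down the old nodes, which are still linked to the rest of the list.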

Because the dictionary can be changed by another thread during execution of the lockless methods, the GetEnumerator method is liable to return dirty reads - changes made to the dictionary after GetEnumerator was called, but before the enumeration got to that point in the dictionary. It's worth listing at this point which methods are lockless, and which take out all the locks in the dictionary to ensure they get a consistent view of the dictionary:

Lockless:
  • TryGetValue
  • GetEnumerator
  • The indexer getter
  • ContainsKey
Takes out every lock (lockfull?):
  • Count
  • IsEmpty
  • Keys
  • Values
  • CopyTo
  • ToArray
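
To see the lockless enumerator from the outside, the sketch below removes items from a ConcurrentDictionary while enumerating it. EnumerationDemo is a hypothetical name of mine; the ConcurrentDictionary calls are the real API.

```csharp
using System.Collections.Concurrent;
using System.Collections.Generic;

public static class EnumerationDemo {
    // Returns how many items the enumeration visited.
    public static int EnumerateWhileRemoving() {
        var dict = new ConcurrentDictionary<int, string>();
        for (int i = 0; i < 100; i++) dict.TryAdd(i, "item" + i);

        int seen = 0;
        foreach (KeyValuePair<int, string> pair in dict) {
            // change the dictionary in the middle of the enumeration
            string removed;
            dict.TryRemove(pair.Key + 1, out removed);
            seen++;
        }
        return seen;
    }
}
```

The enumeration completes without throwing, but how many of the 100 items it visits depends on the order in which the buckets happen to be walked, so I wouldn't rely on the exact number. Count, on the other hand, takes every lock and so reports a consistent snapshot.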

Concurrent principles

That covers the overall implementation of ConcurrentDictionary. I haven't even begun to scratch the surface of this sophisticated collection. That I leave to you. However, we've looked at enough to be able to extract some useful principles for concurrent programming:

  1. Partitioning

    When using locks, the work is partitioned into independent chunks, each with its own lock. Each partition can then be modified concurrently with other partitions.

  2. Ordered lock-taking

    When a method does need to control the entire collection, locks are taken and released in a fixed order to prevent deadlocks.

  3. Lockless reads

    Read operations that don't care about dirty reads don't take out any lock; the rest of the collection is implemented so that any reading thread always has a consistent view of the collection.

That leads us to the final collection in this little series - ConcurrentBag. Lacking a non-concurrent analogue, it is quite different to any other collection in the class libraries. Prepare your thinking hats!

© Simple Talk or respective owner
