Unlike the other concurrent collections, ConcurrentBag does not really have a non-concurrent analogy. As stated in the MSDN documentation, ConcurrentBag is optimised for the situation where the same thread is both producing and consuming items from the collection. We'll see how this is the case as we take a closer look. Again, I recommend you have ConcurrentBag open in a decompiler for reference.
Thread Statics
ConcurrentBag makes heavy use of thread statics - static variables marked with ThreadStaticAttribute. This is a special attribute that instructs the CLR to scope any values assigned to or read from the variable to the executing thread, not globally within the AppDomain.
This means that if two different threads assign two different values to the same thread static variable, one value will not overwrite the other, and each thread will see the value they assigned to the variable, separately to any other thread. This is a very useful function that allows for ConcurrentBag's concurrency properties.
You can think of a thread static variable:
[ThreadStatic]
private static int m_Value;
as doing the same as:
private static Dictionary<Thread, int> m_Values;
where the executing thread's identity is used to automatically set and retrieve the corresponding value in the dictionary.
In .NET 4, this usage of ThreadStaticAttribute is encapsulated in the ThreadLocal class.
Lists of lists
ConcurrentBag, at its core, operates as a linked list of linked lists:
Each outer list node is an instance of ThreadLocalList, and each inner list node is an instance of Node. Each outer ThreadLocalList is owned by a particular thread, accessible through the thread local m_locals variable:
private ThreadLocal<ThreadLocalList<T>> m_locals
It is important to note that, although the m_locals variable is thread-local, that only applies to accesses through that variable. The objects referenced by the thread (each instance of the ThreadLocalList object) are normal heap objects that are not specific to any thread. Thinking back to the Dictionary analogy above, if each value stored in the dictionary could be accessed by other means, then any thread could access the value belonging to other threads using that mechanism. Only reads and writes to the variable defined as thread-local are re-routed by the CLR according to the executing thread's identity.
So, although m_locals is defined as thread-local, the m_headList, m_nextList and m_tailList variables aren't. This means that any thread can access all the thread local lists in the collection by doing a linear search through the outer linked list defined by these variables.
Adding items
So, onto the collection operations. First, adding items. This one's pretty simple. If the current thread doesn't already own an instance of ThreadLocalList, then one is created (or, if there are lists owned by threads that have stopped, it takes control of one of those). Then the item is added to the head of that thread's list.
That's it. Don't worry, it'll get more complicated when we account for the other operations on the list!
Taking & Peeking items
This is where it gets tricky. If the current thread's list has items in it, then it peeks or removes the head item (not the tail item) from the local list and returns that. However, if the local list is empty, it has to go and steal another item from another list, belonging to a different thread. It iterates through all the thread local lists in the collection using the m_headList and m_nextList variables until it finds one that has items in it, and it steals one item from that list. Up to this point, the two threads had been operating completely independently. To steal an item from another thread's list, the stealing thread has to do it in such a way as to not step on the owning thread's toes.
Recall how adding and removing items both operate on the head of the thread's linked list? That gives us an easy way out - a thread trying to steal items from another thread can pop in round the back of another thread's list using the m_tail variable, and steal an item from the back without the owning thread knowing anything about it. The owning thread can carry on completely independently, unaware that one of its items has been nicked.
However, this only works when there are at least 3 items in the list, as that guarantees there will be at least one node between the owning thread performing operations on the list head and the thread stealing items from the tail - there's no chance of the two threads operating on the same node at the same time and causing a race condition.
If there's less than three items in the list, then there does need to be some synchronization between the two threads. In this case, the lock on the ThreadLocalList object is used to mediate access to a thread's list when there's the possibility of contention.
Thread synchronization
In ConcurrentBag, this is done using several mechanisms:
Operations performed by the owner thread only take out the lock when there are less than three items in the collection. With three or greater items, there won't be any conflict with a stealing thread operating on the tail of the list.
If a lock isn't taken out, the owning thread sets the list's m_currentOp variable to a non-zero value for the duration of the operation. This indicates to all other threads that there is a non-locked operation currently occuring on that list.
The stealing thread always takes out the lock, to prevent two threads trying to steal from the same list at the same time.
After taking out the lock, the stealing thread spinwaits until m_currentOp has been set to zero before actually performing the steal. This ensures there won't be a conflict with the owning thread when the number of items in the list is on the 2-3 item borderline. If any add or remove operations are started in the meantime, and the list is below 3 items, those operations try to take out the list's lock and are blocked until the stealing thread has finished.
This allows a thread to steal an item from another thread's list without corrupting it. What about synchronization in the collection as a whole?
Collection synchronization
Any thread that operates on the collection's global structure (accessing anything outside the thread local lists) has to take out the collection's global lock - m_globalListsLock. This single lock is sufficient when adding a new thread local list, as the items inside each thread's list are unaffected. However, what about operations (such as Count or ToArray) that need to access every item in the collection?
In order to ensure a consistent view, all operations on the collection are stopped while the count or ToArray is performed. This is done by freezing the bag at the start, performing the global operation, and unfreezing at the end:
The global lock is taken out, to prevent structural alterations to the collection.
m_needSync is set to true. This notifies all the threads that they need to take out their list's lock irregardless of what operation they're doing.
All the list locks are taken out in order. This blocks all locking operations on the lists.
The freezing thread waits for all current lockless operations to finish by spinwaiting on each m_currentOp field.
The global operation can then be performed while the bag is frozen, but no other operations can take place at the same time, as all other threads are blocked on a list's lock. Then, once the global operation has finished, the locks are released, m_needSync is unset, and normal concurrent operation resumes.
Concurrent principles
That's the essence of how ConcurrentBag operates. Each thread operates independently on its own local list, except when they have to steal items from another list. When stealing, only the stealing thread is forced to take out the lock; the owning thread only has to when there is the possibility of contention. And a global lock controls accesses to the structure of the collection outside the thread lists. Operations affecting the entire collection take out all locks in the collection to freeze the contents at a single point in time.
So, what principles can we extract here?
Threads operate independently
Thread-static variables and ThreadLocal makes this easy. Threads operate entirely concurrently on their own structures; only when they need to grab data from another thread is there any thread contention.
Minimised lock-taking
Even when two threads need to operate on the same data structures (one thread stealing from another), they do so in such a way such that the probability of actually blocking on a lock is minimised; the owning thread always operates on the head of the list, and the stealing thread always operates on the tail.
Management of lockless operations
Any operations that don't take out a lock still have a 'hook' to force them to lock when necessary. This allows all operations on the collection to be stopped temporarily while a global snapshot is taken. Hopefully, such operations will be short-lived and infrequent.
That's all the concurrent collections covered. I hope you've found it as informative and interesting as I have. Next, I'll be taking a closer look at ThreadLocal, which I came across while analyzing ConcurrentBag. As you'll see, the operation of this class deserves a much closer look.