Search Results

Search found 7 results on 1 pages for 'lockless'.

Page 1/1 | 1

Lockless queue implementation ends up having a loop under stress

- by Fozi

I have lockless queues written in C in form of a linked list that contains requests from several threads posted to and handled in a single thread. After a few hours of stress I end up having the last request's next pointer pointing to itself, which creates an endless loop and locks up the handling thread. The application runs (and fails) on both Linux and Windows. I'm debugging on Windows, where my COMPARE_EXCHANGE_PTR maps to InterlockedCompareExchangePointer. This is the code that pushes a request to the head of the list, and is called from several threads: void push_request(struct request * volatile * root, struct request * request) { assert(request); do { request->next = *root; } while(COMPARE_EXCHANGE_PTR(root, request, request->next) != request->next); } This is the code that gets a request from the end of the list, and is only called by a single thread that is handling them: struct request * pop_request(struct request * volatile * root) { struct request * volatile * p; struct request * request; do { p = root; while(*p && (*p)->next) p = &(*p)->next; // <- loops here request = *p; } while(COMPARE_EXCHANGE_PTR(p, NULL, request) != request); assert(request->next == NULL); return request; } Note that I'm not using a tail pointer because I wanted to avoid the complication of having to deal with the tail pointer in push_request. However I suspect that the problem might be in the way I find the end of the list. There are several places that push a request into the queue, but they all look generaly like this: // device->requests is defined as struct request * volatile requests; struct request * request = malloc(sizeof(struct request)); if(request) { // fill out request fields push_request(&device->requests, request); sem_post(device->request_sem); } The code that handles the request is doing more than that, but in essence does this in a loop: if(sem_wait_timeout(device->request_sem, timeout) == sem_success) { struct request * request = pop_request(&device->requests); // handle request free(request); } I also just added a function that is checking the list for duplicates before and after each operation, but I'm afraid that this check will change the timing so that I will never encounter the point where it fails. (I'm waiting for it to break as I'm writing this.) When I break the hanging program the handler thread loops in pop_request at the marked position. I have a valid list of one or more requests and the last one's next pointer points to itself. The request queues are usually short, I've never seen more then 10, and only 1 and 3 the two times I could take a look at this failure in the debugger. I thought this through as much as I could and I came to the conclusion that I should never be able to end up with a loop in my list unless I push the same request twice. I'm quite sure that this never happens. I'm also fairly sure (although not completely) that it's not the ABA problem. I know that I might pop more than one request at the same time, but I believe this is irrelevant here, and I've never seen it happening. (I'll fix this as well) I thought long and hard about how I can break my function, but I don't see a way to end up with a loop. So the question is: Can someone see a way how this can break? Can someone prove that this can not? Eventually I will solve this (maybe by using a tail pointer or some other solution - locking would be a problem because the threads that post should not be locked, I do have a RW lock at hand though) but I would like to make sure that changing the list actually solves my problem (as opposed to makes it just less likely because of different timing).

Read the article
Is there such a thing as a lockless queue for multiple read or write threads?

- by acidzombie24

I was thinking, is it possible to have a lockless queue when more then one thread is reading or writing? I seen an implementation with a lockless queue that worked with one read and one write thread but never more then one for either. Is it possible? I don't think it is. Can/does anyone want to prove it?

Read the article
Inside the Concurrent Collections: ConcurrentDictionary

- by Simon Cooper

Using locks to implement a thread-safe collection is rather like using a sledgehammer - unsubtle, easy to understand, and tends to make any other tool redundant. Unlike the previous two collections I looked at, ConcurrentStack and ConcurrentQueue, ConcurrentDictionary uses locks quite heavily. However, it is careful to wield locks only where necessary to ensure that concurrency is maximised. This will, by necessity, be a higher-level look than my other posts in this series, as there is quite a lot of code and logic in ConcurrentDictionary. Therefore, I do recommend that you have ConcurrentDictionary open in a decompiler to have a look at all the details that I skip over. The problem with locks There's several things to bear in mind when using locks, as encapsulated by the lock keyword in C# and the System.Threading.Monitor class in .NET (if you're unsure as to what lock does in C#, I briefly covered it in my first post in the series): Locks block threads The most obvious problem is that threads waiting on a lock can't do any work at all. No preparatory work, no 'optimistic' work like in ConcurrentQueue and ConcurrentStack, nothing. It sits there, waiting to be unblocked. This is bad if you're trying to maximise concurrency. Locks are slow Whereas most of the methods on the Interlocked class can be compiled down to a single CPU instruction, ensuring atomicity at the hardware level, taking out a lock requires some heavy lifting by the CLR and the operating system. There's quite a bit of work required to take out a lock, block other threads, and wake them up again. If locks are used heavily, this impacts performance. Deadlocks When using locks there's always the possibility of a deadlock - two threads, each holding a lock, each trying to aquire the other's lock. Fortunately, this can be avoided with careful programming and structured lock-taking, as we'll see. So, it's important to minimise where locks are used to maximise the concurrency and performance of the collection. Implementation As you might expect, ConcurrentDictionary is similar in basic implementation to the non-concurrent Dictionary, which I studied in a previous post. I'll be using some concepts introduced there, so I recommend you have a quick read of it. So, if you were implementing a thread-safe dictionary, what would you do? The naive implementation is to simply have a single lock around all methods accessing the dictionary. This would work, but doesn't allow much concurrency. Fortunately, the bucketing used by Dictionary allows a simple but effective improvement to this - one lock per bucket. This allows different threads modifying different buckets to do so in parallel. Any thread making changes to the contents of a bucket takes the lock for that bucket, ensuring those changes are thread-safe. The method that maps each bucket to a lock is the GetBucketAndLockNo method: private void GetBucketAndLockNo( int hashcode, out int bucketNo, out int lockNo, int bucketCount) { // the bucket number is the hashcode (without the initial sign bit) // modulo the number of buckets bucketNo = (hashcode & 0x7fffffff) % bucketCount; // and the lock number is the bucket number modulo the number of locks lockNo = bucketNo % m_locks.Length; } However, this does require some changes to how the buckets are implemented. The 'implicit' linked list within a single backing array used by the non-concurrent Dictionary adds a dependency between separate buckets, as every bucket uses the same backing array. Instead, ConcurrentDictionary uses a strict linked list on each bucket: This ensures that each bucket is entirely separate from all other buckets; adding or removing an item from a bucket is independent to any changes to other buckets. Modifying the dictionary All the operations on the dictionary follow the same basic pattern: void AlterBucket(TKey key, ...) { int bucketNo, lockNo; 1: GetBucketAndLockNo( key.GetHashCode(), out bucketNo, out lockNo, m_buckets.Length); 2: lock (m_locks[lockNo]) { 3: Node headNode = m_buckets[bucketNo]; 4: Mutate the node linked list as appropriate } } For example, when adding another entry to the dictionary, you would iterate through the linked list to check whether the key exists already, and add the new entry as the head node. When removing items, you would find the entry to remove (if it exists), and remove the node from the linked list. Adding, updating, and removing items all follow this pattern. Performance issues There is a problem we have to address at this point. If the number of buckets in the dictionary is fixed in the constructor, then the performance will degrade from O(1) to O(n) when a large number of items are added to the dictionary. As more and more items get added to the linked lists in each bucket, the lookup operations will spend most of their time traversing a linear linked list. To fix this, the buckets array has to be resized once the number of items in each bucket has gone over a certain limit. (In ConcurrentDictionary this limit is when the size of the largest bucket is greater than the number of buckets for each lock. This check is done at the end of the TryAddInternal method.) Resizing the bucket array and re-hashing everything affects every bucket in the collection. Therefore, this operation needs to take out every lock in the collection. Taking out mutiple locks at once inevitably summons the spectre of the deadlock; two threads each hold a lock, and each trying to acquire the other lock. How can we eliminate this? Simple - ensure that threads never try to 'swap' locks in this fashion. When taking out multiple locks, always take them out in the same order, and always take out all the locks you need before starting to release them. In ConcurrentDictionary, this is controlled by the AcquireLocks, AcquireAllLocks and ReleaseLocks methods. Locks are always taken out and released in the order they are in the m_locks array, and locks are all released right at the end of the method in a finally block. At this point, it's worth pointing out that the locks array is never re-assigned, even when the buckets array is increased in size. The number of locks is fixed in the constructor by the concurrencyLevel parameter. This simplifies programming the locks; you don't have to check if the locks array has changed or been re-assigned before taking out a lock object. And you can be sure that when a thread takes out a lock, another thread isn't going to re-assign the lock array. This would create a new series of lock objects, thus allowing another thread to ignore the existing locks (and any threads controlling them), breaking thread-safety. Consequences of growing the array Just because we're using locks doesn't mean that race conditions aren't a problem. We can see this by looking at the GrowTable method. The operation of this method can be boiled down to: private void GrowTable(Node[] buckets) { try { 1: Acquire first lock in the locks array // this causes any other thread trying to take out // all the locks to block because the first lock in the array // is always the one taken out first // check if another thread has already resized the buckets array // while we were waiting to acquire the first lock 2: if (buckets != m_buckets) return; 3: Calculate the new size of the backing array 4: Node[] array = new array[size]; 5: Acquire all the remaining locks 6: Re-hash the contents of the existing buckets into array 7: m_buckets = array; } finally { 8: Release all locks } } As you can see, there's already a check for a race condition at step 2, for the case when the GrowTable method is called twice in quick succession on two separate threads. One will successfully resize the buckets array (blocking the second in the meantime), when the second thread is unblocked it'll see that the array has already been resized & exit without doing anything. There is another case we need to consider; looking back at the AlterBucket method above, consider the following situation: Thread 1 calls AlterBucket; step 1 is executed to get the bucket and lock numbers. Thread 2 calls GrowTable and executes steps 1-5; thread 1 is blocked when it tries to take out the lock in step 2. Thread 2 re-hashes everything, re-assigns the buckets array, and releases all the locks (steps 6-8). Thread 1 is unblocked and continues executing, but the calculated bucket and lock numbers are no longer valid. Between calculating the correct bucket and lock number and taking out the lock, another thread has changed where everything is. Not exactly thread-safe. Well, a similar problem was solved in ConcurrentStack and ConcurrentQueue by storing a local copy of the state, doing the necessary calculations, then checking if that state is still valid. We can use a similar idea here: void AlterBucket(TKey key, ...) { while (true) { Node[] buckets = m_buckets; int bucketNo, lockNo; GetBucketAndLockNo( key.GetHashCode(), out bucketNo, out lockNo, buckets.Length); lock (m_locks[lockNo]) { // if the state has changed, go back to the start if (buckets != m_buckets) continue; Node headNode = m_buckets[bucketNo]; Mutate the node linked list as appropriate } break; } } TryGetValue and GetEnumerator And so, finally, we get onto TryGetValue and GetEnumerator. I've left these to the end because, well, they don't actually use any locks. How can this be? Whenever you change a bucket, you need to take out the corresponding lock, yes? Indeed you do. However, it is important to note that TryGetValue and GetEnumerator don't actually change anything. Just as immutable objects are, by definition, thread-safe, read-only operations don't need to take out a lock because they don't change anything. All lockless methods can happily iterate through the buckets and linked lists without worrying about locking anything. However, this does put restrictions on how the other methods operate. Because there could be another thread in the middle of reading the dictionary at any time (even if a lock is taken out), the dictionary has to be in a valid state at all times. Every change to state has to be made visible to other threads in a single atomic operation (all relevant variables are marked volatile to help with this). This restriction ensures that whatever the reading threads are doing, they never read the dictionary in an invalid state (eg items that should be in the collection temporarily removed from the linked list, or reading a node that has had it's key & value removed before the node itself has been removed from the linked list). Fortunately, all the operations needed to change the dictionary can be done in that way. Bucket resizes are made visible when the new array is assigned back to the m_buckets variable. Any additions or modifications to a node are done by creating a new node, then splicing it into the existing list using a single variable assignment. Node removals are simply done by re-assigning the node's m_next pointer. Because the dictionary can be changed by another thread during execution of the lockless methods, the GetEnumerator method is liable to return dirty reads - changes made to the dictionary after GetEnumerator was called, but before the enumeration got to that point in the dictionary. It's worth listing at this point which methods are lockless, and which take out all the locks in the dictionary to ensure they get a consistent view of the dictionary: Lockless: TryGetValue GetEnumerator The indexer getter ContainsKey Takes out every lock (lockfull?): Count IsEmpty Keys Values CopyTo ToArray Concurrent principles That covers the overall implementation of ConcurrentDictionary. I haven't even begun to scratch the surface of this sophisticated collection. That I leave to you. However, we've looked at enough to be able to extract some useful principles for concurrent programming: Partitioning When using locks, the work is partitioned into independant chunks, each with its own lock. Each partition can then be modified concurrently to other partitions. Ordered lock-taking When a method does need to control the entire collection, locks are taken and released in a fixed order to prevent deadlocks. Lockless reads Read operations that don't care about dirty reads don't take out any lock; the rest of the collection is implemented so that any reading thread always has a consistent view of the collection. That leads us to the final collection in this little series - ConcurrentBag. Lacking a non-concurrent analogy, it is quite different to any other collection in the class libraries. Prepare your thinking hats!

Read the article
Inside the Concurrent Collections: ConcurrentBag

- by Simon Cooper

Unlike the other concurrent collections, ConcurrentBag does not really have a non-concurrent analogy. As stated in the MSDN documentation, ConcurrentBag is optimised for the situation where the same thread is both producing and consuming items from the collection. We'll see how this is the case as we take a closer look. Again, I recommend you have ConcurrentBag open in a decompiler for reference. Thread Statics ConcurrentBag makes heavy use of thread statics - static variables marked with ThreadStaticAttribute. This is a special attribute that instructs the CLR to scope any values assigned to or read from the variable to the executing thread, not globally within the AppDomain. This means that if two different threads assign two different values to the same thread static variable, one value will not overwrite the other, and each thread will see the value they assigned to the variable, separately to any other thread. This is a very useful function that allows for ConcurrentBag's concurrency properties. You can think of a thread static variable: [ThreadStatic] private static int m_Value; as doing the same as: private static Dictionary<Thread, int> m_Values; where the executing thread's identity is used to automatically set and retrieve the corresponding value in the dictionary. In .NET 4, this usage of ThreadStaticAttribute is encapsulated in the ThreadLocal class. Lists of lists ConcurrentBag, at its core, operates as a linked list of linked lists: Each outer list node is an instance of ThreadLocalList, and each inner list node is an instance of Node. Each outer ThreadLocalList is owned by a particular thread, accessible through the thread local m_locals variable: private ThreadLocal<ThreadLocalList<T>> m_locals It is important to note that, although the m_locals variable is thread-local, that only applies to accesses through that variable. The objects referenced by the thread (each instance of the ThreadLocalList object) are normal heap objects that are not specific to any thread. Thinking back to the Dictionary analogy above, if each value stored in the dictionary could be accessed by other means, then any thread could access the value belonging to other threads using that mechanism. Only reads and writes to the variable defined as thread-local are re-routed by the CLR according to the executing thread's identity. So, although m_locals is defined as thread-local, the m_headList, m_nextList and m_tailList variables aren't. This means that any thread can access all the thread local lists in the collection by doing a linear search through the outer linked list defined by these variables. Adding items So, onto the collection operations. First, adding items. This one's pretty simple. If the current thread doesn't already own an instance of ThreadLocalList, then one is created (or, if there are lists owned by threads that have stopped, it takes control of one of those). Then the item is added to the head of that thread's list. That's it. Don't worry, it'll get more complicated when we account for the other operations on the list! Taking & Peeking items This is where it gets tricky. If the current thread's list has items in it, then it peeks or removes the head item (not the tail item) from the local list and returns that. However, if the local list is empty, it has to go and steal another item from another list, belonging to a different thread. It iterates through all the thread local lists in the collection using the m_headList and m_nextList variables until it finds one that has items in it, and it steals one item from that list. Up to this point, the two threads had been operating completely independently. To steal an item from another thread's list, the stealing thread has to do it in such a way as to not step on the owning thread's toes. Recall how adding and removing items both operate on the head of the thread's linked list? That gives us an easy way out - a thread trying to steal items from another thread can pop in round the back of another thread's list using the m_tail variable, and steal an item from the back without the owning thread knowing anything about it. The owning thread can carry on completely independently, unaware that one of its items has been nicked. However, this only works when there are at least 3 items in the list, as that guarantees there will be at least one node between the owning thread performing operations on the list head and the thread stealing items from the tail - there's no chance of the two threads operating on the same node at the same time and causing a race condition. If there's less than three items in the list, then there does need to be some synchronization between the two threads. In this case, the lock on the ThreadLocalList object is used to mediate access to a thread's list when there's the possibility of contention. Thread synchronization In ConcurrentBag, this is done using several mechanisms: Operations performed by the owner thread only take out the lock when there are less than three items in the collection. With three or greater items, there won't be any conflict with a stealing thread operating on the tail of the list. If a lock isn't taken out, the owning thread sets the list's m_currentOp variable to a non-zero value for the duration of the operation. This indicates to all other threads that there is a non-locked operation currently occuring on that list. The stealing thread always takes out the lock, to prevent two threads trying to steal from the same list at the same time. After taking out the lock, the stealing thread spinwaits until m_currentOp has been set to zero before actually performing the steal. This ensures there won't be a conflict with the owning thread when the number of items in the list is on the 2-3 item borderline. If any add or remove operations are started in the meantime, and the list is below 3 items, those operations try to take out the list's lock and are blocked until the stealing thread has finished. This allows a thread to steal an item from another thread's list without corrupting it. What about synchronization in the collection as a whole? Collection synchronization Any thread that operates on the collection's global structure (accessing anything outside the thread local lists) has to take out the collection's global lock - m_globalListsLock. This single lock is sufficient when adding a new thread local list, as the items inside each thread's list are unaffected. However, what about operations (such as Count or ToArray) that need to access every item in the collection? In order to ensure a consistent view, all operations on the collection are stopped while the count or ToArray is performed. This is done by freezing the bag at the start, performing the global operation, and unfreezing at the end: The global lock is taken out, to prevent structural alterations to the collection. m_needSync is set to true. This notifies all the threads that they need to take out their list's lock irregardless of what operation they're doing. All the list locks are taken out in order. This blocks all locking operations on the lists. The freezing thread waits for all current lockless operations to finish by spinwaiting on each m_currentOp field. The global operation can then be performed while the bag is frozen, but no other operations can take place at the same time, as all other threads are blocked on a list's lock. Then, once the global operation has finished, the locks are released, m_needSync is unset, and normal concurrent operation resumes. Concurrent principles That's the essence of how ConcurrentBag operates. Each thread operates independently on its own local list, except when they have to steal items from another list. When stealing, only the stealing thread is forced to take out the lock; the owning thread only has to when there is the possibility of contention. And a global lock controls accesses to the structure of the collection outside the thread lists. Operations affecting the entire collection take out all locks in the collection to freeze the contents at a single point in time. So, what principles can we extract here? Threads operate independently Thread-static variables and ThreadLocal makes this easy. Threads operate entirely concurrently on their own structures; only when they need to grab data from another thread is there any thread contention. Minimised lock-taking Even when two threads need to operate on the same data structures (one thread stealing from another), they do so in such a way such that the probability of actually blocking on a lock is minimised; the owning thread always operates on the head of the list, and the stealing thread always operates on the tail. Management of lockless operations Any operations that don't take out a lock still have a 'hook' to force them to lock when necessary. This allows all operations on the collection to be stopped temporarily while a global snapshot is taken. Hopefully, such operations will be short-lived and infrequent. That's all the concurrent collections covered. I hope you've found it as informative and interesting as I have. Next, I'll be taking a closer look at ThreadLocal, which I came across while analyzing ConcurrentBag. As you'll see, the operation of this class deserves a much closer look.

Read the article
C#/.NET Little Wonders: Interlocked CompareExchange()

- by James Michael Hare

Once again, in this series of posts I look at the parts of the .NET Framework that may seem trivial, but can help improve your code by making it easier to write and maintain. The index of all my past little wonders posts can be found here. Two posts ago, I discussed the Interlocked Add(), Increment(), and Decrement() methods (here) for adding and subtracting values in a thread-safe, lightweight manner. Then, last post I talked about the Interlocked Read() and Exchange() methods (here) for safely and efficiently reading and setting 32 or 64 bit values (or references). This week, we’ll round out the discussion by talking about the Interlocked CompareExchange() method and how it can be put to use to exchange a value if the current value is what you expected it to be. Dirty reads can lead to bad results Many of the uses of Interlocked that we’ve explored so far have centered around either reading, setting, or adding values. But what happens if you want to do something more complex such as setting a value based on the previous value in some manner? Perhaps you were creating an application that reads a current balance, applies a deposit, and then saves the new modified balance, where of course you’d want that to happen atomically. If you read the balance, then go to save the new balance and between that time the previous balance has already changed, you’ll have an issue! Think about it, if we read the current balance as $400, and we are applying a new deposit of $50.75, but meanwhile someone else deposits $200 and sets the total to $600, but then we write a total of $450.75 we’ve lost $200! Now, certainly for int and long values we can use Interlocked.Add() to handles these cases, and it works well for that. But what if we want to work with doubles, for example? Let’s say we wanted to add the numbers from 0 to 99,999 in parallel. We could do this by spawning several parallel tasks to continuously add to a total: 1: double total = 0; 2: 3: Parallel.For(0, 10000, next => 4: { 5: total += next; 6: }); Were this run on one thread using a standard for loop, we’d expect an answer of 4,999,950,000 (the sum of all numbers from 0 to 99,999). But when we run this in parallel as written above, we’ll likely get something far off. The result of one of my runs, for example, was 1,281,880,740. That is way off! If this were banking software we’d be in big trouble with our clients. So what happened? The += operator is not atomic, it will read in the current value, add the result, then store it back into the total. At any point in all of this another thread could read a “dirty” current total and accidentally “skip” our add. So, to clean this up, we could use a lock to guarantee concurrency: 1: double total = 0.0; 2: object locker = new object(); 3: 4: Parallel.For(0, count, next => 5: { 6: lock (locker) 7: { 8: total += next; 9: } 10: }); Which will give us the correct result of 4,999,950,000. One thing to note is that locking can be heavy, especially if the operation being locked over is trivial, or the life of the lock is a high percentage of the work being performed concurrently. In the case above, the lock consumes pretty much all of the time of each parallel task – and the task being locked on is relatively trivial. Now, let me put in a disclaimer here before we go further: For most uses, lock is more than sufficient for your needs, and is often the simplest solution! So, if lock is sufficient for most needs, why would we ever consider another solution? The problem with locking is that it can suspend execution of your thread while it waits for the signal that the lock is free. Moreover, if the operation being locked over is trivial, the lock can add a very high level of overhead. This is why things like Interlocked.Increment() perform so well, instead of locking just to perform an increment, we perform the increment with an atomic, lockless method. As with all things performance related, it’s important to profile before jumping to the conclusion that you should optimize everything in your path. If your profiling shows that locking is causing a high level of waiting in your application, then it’s time to consider lighter alternatives such as Interlocked. CompareExchange() – Exchange existing value if equal some value So let’s look at how we could use CompareExchange() to solve our problem above. The general syntax of CompareExchange() is: T CompareExchange<T>(ref T location, T newValue, T expectedValue) If the value in location == expectedValue, then newValue is exchanged. Either way, the value in location (before exchange) is returned. Actually, CompareExchange() is not one method, but a family of overloaded methods that can take int, long, float, double, pointers, or references. It cannot take other value types (that is, can’t CompareExchange() two DateTime instances directly). Also keep in mind that the version that takes any reference type (the generic overload) only checks for reference equality, it does not call any overridden Equals(). So how does this help us? Well, we can grab the current total, and exchange the new value if total hasn’t changed. This would look like this: 1: // grab the snapshot 2: double current = total; 3: 4: // if the total hasn’t changed since I grabbed the snapshot, then 5: // set it to the new total 6: Interlocked.CompareExchange(ref total, current + next, current); So what the code above says is: if the amount in total (1st arg) is the same as the amount in current (3rd arg), then set total to current + next (2nd arg). This check and exchange pair is atomic (and thus thread-safe). This works if total is the same as our snapshot in current, but the problem, is what happens if they aren’t the same? Well, we know that in either case we will get the previous value of total (before the exchange), back as a result. Thus, we can test this against our snapshot to see if it was the value we expected: 1: // if the value returned is != current, then our snapshot must be out of date 2: // which means we didn't (and shouldn't) apply current + next 3: if (Interlocked.CompareExchange(ref total, current + next, current) != current) 4: { 5: // ooops, total was not equal to our snapshot in current, what should we do??? 6: } So what do we do if we fail? That’s up to you and the problem you are trying to solve. It’s possible you would decide to abort the whole transaction, or perhaps do a lightweight spin and try again. Let’s try that: 1: double current = total; 2: 3: // make first attempt... 4: if (Interlocked.CompareExchange(ref total, current + i, current) != current) 5: { 6: // if we fail, go into a spin wait, spin, and try again until succeed 7: var spinner = new SpinWait(); 8: 9: do 10: { 11: spinner.SpinOnce(); 12: current = total; 13: } 14: while (Interlocked.CompareExchange(ref total, current + i, current) != current); 15: } 16: This is not trivial code, but it illustrates a possible use of CompareExchange(). What we are doing is first checking to see if we succeed on the first try, and if so great! If not, we create a SpinWait and then repeat the process of SpinOnce(), grab a fresh snapshot, and repeat until CompareExchnage() succeeds. You may wonder why not a simple do-while here, and the reason it’s more efficient to only create the SpinWait until we absolutely know we need one, for optimal efficiency. Though not as simple (or maintainable) as a simple lock, this will perform better in many situations. Comparing an unlocked (and wrong) version, a version using lock, and the Interlocked of the code, we get the following average times for multiple iterations of adding the sum of 100,000 numbers: 1: Unlocked money average time: 2.1 ms 2: Locked money average time: 5.1 ms 3: Interlocked money average time: 3 ms So the Interlocked.CompareExchange(), while heavier to code, came in lighter than the lock, offering a good compromise of safety and performance when we need to reduce contention. CompareExchange() - it’s not just for adding stuff… So that was one simple use of CompareExchange() in the context of adding double values -- which meant we couldn’t have used the simpler Interlocked.Add() -- but it has other uses as well. If you think about it, this really works anytime you want to create something new based on a current value without using a full lock. For example, you could use it to create a simple lazy instantiation implementation. In this case, we want to set the lazy instance only if the previous value was null: 1: public static class Lazy<T> where T : class, new() 2: { 3: private static T _instance; 4: 5: public static T Instance 6: { 7: get 8: { 9: // if current is null, we need to create new instance 10: if (_instance == null) 11: { 12: // attempt create, it will only set if previous was null 13: Interlocked.CompareExchange(ref _instance, new T(), (T)null); 14: } 15: 16: return _instance; 17: } 18: } 19: } So, if _instance == null, this will create a new T() and attempt to exchange it with _instance. If _instance is not null, then it does nothing and we discard the new T() we created. This is a way to create lazy instances of a type where we are more concerned about locking overhead than creating an accidental duplicate which is not used. In fact, the BCL implementation of Lazy<T> offers a similar thread-safety choice for Publication thread safety, where it will not guarantee only one instance was created, but it will guarantee that all readers get the same instance. Another possible use would be in concurrent collections. Let’s say, for example, that you are creating your own brand new super stack that uses a linked list paradigm and is “lock free”. We could use Interlocked.CompareExchange() to be able to do a lockless Push() which could be more efficient in multi-threaded applications where several threads are pushing and popping on the stack concurrently. Yes, there are already concurrent collections in the BCL (in .NET 4.0 as part of the TPL), but it’s a fun exercise! So let’s assume we have a node like this: 1: public sealed class Node<T> 2: { 3: // the data for this node 4: public T Data { get; set; } 5: 6: // the link to the next instance 7: internal Node<T> Next { get; set; } 8: } Then, perhaps, our stack’s Push() operation might look something like: 1: public sealed class SuperStack<T> 2: { 3: private volatile T _head; 4: 5: public void Push(T value) 6: { 7: var newNode = new Node<int> { Data = value, Next = _head }; 8: 9: if (Interlocked.CompareExchange(ref _head, newNode, newNode.Next) != newNode.Next) 10: { 11: var spinner = new SpinWait(); 12: 13: do 14: { 15: spinner.SpinOnce(); 16: newNode.Next = _head; 17: } 18: while (Interlocked.CompareExchange(ref _head, newNode, newNode.Next) != newNode.Next); 19: } 20: } 21: 22: // ... 23: } Notice a similar paradigm here as with adding our doubles before. What we are doing is creating the new Node with the data to push, and with a Next value being the original node referenced by _head. This will create our stack behavior (LIFO – Last In, First Out). Now, we have to set _head to now refer to the newNode, but we must first make sure it hasn’t changed! So we check to see if _head has the same value we saved in our snapshot as newNode.Next, and if so, we set _head to newNode. This is all done atomically, and the result is _head’s original value, as long as the original value was what we assumed it was with newNode.Next, then we are good and we set it without a lock! If not, we SpinWait and try again. Once again, this is much lighter than locking in highly parallelized code with lots of contention. If I compare the method above with a similar class using lock, I get the following results for pushing 100,000 items: 1: Locked SuperStack average time: 6 ms 2: Interlocked SuperStack average time: 4.5 ms So, once again, we can get more efficient than a lock, though there is the cost of added code complexity. Fortunately for you, most of the concurrent collection you’d ever need are already created for you in the System.Collections.Concurrent (here) namespace – for more information, see my Little Wonders – The Concurent Collections Part 1 (here), Part 2 (here), and Part 3 (here). Summary We’ve seen before how the Interlocked class can be used to safely and efficiently add, increment, decrement, read, and exchange values in a multi-threaded environment. In addition to these, Interlocked CompareExchange() can be used to perform more complex logic without the need of a lock when lock contention is a concern. The added efficiency, though, comes at the cost of more complex code. As such, the standard lock is often sufficient for most thread-safety needs. But if profiling indicates you spend a lot of time waiting for locks, or if you just need a lock for something simple such as an increment, decrement, read, exchange, etc., then consider using the Interlocked class’s methods to reduce wait. Technorati Tags: C#,CSharp,.NET,Little Wonders,Interlocked,CompareExchange,threading,concurrency

Read the article
Solve a maze using multicores?

- by acidzombie24

This question is messy, i dont need a working solution, i need some psuedo code. How would i solve this maze? This is a homework question. I have to get from point green to red. At every fork i need to 'spawn a thread' and go that direction. I need to figure out how to get to red but i am unsure how to avoid paths i already have taken (finishing with any path is ok, i am just not allowed to go in circles). Heres an example of my problem, i start by moving down and i see a fork so one goes right and one goes down (or this thread can take it, it doesnt matter). Now lets ignore the rest of the forks and say the one going right hits the wall, goes down, hits the wall and goes left, then goes up. The other thread goes down, hits the wall then goes all the way right. The bottom path has been taken twice, by starting at different sides. How do i mark this path has been taken? Do i need a lock? Is this the only way? Is there a lockless solution? Implementation wise i was thinking i could have the maze something like this. I dont like the solution because there is a LOT of locking (assuming i lock before each read and write of the haveTraverse member). I dont need to use the MazeSegment class below, i just wrote it up as an example. I am allowed to construct the maze however i want. I was thinking maybe the solution requires no connecting paths and thats hassling me. Maybe i could split the map up instead of using the format below (which is easy to read and understand). But if i knew how to split it up i would know how to walk it thus the problem. How do i walk this maze efficiently? The only hint i receive was dont try to conserve memory by reusing it, make copies. However that was related to a problem with ordering a list and i dont think the hint was a hint for this. class MazeSegment { enum Direction { up, down, left, right} List<Pair<Direction, MazeSegment*>> ConnectingPaths; int line_length; bool haveTraverse; } MazeSegment root; class MazeSegment { enum Direction { up, down, left, right} List<Pair<Direction, MazeSegment*>> ConnectingPaths; bool haveTraverse; } void WalkPath(MazeSegment segment) { if(segment.haveTraverse) return; segment.haveTraverse = true; foreach(var v in segment) { if(v.haveTraverse == false) spawn_thread(v); } } WalkPath(root);

Read the article
C#/.NET Little Wonders: The ConcurrentDictionary

- by James Michael Hare

Once again we consider some of the lesser known classes and keywords of C#. In this series of posts, we will discuss how the concurrent collections have been developed to help alleviate these multi-threading concerns. Last week’s post began with a general introduction and discussed the ConcurrentStack<T> and ConcurrentQueue<T>. Today's post discusses the ConcurrentDictionary<T> (originally I had intended to discuss ConcurrentBag this week as well, but ConcurrentDictionary had enough information to create a very full post on its own!). Finally next week, we shall close with a discussion of the ConcurrentBag<T> and BlockingCollection<T>. For more of the "Little Wonders" posts, see the index here. Recap As you'll recall from the previous post, the original collections were object-based containers that accomplished synchronization through a Synchronized member. While these were convenient because you didn't have to worry about writing your own synchronization logic, they were a bit too finely grained and if you needed to perform multiple operations under one lock, the automatic synchronization didn't buy much. With the advent of .NET 2.0, the original collections were succeeded by the generic collections which are fully type-safe, but eschew automatic synchronization. This cuts both ways in that you have a lot more control as a developer over when and how fine-grained you want to synchronize, but on the other hand if you just want simple synchronization it creates more work. With .NET 4.0, we get the best of both worlds in generic collections. A new breed of collections was born called the concurrent collections in the System.Collections.Concurrent namespace. These amazing collections are fine-tuned to have best overall performance for situations requiring concurrent access. They are not meant to replace the generic collections, but to simply be an alternative to creating your own locking mechanisms. Among those concurrent collections were the ConcurrentStack<T> and ConcurrentQueue<T> which provide classic LIFO and FIFO collections with a concurrent twist. As we saw, some of the traditional methods that required calls to be made in a certain order (like checking for not IsEmpty before calling Pop()) were replaced in favor of an umbrella operation that combined both under one lock (like TryPop()). Now, let's take a look at the next in our series of concurrent collections!For some excellent information on the performance of the concurrent collections and how they perform compared to a traditional brute-force locking strategy, see this wonderful whitepaper by the Microsoft Parallel Computing Platform team here. ConcurrentDictionary – the fully thread-safe dictionary The ConcurrentDictionary<TKey,TValue> is the thread-safe counterpart to the generic Dictionary<TKey, TValue> collection. Obviously, both are designed for quick – O(1) – lookups of data based on a key. If you think of algorithms where you need lightning fast lookups of data and don’t care whether the data is maintained in any particular ordering or not, the unsorted dictionaries are generally the best way to go. Note: as a side note, there are sorted implementations of IDictionary, namely SortedDictionary and SortedList which are stored as an ordered tree and a ordered list respectively. While these are not as fast as the non-sorted dictionaries – they are O(log2 n) – they are a great combination of both speed and ordering -- and still greatly outperform a linear search. Now, once again keep in mind that if all you need to do is load a collection once and then allow multi-threaded reading you do not need any locking. Examples of this tend to be situations where you load a lookup or translation table once at program start, then keep it in memory for read-only reference. In such cases locking is completely non-productive. However, most of the time when we need a concurrent dictionary we are interleaving both reads and updates. This is where the ConcurrentDictionary really shines! It achieves its thread-safety with no common lock to improve efficiency. It actually uses a series of locks to provide concurrent updates, and has lockless reads! This means that the ConcurrentDictionary gets even more efficient the higher the ratio of reads-to-writes you have. ConcurrentDictionary and Dictionary differences For the most part, the ConcurrentDictionary<TKey,TValue> behaves like it’s Dictionary<TKey,TValue> counterpart with a few differences. Some notable examples of which are: Add() does not exist in the concurrent dictionary. This means you must use TryAdd(), AddOrUpdate(), or GetOrAdd(). It also means that you can’t use a collection initializer with the concurrent dictionary. TryAdd() replaced Add() to attempt atomic, safe adds. Because Add() only succeeds if the item doesn’t already exist, we need an atomic operation to check if the item exists, and if not add it while still under an atomic lock. TryUpdate() was added to attempt atomic, safe updates. If we want to update an item, we must make sure it exists first and that the original value is what we expected it to be. If all these are true, we can update the item under one atomic step. TryRemove() was added to attempt atomic, safe removes. To safely attempt to remove a value we need to see if the key exists first, this checks for existence and removes under an atomic lock. AddOrUpdate() was added to attempt an thread-safe “upsert”. There are many times where you want to insert into a dictionary if the key doesn’t exist, or update the value if it does. This allows you to make a thread-safe add-or-update. GetOrAdd() was added to attempt an thread-safe query/insert. Sometimes, you want to query for whether an item exists in the cache, and if it doesn’t insert a starting value for it. This allows you to get the value if it exists and insert if not. Count, Keys, Values properties take a snapshot of the dictionary. Accessing these properties may interfere with add and update performance and should be used with caution. ToArray() returns a static snapshot of the dictionary. That is, the dictionary is locked, and then copied to an array as a O(n) operation. GetEnumerator() is thread-safe and efficient, but allows dirty reads. Because reads require no locking, you can safely iterate over the contents of the dictionary. The only downside is that, depending on timing, you may get dirty reads. Dirty reads during iteration The last point on GetEnumerator() bears some explanation. Picture a scenario in which you call GetEnumerator() (or iterate using a foreach, etc.) and then, during that iteration the dictionary gets updated. This may not sound like a big deal, but it can lead to inconsistent results if used incorrectly. The problem is that items you already iterated over that are updated a split second after don’t show the update, but items that you iterate over that were updated a split second before do show the update. Thus you may get a combination of items that are “stale” because you iterated before the update, and “fresh” because they were updated after GetEnumerator() but before the iteration reached them. Let’s illustrate with an example, let’s say you load up a concurrent dictionary like this: 1: // load up a dictionary. 2: var dictionary = new ConcurrentDictionary<string, int>(); 3: 4: dictionary["A"] = 1; 5: dictionary["B"] = 2; 6: dictionary["C"] = 3; 7: dictionary["D"] = 4; 8: dictionary["E"] = 5; 9: dictionary["F"] = 6; Then you have one task (using the wonderful TPL!) to iterate using dirty reads: 1: // attempt iteration in a separate thread 2: var iterationTask = new Task(() => 3: { 4: // iterates using a dirty read 5: foreach (var pair in dictionary) 6: { 7: Console.WriteLine(pair.Key + ":" + pair.Value); 8: } 9: }); And one task to attempt updates in a separate thread (probably): 1: // attempt updates in a separate thread 2: var updateTask = new Task(() => 3: { 4: // iterates, and updates the value by one 5: foreach (var pair in dictionary) 6: { 7: dictionary[pair.Key] = pair.Value + 1; 8: } 9: }); Now that we’ve done this, we can fire up both tasks and wait for them to complete: 1: // start both tasks 2: updateTask.Start(); 3: iterationTask.Start(); 4: 5: // wait for both to complete. 6: Task.WaitAll(updateTask, iterationTask); Now, if I you didn’t know about the dirty reads, you may have expected to see the iteration before the updates (such as A:1, B:2, C:3, D:4, E:5, F:6). However, because the reads are dirty, we will quite possibly get a combination of some updated, some original. My own run netted this result: 1: F:6 2: E:6 3: D:5 4: C:4 5: B:3 6: A:2 Note that, of course, iteration is not in order because ConcurrentDictionary, like Dictionary, is unordered. Also note that both E and F show the value 6. This is because the output task reached F before the update, but the updates for the rest of the items occurred before their output (probably because console output is very slow, comparatively). If we want to always guarantee that we will get a consistent snapshot to iterate over (that is, at the point we ask for it we see precisely what is in the dictionary and no subsequent updates during iteration), we should iterate over a call to ToArray() instead: 1: // attempt iteration in a separate thread 2: var iterationTask = new Task(() => 3: { 4: // iterates using a dirty read 5: foreach (var pair in dictionary.ToArray()) 6: { 7: Console.WriteLine(pair.Key + ":" + pair.Value); 8: } 9: }); The atomic Try…() methods As you can imagine TryAdd() and TryRemove() have few surprises. Both first check the existence of the item to determine if it can be added or removed based on whether or not the key currently exists in the dictionary: 1: // try add attempts an add and returns false if it already exists 2: if (dictionary.TryAdd("G", 7)) 3: Console.WriteLine("G did not exist, now inserted with 7"); 4: else 5: Console.WriteLine("G already existed, insert failed."); TryRemove() also has the virtue of returning the value portion of the removed entry matching the given key: 1: // attempt to remove the value, if it exists it is removed and the original is returned 2: int removedValue; 3: if (dictionary.TryRemove("C", out removedValue)) 4: Console.WriteLine("Removed C and its value was " + removedValue); 5: else 6: Console.WriteLine("C did not exist, remove failed."); Now TryUpdate() is an interesting creature. You might think from it’s name that TryUpdate() first checks for an item’s existence, and then updates if the item exists, otherwise it returns false. Well, note quite... It turns out when you call TryUpdate() on a concurrent dictionary, you pass it not only the new value you want it to have, but also the value you expected it to have before the update. If the item exists in the dictionary, and it has the value you expected, it will update it to the new value atomically and return true. If the item is not in the dictionary or does not have the value you expected, it is not modified and false is returned. 1: // attempt to update the value, if it exists and if it has the expected original value 2: if (dictionary.TryUpdate("G", 42, 7)) 3: Console.WriteLine("G existed and was 7, now it's 42."); 4: else 5: Console.WriteLine("G either didn't exist, or wasn't 7."); The composite Add methods The ConcurrentDictionary also has composite add methods that can be used to perform updates and gets, with an add if the item is not existing at the time of the update or get. The first of these, AddOrUpdate(), allows you to add a new item to the dictionary if it doesn’t exist, or update the existing item if it does. For example, let’s say you are creating a dictionary of counts of stock ticker symbols you’ve subscribed to from a market data feed: 1: public sealed class SubscriptionManager 2: { 3: private readonly ConcurrentDictionary<string, int> _subscriptions = new ConcurrentDictionary<string, int>(); 4: 5: // adds a new subscription, or increments the count of the existing one. 6: public void AddSubscription(string tickerKey) 7: { 8: // add a new subscription with count of 1, or update existing count by 1 if exists 9: var resultCount = _subscriptions.AddOrUpdate(tickerKey, 1, (symbol, count) => count + 1); 10: 11: // now check the result to see if we just incremented the count, or inserted first count 12: if (resultCount == 1) 13: { 14: // subscribe to symbol... 15: } 16: } 17: } Notice the update value factory Func delegate. If the key does not exist in the dictionary, the add value is used (in this case 1 representing the first subscription for this symbol), but if the key already exists, it passes the key and current value to the update delegate which computes the new value to be stored in the dictionary. The return result of this operation is the value used (in our case: 1 if added, existing value + 1 if updated). Likewise, the GetOrAdd() allows you to attempt to retrieve a value from the dictionary, and if the value does not currently exist in the dictionary it will insert a value. This can be handy in cases where perhaps you wish to cache data, and thus you would query the cache to see if the item exists, and if it doesn’t you would put the item into the cache for the first time: 1: public sealed class PriceCache 2: { 3: private readonly ConcurrentDictionary<string, double> _cache = new ConcurrentDictionary<string, double>(); 4: 5: // adds a new subscription, or increments the count of the existing one. 6: public double QueryPrice(string tickerKey) 7: { 8: // check for the price in the cache, if it doesn't exist it will call the delegate to create value. 9: return _cache.GetOrAdd(tickerKey, symbol => GetCurrentPrice(symbol)); 10: } 11: 12: private double GetCurrentPrice(string tickerKey) 13: { 14: // do code to calculate actual true price. 15: } 16: } There are other variations of these two methods which vary whether a value is provided or a factory delegate, but otherwise they work much the same. Oddities with the composite Add methods The AddOrUpdate() and GetOrAdd() methods are totally thread-safe, on this you may rely, but they are not atomic. It is important to note that the methods that use delegates execute those delegates outside of the lock. This was done intentionally so that a user delegate (of which the ConcurrentDictionary has no control of course) does not take too long and lock out other threads. This is not necessarily an issue, per se, but it is something you must consider in your design. The main thing to consider is that your delegate may get called to generate an item, but that item may not be the one returned! Consider this scenario: A calls GetOrAdd and sees that the key does not currently exist, so it calls the delegate. Now thread B also calls GetOrAdd and also sees that the key does not currently exist, and for whatever reason in this race condition it’s delegate completes first and it adds its new value to the dictionary. Now A is done and goes to get the lock, and now sees that the item now exists. In this case even though it called the delegate to create the item, it will pitch it because an item arrived between the time it attempted to create one and it attempted to add it. Let’s illustrate, assume this totally contrived example program which has a dictionary of char to int. And in this dictionary we want to store a char and it’s ordinal (that is, A = 1, B = 2, etc). So for our value generator, we will simply increment the previous value in a thread-safe way (perhaps using Interlocked): 1: public static class Program 2: { 3: private static int _nextNumber = 0; 4: 5: // the holder of the char to ordinal 6: private static ConcurrentDictionary<char, int> _dictionary 7: = new ConcurrentDictionary<char, int>(); 8: 9: // get the next id value 10: public static int NextId 11: { 12: get { return Interlocked.Increment(ref _nextNumber); } 13: } Then, we add a method that will perform our insert: 1: public static void Inserter() 2: { 3: for (int i = 0; i < 26; i++) 4: { 5: _dictionary.GetOrAdd((char)('A' + i), key => NextId); 6: } 7: } Finally, we run our test by starting two tasks to do this work and get the results… 1: public static void Main() 2: { 3: // 3 tasks attempting to get/insert 4: var tasks = new List<Task> 5: { 6: new Task(Inserter), 7: new Task(Inserter) 8: }; 9: 10: tasks.ForEach(t => t.Start()); 11: Task.WaitAll(tasks.ToArray()); 12: 13: foreach (var pair in _dictionary.OrderBy(p => p.Key)) 14: { 15: Console.WriteLine(pair.Key + ":" + pair.Value); 16: } 17: } If you run this with only one task, you get the expected A:1, B:2, ..., Z:26. But running this in parallel you will get something a bit more complex. My run netted these results: 1: A:1 2: B:3 3: C:4 4: D:5 5: E:6 6: F:7 7: G:8 8: H:9 9: I:10 10: J:11 11: K:12 12: L:13 13: M:14 14: N:15 15: O:16 16: P:17 17: Q:18 18: R:19 19: S:20 20: T:21 21: U:22 22: V:23 23: W:24 24: X:25 25: Y:26 26: Z:27 Notice that B is 3? This is most likely because both threads attempted to call GetOrAdd() at roughly the same time and both saw that B did not exist, thus they both called the generator and one thread got back 2 and the other got back 3. However, only one of those threads can get the lock at a time for the actual insert, and thus the one that generated the 3 won and the 3 was inserted and the 2 got discarded. This is why on these methods your factory delegates should be careful not to have any logic that would be unsafe if the value they generate will be pitched in favor of another item generated at roughly the same time. As such, it is probably a good idea to keep those generators as stateless as possible. Summary The ConcurrentDictionary is a very efficient and thread-safe version of the Dictionary generic collection. It has all the benefits of type-safety that it’s generic collection counterpart does, and in addition is extremely efficient especially when there are more reads than writes concurrently. Tweet Technorati Tags: C#, .NET, Concurrent Collections, Collections, Little Wonders, Black Rabbit Coder,James Michael Hare

Read the article

1