Search Results

Search found 35 results on 2 pages for 'atomicity'.

Page 2/2 | < Previous Page | 1 2 

  • Can I specify the files to commit in subversion in a file rather than on the command line?

    - by René Nyffenegger
    I have renamed (with svn move) a lot of files in a subversion project. Now, I am trying to commit these on Window's cmd.exe. It seems that I hit a limit (probably by cmd.exe) in that the number of files is too long for the command line to swallow. Now, I thought and hoped that I could list the files to commit in a seperate file that I could specify with the commit command (something like svn ci --files-to-commit=renamed-files.txt -m "Renamed a lot of files" Yet, either such an option does not exist or I am unable to find this. Unfortunately, I cannot do a svn ci . as I have done other changes in the project as well. Neither can I do a svn ci *pattern-of-renamed-files* since this would only check in the added files, not the deleted ones. Before I start checking in the files with smaller chunks of files to check in (and thus increase the revision number uneccesserily without giving a hint as to the 'atomicity' of the operation) I thought I ask if this is indeed impossible to do.

    Read the article

  • Returning two or more values from a function

    - by cf_PhillipSenn
    I need to return multiple values from a ColdFusion function in an ajax callback function. Here's what I've got: $('input[name="StateName"]').live('change', function() { var StateID = $(this).parents('tr').attr('id'); var StateName = $(this).val(); $.ajax({ url: 'Remote/State.cfc' ,type: "POST" ,data: { 'method': 'UpdateStateName' ,'StateID': StateID ,'StateName': StateName } ,success: function(result){ if (isNaN(result)) { $('#msg').text(result).addClass('err'); } else { $('#' + result + ' input[name="StateName"]').addClass('changed'); }; } ,error: function(msg){ $('#msg').text('Connection error').addClass('err'); } }); }); If I trap a database error, then the success callback is fired, and the result is Not a Number (It is in fact, the text of the error message). I need the function to also pass back other values. One might be the primary key of the row that caused the error. Another might be the old StateName, so that I can refresh the old value on the screen so that the client will know absolutely for sure that their change did not take effect. I guess I'm breaking the rule of atomicity here and need to fix that, because I'm using result as both the primary key of the row that was updated, or it's the error message if the update fails. I need to return both the primary key and the error message.

    Read the article

  • what's a good technique for building and running many similar unit tests?

    - by jcollum
    I have a test setup where I have many very similar unit tests that I need to run. For example, there are about 40 stored procedures that need to be checked for existence in the target environment. However I'd like all the tests to be grouped by their business unit. So there'd be 40 instances of a very similar TestMethod in 40 separate classes. Kinda lame. One other thing: each group of tests need to be in their own solution. So Business Unit A will have a solution called Tests.BusinessUnitA. I'm thinking that I can set this all up by passing a configuration object (with the name of the stored proc to check, among other things) to a TestRunner class. The problem is that I'm losing the atomicity of my unit tests. I wouldn't be able to run just one of the tests, I'd have to run all the tests in the TestRunner class. This is what the code looks like at this time. Sure, it's nice and compact, but if Test 8 fails, I have no way of running just Test 8. TestRunner runner = new TestRunner(config, this.TestContext); var runnerType = typeof(TestRunner); var methods = runnerType.GetMethods() .Where(x => x.GetCustomAttributes(typeof(TestMethodAttribute), false) .Count() > 0).ToArray(); foreach (var method in methods) { method.Invoke(runner, null); } So I'm looking for suggestions for making a group of unit tests that take in a configuration object but won't require me to generate many many TestMethods. This looks like it might require code-generation, but I'd like to solve it without that.

    Read the article

  • What is the best way to create a running integer id on the AppEngine data storage?

    - by Freed
    For various reasons, I need a unique running integer id for my entities stored on the Google AppEngine. The automatically generated key sort of has this behaviour, but it doesn't start from 1 (or 0) and doesn't guarantee that the generated integer part will come from a continuous sequence. What would be the best way to efficiently implement this on AppEngine? Is there any support from the storage system? To add to the complexity, I might need to do this over entities from different entity groups, meaning I can't just get the highest id right now and save an entity with the next id in a transaction. Might memcache be the way to go..? Edit: I havn't yet implemented this, but to clarify on the memcache idea. I know memcache is unreliable, but in practice it probably won't lose data "too often" to hurt performance. Basically, I would have a memcache entry for the last used id, update it (somehow atomically) whenever I create a new entity and use that id. In the case of memcache not having a value for this entry, I'd get the highest id so far by doing a query on my entities sorted by the id and update memcache (unless someone else had already done so). The only problem I can see with this right now would be atomicity of the operation as a whole if the save of my new entity was also part of a transaction. Thoughts..?

    Read the article

  • Trouble understanding the semantics of volatile in Java

    - by HungryTux
    I've been reading up about the use of volatile variables in Java. I understand that they ensure instant visibility of their latest updates to all the threads running in the system on different cores/processors. However no atomicity of the operations that caused these updates is ensured. I see the following literature being used frequently A write to a volatile field happens-before every read of that same field . This is where I am a little confused. Here's a snippet of code which should help me better explain my query. volatile int x = 0; volatile int y = 0; Thread-0: | Thread-1: | if (x==1) { | if (y==1) { return false; | return false; } else { | } else { y=1; | x=1; return true; | return true; } | } Since x & y are both volatile, we have the following happens-before edges between the write of y in Thread-0 and read of y in Thread-1 between the write of x in Thread-1 and read of x in Thread-0 Does this imply that, at any point of time, only one of the threads can be in its 'else' block(since a write would happen before the read)? It may well be possible that Thread-0 starts, loads x, finds it value as 0 and right before it is about to write y in the else-block, there's a context switch to Thread-1 which loads y finds it value as 0 and thus enters the else-block too. Does volatile guard against such context switches (seems very unlikely)?

    Read the article

  • Big Data – Buzz Words: Importance of Relational Database in Big Data World – Day 9 of 21

    - by Pinal Dave
    In yesterday’s blog post we learned what is HDFS. In this article we will take a quick look at the importance of the Relational Database in Big Data world. A Big Question? Here are a few questions I often received since the beginning of the Big Data Series - Does the relational database have no space in the story of the Big Data? Does relational database is no longer relevant as Big Data is evolving? Is relational database not capable to handle Big Data? Is it true that one no longer has to learn about relational data if Big Data is the final destination? Well, every single time when I hear that one person wants to learn about Big Data and is no longer interested in learning about relational database, I find it as a bit far stretched. I am not here to give ambiguous answers of It Depends. I am personally very clear that one who is aspiring to become Big Data Scientist or Big Data Expert they should learn about relational database. NoSQL Movement The reason for the NoSQL Movement in recent time was because of the two important advantages of the NoSQL databases. Performance Flexible Schema In personal experience I have found that when I use NoSQL I have found both of the above listed advantages when I use NoSQL database. There are instances when I found relational database too much restrictive when my data is unstructured as well as they have in the datatype which my Relational Database does not support. It is the same case when I have found that NoSQL solution performing much better than relational databases. I must say that I am a big fan of NoSQL solutions in the recent times but I have also seen occasions and situations where relational database is still perfect fit even though the database is growing increasingly as well have all the symptoms of the big data. Situations in Relational Database Outperforms Adhoc reporting is the one of the most common scenarios where NoSQL is does not have optimal solution. For example reporting queries often needs to aggregate based on the columns which are not indexed as well are built while the report is running, in this kind of scenario NoSQL databases (document database stores, distributed key value stores) database often does not perform well. In the case of the ad-hoc reporting I have often found it is much easier to work with relational databases. SQL is the most popular computer language of all the time. I have been using it for almost over 10 years and I feel that I will be using it for a long time in future. There are plenty of the tools, connectors and awareness of the SQL language in the industry. Pretty much every programming language has a written drivers for the SQL language and most of the developers have learned this language during their school/college time. In many cases, writing query based on SQL is much easier than writing queries in NoSQL supported languages. I believe this is the current situation but in the future this situation can reverse when No SQL query languages are equally popular. ACID (Atomicity Consistency Isolation Durability) – Not all the NoSQL solutions offers ACID compliant language. There are always situations (for example banking transactions, eCommerce shopping carts etc.) where if there is no ACID the operations can be invalid as well database integrity can be at risk. Even though the data volume indeed qualify as a Big Data there are always operations in the application which absolutely needs ACID compliance matured language. The Mixed Bag I have often heard argument that all the big social media sites now a days have moved away from Relational Database. Actually this is not entirely true. While researching about Big Data and Relational Database, I have found that many of the popular social media sites uses Big Data solutions along with Relational Database. Many are using relational databases to deliver the results to end user on the run time and many still uses a relational database as their major backbone. Here are a few examples: Facebook uses MySQL to display the timeline. (Reference Link) Twitter uses MySQL. (Reference Link) Tumblr uses Sharded MySQL (Reference Link) Wikipedia uses MySQL for data storage. (Reference Link) There are many for prominent organizations which are running large scale applications uses relational database along with various Big Data frameworks to satisfy their various business needs. Summary I believe that RDBMS is like a vanilla ice cream. Everybody loves it and everybody has it. NoSQL and other solutions are like chocolate ice cream or custom ice cream – there is a huge base which loves them and wants them but not every ice cream maker can make it just right  for everyone’s taste. No matter how fancy an ice cream store is there is always plain vanilla ice cream available there. Just like the same, there are always cases and situations in the Big Data’s story where traditional relational database is the part of the whole story. In the real world scenarios there will be always the case when there will be need of the relational database concepts and its ideology. It is extremely important to accept relational database as one of the key components of the Big Data instead of treating it as a substandard technology. Ray of Hope – NewSQL In this module we discussed that there are places where we need ACID compliance from our Big Data application and NoSQL will not support that out of box. There is a new termed coined for the application/tool which supports most of the properties of the traditional RDBMS and supports Big Data infrastructure – NewSQL. Tomorrow In tomorrow’s blog post we will discuss about NewSQL. Reference: Pinal Dave (http://blog.sqlauthority.com) Filed under: Big Data, PostADay, SQL, SQL Authority, SQL Query, SQL Server, SQL Tips and Tricks, T SQL

    Read the article

  • Big Data – Operational Databases Supporting Big Data – RDBMS and NoSQL – Day 12 of 21

    - by Pinal Dave
    In yesterday’s blog post we learned the importance of the Cloud in the Big Data Story. In this article we will understand the role of Operational Databases Supporting Big Data Story. Even though we keep on talking about Big Data architecture, it is extremely crucial to understand that Big Data system can’t just exist in the isolation of itself. There are many needs of the business can only be fully filled with the help of the operational databases. Just having a system which can analysis big data may not solve every single data problem. Real World Example Think about this way, you are using Facebook and you have just updated your information about the current relationship status. In the next few seconds the same information is also reflected in the timeline of your partner as well as a few of the immediate friends. After a while you will notice that the same information is now also available to your remote friends. Later on when someone searches for all the relationship changes with their friends your change of the relationship will also show up in the same list. Now here is the question – do you think Big Data architecture is doing every single of these changes? Do you think that the immediate reflection of your relationship changes with your family member is also because of the technology used in Big Data. Actually the answer is Facebook uses MySQL to do various updates in the timeline as well as various events we do on their homepage. It is really difficult to part from the operational databases in any real world business. Now we will see a few of the examples of the operational databases. Relational Databases (This blog post) NoSQL Databases (This blog post) Key-Value Pair Databases (Tomorrow’s post) Document Databases (Tomorrow’s post) Columnar Databases (The Day After’s post) Graph Databases (The Day After’s post) Spatial Databases (The Day After’s post) Relational Databases We have earlier discussed about the RDBMS role in the Big Data’s story in detail so we will not cover it extensively over here. Relational Database is pretty much everywhere in most of the businesses which are here for many years. The importance and existence of the relational database are always going to be there as long as there are meaningful structured data around. There are many different kinds of relational databases for example Oracle, SQL Server, MySQL and many others. If you are looking for Open Source and widely accepted database, I suggest to try MySQL as that has been very popular in the last few years. I also suggest you to try out PostgreSQL as well. Besides many other essential qualities PostgreeSQL have very interesting licensing policies. PostgreSQL licenses allow modifications and distribution of the application in open or closed (source) form. One can make any modifications and can keep it private as well as well contribute to the community. I believe this one quality makes it much more interesting to use as well it will play very important role in future. Nonrelational Databases (NOSQL) We have also covered Nonrelational Dabases in earlier blog posts. NoSQL actually stands for Not Only SQL Databases. There are plenty of NoSQL databases out in the market and selecting the right one is always very challenging. Here are few of the properties which are very essential to consider when selecting the right NoSQL database for operational purpose. Data and Query Model Persistence of Data and Design Eventual Consistency Scalability Though above all of the properties are interesting to have in any NoSQL database but the one which most attracts to me is Eventual Consistency. Eventual Consistency RDBMS uses ACID (Atomicity, Consistency, Isolation, Durability) as a key mechanism for ensuring the data consistency, whereas NonRelational DBMS uses BASE for the same purpose. Base stands for Basically Available, Soft state and Eventual consistency. Eventual consistency is widely deployed in distributed systems. It is a consistency model used in distributed computing which expects unexpected often. In large distributed system, there are always various nodes joining and various nodes being removed as they are often using commodity servers. This happens either intentionally or accidentally. Even though one or more nodes are down, it is expected that entire system still functions normally. Applications should be able to do various updates as well as retrieval of the data successfully without any issue. Additionally, this also means that system is expected to return the same updated data anytime from all the functioning nodes. Irrespective of when any node is joining the system, if it is marked to hold some data it should contain the same updated data eventually. As per Wikipedia - Eventual consistency is a consistency model used in distributed computing that informally guarantees that, if no new updates are made to a given data item, eventually all accesses to that item will return the last updated value. In other words -  Informally, if no additional updates are made to a given data item, all reads to that item will eventually return the same value. Tomorrow In tomorrow’s blog post we will discuss about various other Operational Databases supporting Big Data. Reference: Pinal Dave (http://blog.sqlauthority.com) Filed under: Big Data, PostADay, SQL, SQL Authority, SQL Query, SQL Server, SQL Tips and Tricks, T SQL

    Read the article

  • Using XA Transactions in Coherence-based Applications

    - by jpurdy
    While the costs of XA transactions are well known (e.g. increased data contention, higher latency, significant disk I/O for logging, availability challenges, etc.), in many cases they are the most attractive option for coordinating logical transactions across multiple resources. There are a few common approaches when integrating Coherence into applications via the use of an application server's transaction manager: Use of Coherence as a read-only cache, applying transactions to the underlying database (or any system of record) instead of the cache. Use of TransactionMap interface via the included resource adapter. Use of the new ACID transaction framework, introduced in Coherence 3.6.   Each of these may have significant drawbacks for certain workloads. Using Coherence as a read-only cache is the simplest option. In this approach, the application is responsible for managing both the database and the cache (either within the business logic or via application server hooks). This approach also tends to provide limited benefit for many workloads, particularly those workloads that either have queries (given the complexity of maintaining a fully cached data set in Coherence) or are not read-heavy (where the cost of managing the cache may outweigh the benefits of reading from it). All updates are made synchronously to the database, leaving it as both a source of latency as well as a potential bottleneck. This approach also prevents addressing "hot data" problems (when certain objects are updated by many concurrent transactions) since most database servers offer no facilities for explicitly controlling concurrent updates. Finally, this option tends to be a better fit for key-based access (rather than filter-based access such as queries) since this makes it easier to aggressively invalidate cache entries without worrying about when they will be reloaded. The advantage of this approach is that it allows strong data consistency as long as optimistic concurrency control is used to ensure that database updates are applied correctly regardless of whether the cache contains stale (or even dirty) data. Another benefit of this approach is that it avoids the limitations of Coherence's write-through caching implementation. TransactionMap is generally used when Coherence acts as system of record. TransactionMap is not generally compatible with write-through caching, so it will usually be either used to manage a standalone cache or when the cache is backed by a database via write-behind caching. TransactionMap has some restrictions that may limit its utility, the most significant being: The lock-based concurrency model is relatively inefficient and may introduce significant latency and contention. As an example, in a typical configuration, a transaction that updates 20 cache entries will require roughly 40ms just for lock management (assuming all locks are granted immediately, and excluding validation and writing which will require a similar amount of time). This may be partially mitigated by denormalizing (e.g. combining a parent object and its set of child objects into a single cache entry), at the cost of increasing false contention (e.g. transactions will conflict even when updating different child objects). If the client (application server JVM) fails during the commit phase, locks will be released immediately, and the transaction may be partially committed. In practice, this is usually not as bad as it may sound since the commit phase is usually very short (all locks having been previously acquired). Note that this vulnerability does not exist when a single NamedCache is used and all updates are confined to a single partition (generally implying the use of partition affinity). The unconventional TransactionMap API is cumbersome but manageable. Only a few methods are transactional, primarily get(), put() and remove(). The ACID transactions framework (accessed via the Connection class) provides atomicity guarantees by implementing the NamedCache interface, maintaining its own cache data and transaction logs inside a set of private partitioned caches. This feature may be used as either a local transactional resource or as logging XA resource. However, a lack of database integration precludes the use of this functionality for most applications. A side effect of this is that this feature has not seen significant adoption, meaning that any use of this is subject to the usual headaches associated with being an early adopter (greater chance of bugs and greater risk of hitting an unoptimized code path). As a result, for the moment, we generally recommend against using this feature. In summary, it is possible to use Coherence in XA-oriented applications, and several customers are doing this successfully, but it is not a core usage model for the product, so care should be taken before committing to this path. For most applications, the most robust solution is normally to use Coherence as a read-only cache of the underlying data resources, even if this prevents taking advantage of certain product features.

    Read the article

  • How to avoid using duplicate savepoint names in nested transactions in nested stored procs?

    - by Gary McGill
    I have a pattern that I almost always follow, where if I need to wrap up an operation in a transaction, I do this: BEGIN TRANSACTION SAVE TRANSACTION TX -- Stuff IF @error <> 0 ROLLBACK TRANSACTION TX COMMIT TRANSACTION That's served me well enough in the past, but after years of using this pattern (and copy-pasting the above code), I've suddenly discovered a flaw which comes as a complete shock. Quite often, I'll have a stored procedure calling other stored procedures, all of which use this same pattern. What I've discovered (to my cost) is that because I'm using the same savepoint name everywhere, I can get into a situation where my outer transaction is partially committed - precisely the opposite of the atomicity that I'm trying to achieve. I've put together an example that exhibits the problem. This is a single batch (no nested stored procs), and so it looks a little odd in that you probably wouldn't use the same savepoint name twice in the same batch, but my real-world scenario would be too confusing to post. CREATE TABLE Test (test INTEGER NOT NULL) BEGIN TRAN SAVE TRAN TX BEGIN TRAN SAVE TRAN TX INSERT INTO Test(test) VALUES (1) COMMIT TRAN TX BEGIN TRAN SAVE TRAN TX INSERT INTO Test(test) VALUES (2) COMMIT TRAN TX DELETE FROM Test ROLLBACK TRAN TX COMMIT TRAN TX SELECT * FROM Test DROP TABLE Test When I execute this, it lists one record, with value "1". In other words, even though I rolled back my outer transaction, a record was added to the table. What's happening is that the ROLLBACK TRANSACTION TX at the outer level is rolling back as far as the last SAVE TRANSACTION TX at the inner level. Now that I write this all out, I can see the logic behind it: the server is looking back through the log file, treating it as a linear stream of transactions; it doesn't understand the nesting/hierarchy implied by either the nesting of the transactions (or, in my real-world scenario, by the calls to other stored procedures). So, clearly, I need to start using unique savepoint names instead of blindly using "TX" everywhere. But - and this is where I finally get to the point - is there a way to do this in a copy-pastable way so that I can still use the same code everywhere? Can I auto-generate the savepoint name on the fly somehow? Is there a convention or best-practice for doing this sort of thing? It's not exactly hard to come up with a unique name every time you start a transaction (could base it off the SP name, or somesuch), but I do worry that eventually there would be a conflict - and you wouldn't know about it because rather than causing an error it just silently destroys your data... :-(

    Read the article

  • Inside the Concurrent Collections: ConcurrentDictionary

    - by Simon Cooper
    Using locks to implement a thread-safe collection is rather like using a sledgehammer - unsubtle, easy to understand, and tends to make any other tool redundant. Unlike the previous two collections I looked at, ConcurrentStack and ConcurrentQueue, ConcurrentDictionary uses locks quite heavily. However, it is careful to wield locks only where necessary to ensure that concurrency is maximised. This will, by necessity, be a higher-level look than my other posts in this series, as there is quite a lot of code and logic in ConcurrentDictionary. Therefore, I do recommend that you have ConcurrentDictionary open in a decompiler to have a look at all the details that I skip over. The problem with locks There's several things to bear in mind when using locks, as encapsulated by the lock keyword in C# and the System.Threading.Monitor class in .NET (if you're unsure as to what lock does in C#, I briefly covered it in my first post in the series): Locks block threads The most obvious problem is that threads waiting on a lock can't do any work at all. No preparatory work, no 'optimistic' work like in ConcurrentQueue and ConcurrentStack, nothing. It sits there, waiting to be unblocked. This is bad if you're trying to maximise concurrency. Locks are slow Whereas most of the methods on the Interlocked class can be compiled down to a single CPU instruction, ensuring atomicity at the hardware level, taking out a lock requires some heavy lifting by the CLR and the operating system. There's quite a bit of work required to take out a lock, block other threads, and wake them up again. If locks are used heavily, this impacts performance. Deadlocks When using locks there's always the possibility of a deadlock - two threads, each holding a lock, each trying to aquire the other's lock. Fortunately, this can be avoided with careful programming and structured lock-taking, as we'll see. So, it's important to minimise where locks are used to maximise the concurrency and performance of the collection. Implementation As you might expect, ConcurrentDictionary is similar in basic implementation to the non-concurrent Dictionary, which I studied in a previous post. I'll be using some concepts introduced there, so I recommend you have a quick read of it. So, if you were implementing a thread-safe dictionary, what would you do? The naive implementation is to simply have a single lock around all methods accessing the dictionary. This would work, but doesn't allow much concurrency. Fortunately, the bucketing used by Dictionary allows a simple but effective improvement to this - one lock per bucket. This allows different threads modifying different buckets to do so in parallel. Any thread making changes to the contents of a bucket takes the lock for that bucket, ensuring those changes are thread-safe. The method that maps each bucket to a lock is the GetBucketAndLockNo method: private void GetBucketAndLockNo( int hashcode, out int bucketNo, out int lockNo, int bucketCount) { // the bucket number is the hashcode (without the initial sign bit) // modulo the number of buckets bucketNo = (hashcode & 0x7fffffff) % bucketCount; // and the lock number is the bucket number modulo the number of locks lockNo = bucketNo % m_locks.Length; } However, this does require some changes to how the buckets are implemented. The 'implicit' linked list within a single backing array used by the non-concurrent Dictionary adds a dependency between separate buckets, as every bucket uses the same backing array. Instead, ConcurrentDictionary uses a strict linked list on each bucket: This ensures that each bucket is entirely separate from all other buckets; adding or removing an item from a bucket is independent to any changes to other buckets. Modifying the dictionary All the operations on the dictionary follow the same basic pattern: void AlterBucket(TKey key, ...) { int bucketNo, lockNo; 1: GetBucketAndLockNo( key.GetHashCode(), out bucketNo, out lockNo, m_buckets.Length); 2: lock (m_locks[lockNo]) { 3: Node headNode = m_buckets[bucketNo]; 4: Mutate the node linked list as appropriate } } For example, when adding another entry to the dictionary, you would iterate through the linked list to check whether the key exists already, and add the new entry as the head node. When removing items, you would find the entry to remove (if it exists), and remove the node from the linked list. Adding, updating, and removing items all follow this pattern. Performance issues There is a problem we have to address at this point. If the number of buckets in the dictionary is fixed in the constructor, then the performance will degrade from O(1) to O(n) when a large number of items are added to the dictionary. As more and more items get added to the linked lists in each bucket, the lookup operations will spend most of their time traversing a linear linked list. To fix this, the buckets array has to be resized once the number of items in each bucket has gone over a certain limit. (In ConcurrentDictionary this limit is when the size of the largest bucket is greater than the number of buckets for each lock. This check is done at the end of the TryAddInternal method.) Resizing the bucket array and re-hashing everything affects every bucket in the collection. Therefore, this operation needs to take out every lock in the collection. Taking out mutiple locks at once inevitably summons the spectre of the deadlock; two threads each hold a lock, and each trying to acquire the other lock. How can we eliminate this? Simple - ensure that threads never try to 'swap' locks in this fashion. When taking out multiple locks, always take them out in the same order, and always take out all the locks you need before starting to release them. In ConcurrentDictionary, this is controlled by the AcquireLocks, AcquireAllLocks and ReleaseLocks methods. Locks are always taken out and released in the order they are in the m_locks array, and locks are all released right at the end of the method in a finally block. At this point, it's worth pointing out that the locks array is never re-assigned, even when the buckets array is increased in size. The number of locks is fixed in the constructor by the concurrencyLevel parameter. This simplifies programming the locks; you don't have to check if the locks array has changed or been re-assigned before taking out a lock object. And you can be sure that when a thread takes out a lock, another thread isn't going to re-assign the lock array. This would create a new series of lock objects, thus allowing another thread to ignore the existing locks (and any threads controlling them), breaking thread-safety. Consequences of growing the array Just because we're using locks doesn't mean that race conditions aren't a problem. We can see this by looking at the GrowTable method. The operation of this method can be boiled down to: private void GrowTable(Node[] buckets) { try { 1: Acquire first lock in the locks array // this causes any other thread trying to take out // all the locks to block because the first lock in the array // is always the one taken out first // check if another thread has already resized the buckets array // while we were waiting to acquire the first lock 2: if (buckets != m_buckets) return; 3: Calculate the new size of the backing array 4: Node[] array = new array[size]; 5: Acquire all the remaining locks 6: Re-hash the contents of the existing buckets into array 7: m_buckets = array; } finally { 8: Release all locks } } As you can see, there's already a check for a race condition at step 2, for the case when the GrowTable method is called twice in quick succession on two separate threads. One will successfully resize the buckets array (blocking the second in the meantime), when the second thread is unblocked it'll see that the array has already been resized & exit without doing anything. There is another case we need to consider; looking back at the AlterBucket method above, consider the following situation: Thread 1 calls AlterBucket; step 1 is executed to get the bucket and lock numbers. Thread 2 calls GrowTable and executes steps 1-5; thread 1 is blocked when it tries to take out the lock in step 2. Thread 2 re-hashes everything, re-assigns the buckets array, and releases all the locks (steps 6-8). Thread 1 is unblocked and continues executing, but the calculated bucket and lock numbers are no longer valid. Between calculating the correct bucket and lock number and taking out the lock, another thread has changed where everything is. Not exactly thread-safe. Well, a similar problem was solved in ConcurrentStack and ConcurrentQueue by storing a local copy of the state, doing the necessary calculations, then checking if that state is still valid. We can use a similar idea here: void AlterBucket(TKey key, ...) { while (true) { Node[] buckets = m_buckets; int bucketNo, lockNo; GetBucketAndLockNo( key.GetHashCode(), out bucketNo, out lockNo, buckets.Length); lock (m_locks[lockNo]) { // if the state has changed, go back to the start if (buckets != m_buckets) continue; Node headNode = m_buckets[bucketNo]; Mutate the node linked list as appropriate } break; } } TryGetValue and GetEnumerator And so, finally, we get onto TryGetValue and GetEnumerator. I've left these to the end because, well, they don't actually use any locks. How can this be? Whenever you change a bucket, you need to take out the corresponding lock, yes? Indeed you do. However, it is important to note that TryGetValue and GetEnumerator don't actually change anything. Just as immutable objects are, by definition, thread-safe, read-only operations don't need to take out a lock because they don't change anything. All lockless methods can happily iterate through the buckets and linked lists without worrying about locking anything. However, this does put restrictions on how the other methods operate. Because there could be another thread in the middle of reading the dictionary at any time (even if a lock is taken out), the dictionary has to be in a valid state at all times. Every change to state has to be made visible to other threads in a single atomic operation (all relevant variables are marked volatile to help with this). This restriction ensures that whatever the reading threads are doing, they never read the dictionary in an invalid state (eg items that should be in the collection temporarily removed from the linked list, or reading a node that has had it's key & value removed before the node itself has been removed from the linked list). Fortunately, all the operations needed to change the dictionary can be done in that way. Bucket resizes are made visible when the new array is assigned back to the m_buckets variable. Any additions or modifications to a node are done by creating a new node, then splicing it into the existing list using a single variable assignment. Node removals are simply done by re-assigning the node's m_next pointer. Because the dictionary can be changed by another thread during execution of the lockless methods, the GetEnumerator method is liable to return dirty reads - changes made to the dictionary after GetEnumerator was called, but before the enumeration got to that point in the dictionary. It's worth listing at this point which methods are lockless, and which take out all the locks in the dictionary to ensure they get a consistent view of the dictionary: Lockless: TryGetValue GetEnumerator The indexer getter ContainsKey Takes out every lock (lockfull?): Count IsEmpty Keys Values CopyTo ToArray Concurrent principles That covers the overall implementation of ConcurrentDictionary. I haven't even begun to scratch the surface of this sophisticated collection. That I leave to you. However, we've looked at enough to be able to extract some useful principles for concurrent programming: Partitioning When using locks, the work is partitioned into independant chunks, each with its own lock. Each partition can then be modified concurrently to other partitions. Ordered lock-taking When a method does need to control the entire collection, locks are taken and released in a fixed order to prevent deadlocks. Lockless reads Read operations that don't care about dirty reads don't take out any lock; the rest of the collection is implemented so that any reading thread always has a consistent view of the collection. That leads us to the final collection in this little series - ConcurrentBag. Lacking a non-concurrent analogy, it is quite different to any other collection in the class libraries. Prepare your thinking hats!

    Read the article

< Previous Page | 1 2