Search Results

Search found 882 results on 36 pages for 'concurrency coordination'.

Page 19/36 | < Previous Page | 15 16 17 18 19 20 21 22 23 24 25 26 | Next Page >

Why mysql 5.5 slower than 5.1 (linux,using mysqlslap)

- by Zenofo

my.cnf (5.5 and 5.1 is the same) : back_log=200 max_connections=512 max_connect_errors=999999 key_buffer=512M max_allowed_packet=8M table_cache=512 sort_buffer=8M read_buffer_size=8M thread_cache=8 thread_concurrency=4 myisam_sort_buffer_size=128M interactive_timeout=28800 wait_timeout=7200 mysql 5.5: ..mysql5.5/bin/mysqlslap -a --concurrency=10 --number-of-queries 5000 --iterations=5 -S /tmp/mysql_5.5.sock --engine=innodb Benchmark Running for engine innodb Average number of seconds to run all queries: 15.156 seconds Minimum number of seconds to run all queries: 15.031 seconds Maximum number of seconds to run all queries: 15.296 seconds Number of clients running queries: 10 Average number of queries per client: 500 mysql5.1: ..mysql5.5/bin/mysqlslap -a --concurrency=10 --number-of-queries 5000 --iterations=5 -S /tmp/mysql_5.1.sock --engine=innodb Benchmark Running for engine innodb Average number of seconds to run all queries: 13.252 seconds Minimum number of seconds to run all queries: 13.019 seconds Maximum number of seconds to run all queries: 13.480 seconds Number of clients running queries: 10 Average number of queries per client: 500 Why mysql 5.5 slower than 5.1 ? BTW:I'm tried mysql5.5/bin/mysqlslap and mysql5.1/bin/mysqlslap,result is the same

Read the article
Should I amortize scripting cost via bytecode analysis or multithreading?

- by user18983

I'm working on a game sort of thing where users can write arbitrary code for individual agents, and I'm trying to decide the best way to divide up computation time. The simplest option would be to give each agent a set amount of time and skip their turn if it elapses without an action being decided upon, but I would like people to be able to write their agents decision functions without having to think too much about how long its taking unless they really want to. The two approaches I'm considering are giving each agent a set number of bytecode instructions (taking cost into account) each timestep, and making players deal with the consequences of the game state changing between blocks of computation (as with Battlecode) or giving each agent it's own thread and giving each thread equal time on the processor. I'm about equally knowledgeable on both concurrency and bytecode stuff, which is to say not very, so I'm wondering which approach would be best. I have a clearer idea of how I'd structure things if I used bytecode, but less certainty about how to actually implement the analysis. I'm pretty sure I can work up a concurrency based system without much trouble, but I worry it will be messier with more overhead and will add unnecessary complexity to the project.

Read the article
Partition Wise Joins

- by jean-pierre.dijcks

Some say they are the holy grail of parallel computing and PWJ is the basis for a shared nothing system and the only join method that is available on a shared nothing system (yes this is oversimplified!). The magic in Oracle is of course that is one of many ways to join data. And yes, this is the old flexibility vs. simplicity discussion all over, so I won't go there... the point is that what you must do in a shared nothing system, you can do in Oracle with the same speed and methods. The Theory A partition wise join is a join between (for simplicity) two tables that are partitioned on the same column with the same partitioning scheme. In shared nothing this is effectively hard partitioning locating data on a specific node / storage combo. In Oracle is is logical partitioning. If you now join the two tables on that partitioned column you can break up the join in smaller joins exactly along the partitions in the data. Since they are partitioned (grouped) into the same buckets, all values required to do the join live in the equivalent bucket on either sides. No need to talk to anyone else, no need to redistribute data to anyone else... in short, the optimal join method for parallel processing of two large data sets. PWJ's in Oracle Since we do not hard partition the data across nodes in Oracle we use the Partitioning option to the database to create the buckets, then set the Degree of Parallelism (or run Auto DOP - see here) and get our PWJs. The main questions always asked are: How many partitions should I create? What should my DOP be? In a shared nothing system the answer is of course, as many partitions as there are nodes which will be your DOP. In Oracle we do want you to look at the workload and concurrency, and once you know that to understand the following rules of thumb. Within Oracle we have more ways of joining of data, so it is important to understand some of the PWJ ideas and what it means if you have an uneven distribution across processes. Assume we have a simple scenario where we partition the data on a hash key resulting in 4 hash partitions (H1 -H4). We have 2 parallel processes that have been tasked with reading these partitions (P1 - P2). The work is evenly divided assuming the partitions are the same size and we can scan this in time t1 as shown below. Now assume that we have changed the system and have a 5th partition but still have our 2 workers P1 and P2. The time it takes is actually 50% more assuming the 5th partition has the same size as the original H1 - H4 partitions. In other words to scan these 5 partitions, the time t2 it takes is not 1/5th more expensive, it is a lot more expensive and some other join plans may now start to look exciting to the optimizer. Just to post the disclaimer, it is not as simple as I state it here, but you get the idea on how much more expensive this plan may now look... Based on this little example there are a few rules of thumb to follow to get the partition wise joins. First, choose a DOP that is a factor of two (2). So always choose something like 2, 4, 8, 16, 32 and so on... Second, choose a number of partitions that is larger or equal to 2* DOP. Third, make sure the number of partitions is divisible through 2 without orphans. This is also known as an even number... Fourth, choose a stable partition count strategy, which is typically hash, which can be a sub partitioning strategy rather than the main strategy (range - hash is a popular one). Fifth, make sure you do this on the join key between the two large tables you want to join (and this should be the obvious one...). Translating this into an example: DOP = 8 (determined based on concurrency or by using Auto DOP with a cap due to concurrency) says that the number of partitions >= 16. Number of hash (sub) partitions = 32, which gives each process four partitions to work on. This number is somewhat arbitrary and depends on your data and system. In this case my main reasoning is that if you get more room on the box you can easily move the DOP for the query to 16 without repartitioning... and of course it makes for no leftovers on the table... And yes, we recommend up-to-date statistics. And before you start complaining, do read this post on a cool way to do stats in 11.

Read the article
Coherence Data Guarantees for Data Reads - Basic Terminology

- by jpurdy

When integrating Coherence into applications, each application has its own set of requirements with respect to data integrity guarantees. Developers often describe these requirements using expressions like "avoiding dirty reads" or "making sure that updates are transactional", but we often find that even in a small group of people, there may be a wide range of opinions as to what these terms mean. This may simply be due to a lack of familiarity, but given that Coherence sits at an intersection of several (mostly) unrelated fields, it may be a matter of conflicting vocabularies (e.g. "consistency" is similar but different in transaction processing versus multi-threaded programming). Since almost all data read consistency issues are related to the concept of concurrency, it is helpful to start with a definition of that, or rather what it means for two operations to be concurrent. Rather than implying that they occur "at the same time", concurrency is a slightly weaker statement -- it simply means that it can't be proven that one event precedes (or follows) the other. As an example, in a Coherence application, if two client members mutate two different cache entries sitting on two different cache servers at roughly the same time, it is likely that one update will precede the other by a significant amount of time (say 0.1ms). However, since there is no guarantee that all four members have their clocks perfectly synchronized, and there is no way to precisely measure the time it takes to send a given message between any two members (that have differing clocks), we consider these to be concurrent operations since we can not (easily) prove otherwise. So this leads to a question that we hear quite frequently: "Are the contents of the near cache always synchronized with the underlying distributed cache?". It's easy to see that if an update on a cache server results in a message being sent to each near cache, and then that near cache being updated that there is a window where the contents are different. However, this is irrelevant, since even if the application reads directly from the distributed cache, another thread update the cache before the read is returned to the application. Even if no other member modifies a cache entry prior to the local near cache entry being updated (and subsequently read), the purpose of reading a cache entry is to do something with the result, usually either displaying for consumption by a human, or by updating the entry based on the current state of the entry. In the former case, it's clear that if the data is updated faster than a human can perceive, then there is no problem (and in many cases this can be relaxed even further). For the latter case, the application must assume that the value might potentially be updated before it has a chance to update it. This almost aways the case with read-only caches, and the solution is the traditional optimistic transaction pattern, which requires the application to explicitly state what assumptions it made about the old value of the cache entry. If the application doesn't want to bother stating those assumptions, it is free to lock the cache entry prior to reading it, ensuring that no other threads will mutate the entry, a pessimistic approach. The optimistic approach relies on what is sometimes called a "fuzzy read". In other words, the application assumes that the read should be correct, but it also acknowledges that it might not be. (I use the qualifier "sometimes" because in some writings, "fuzzy read" indicates the situation where the application actually sees an original value and then later sees an updated value within the same transaction -- however, both definitions are roughly equivalent from an application design perspective). If the read is not correct it is called a "stale read". Going back to the definition of concurrency, it may seem difficult to precisely define a stale read, but the practical way of detecting a stale read is that is will cause the encompassing transaction to roll back if it tries to update that value. The pessimistic approach relies on a "coherent read", a guarantee that the value returned is not only the same as the primary copy of that value, but also that it will remain that way. In most cases this can be used interchangeably with "repeatable read" (though that term has additional implications when used in the context of a database system). In none of cases above is it possible for the application to perform a "dirty read". A dirty read occurs when the application reads a piece of data that was never committed. In practice the only way this can occur is with multi-phase updates such as transactions, where a value may be temporarily update but then withdrawn when a transaction is rolled back. If another thread sees that value prior to the rollback, it is a dirty read. If an application uses optimistic transactions, dirty reads will merely result in a lack of forward progress (this is actually one of the main risks of dirty reads -- they can be chained and potentially cause cascading rollbacks). The concepts of dirty reads, fuzzy reads, stale reads and coherent reads are able to describe the vast majority of requirements that we see in the field. However, the important thing is to define the terms used to define requirements. A quick web search for each of the terms in this article will show multiple meanings, so I've selected what are generally the most common variations, but it never hurts to state each definition explicitly if they are critical to the success of a project (many applications have sufficiently loose requirements that precise terminology can be avoided).

Read the article
Plagued by multithreaded bugs

- by koncurrency

On my new team that I manage, the majority of our code is platform, TCP socket, and http networking code. All C++. Most of it originated from other developers that have left the team. The current developers on the team are very smart, but mostly junior in terms of experience. Our biggest problem: multi-threaded concurrency bugs. Most of our class libraries are written to be asynchronous by use of some thread pool classes. Methods on the class libraries often enqueue long running taks onto the thread pool from one thread and then the callback methods of that class get invoked on a different thread. As a result, we have a lot of edge case bugs involving incorrect threading assumptions. This results in subtle bugs that go beyond just having critical sections and locks to guard against concurrency issues. What makes these problems even harder is that the attempts to fix are often incorrect. Some mistakes I've observed the team attempting (or within the legacy code itself) includes something like the following: Common mistake #1 - Fixing concurrency issue by just put a lock around the shared data, but forgetting about what happens when methods don't get called in an expected order. Here's a very simple example: void Foo::OnHttpRequestComplete(statuscode status) { m_pBar->DoSomethingImportant(status); } void Foo::Shutdown() { m_pBar->Cleanup(); delete m_pBar; m_pBar=nullptr; } So now we have a bug in which Shutdown could get called while OnHttpNetworkRequestComplete is occuring on. A tester finds the bug, captures the crash dump, and assigns the bug to a developer. He in turn fixes the bug like this. void Foo::OnHttpRequestComplete(statuscode status) { AutoLock lock(m_cs); m_pBar->DoSomethingImportant(status); } void Foo::Shutdown() { AutoLock lock(m_cs); m_pBar->Cleanup(); delete m_pBar; m_pBar=nullptr; } The above fix looks good until you realize there's an even more subtle edge case. What happens if Shutdown gets called before OnHttpRequestComplete gets called back? The real world examples my team has are even more complex, and the edge cases are even harder to spot during the code review process. Common Mistake #2 - fixing deadlock issues by blindly exiting the lock, wait for the other thread to finish, then re-enter the lock - but without handling the case that the object just got updated by the other thread! Common Mistake #3 - Even though the objects are reference counted, the shutdown sequence "releases" it's pointer. But forgets to wait for the thread that is still running to release it's instance. As such, components are shutdown cleanly, then spurious or late callbacks are invoked on an object in an state not expecting any more calls. There are other edge cases, but the bottom line is this: Multithreaded programming is just plain hard, even for smart people. As I catch these mistakes, I spend time discussing the errors with each developer on developing a more appropriate fix. But I suspect they are often confused on how to solve each issue because of the enormous amount of legacy code that the "right" fix will involve touching. We're going to be shipping soon, and I'm sure the patches we're applying will hold for the upcoming release. Afterwards, we're going to have some time to improve the code base and refactor where needed. We won't have time to just re-write everything. And the majority of the code isn't all that bad. But I'm looking to refactor code such that threading issues can be avoided altogether. One approach I am considering is this. For each significant platform feature, have a dedicated single thread where all events and network callbacks get marshalled onto. Similar to COM apartment threading in Windows with use of a message loop. Long blocking operations could still get dispatched to a work pool thread, but the completion callback is invoked on on the component's thread. Components could possibly even share the same thread. Then all the class libraries running inside the thread can be written under the assumption of a single threaded world. Before I go down that path, I am also very interested if there are other standard techniques or design patterns for dealing with multithreaded issues. And I have to emphasize - something beyond a book that describes the basics of mutexes and semaphores. What do you think? I am also interested in any other approaches to take towards a refactoring process. Including any of the following: Literature or papers on design patterns around threads. Something beyond an introduction to mutexes and semaphores. We don't need massive parallelism either, just ways to design an object model so as to handle asynchronous events from other threads correctly. Ways to diagram the threading of various components, so that it will be easy to study and evolve solutions for. (That is, a UML equivalent for discussing threads across objects and classes) Educating your development team on the issues with multithreaded code. What would you do?

Read the article
Inside the Concurrent Collections: ConcurrentDictionary

- by Simon Cooper

Using locks to implement a thread-safe collection is rather like using a sledgehammer - unsubtle, easy to understand, and tends to make any other tool redundant. Unlike the previous two collections I looked at, ConcurrentStack and ConcurrentQueue, ConcurrentDictionary uses locks quite heavily. However, it is careful to wield locks only where necessary to ensure that concurrency is maximised. This will, by necessity, be a higher-level look than my other posts in this series, as there is quite a lot of code and logic in ConcurrentDictionary. Therefore, I do recommend that you have ConcurrentDictionary open in a decompiler to have a look at all the details that I skip over. The problem with locks There's several things to bear in mind when using locks, as encapsulated by the lock keyword in C# and the System.Threading.Monitor class in .NET (if you're unsure as to what lock does in C#, I briefly covered it in my first post in the series): Locks block threads The most obvious problem is that threads waiting on a lock can't do any work at all. No preparatory work, no 'optimistic' work like in ConcurrentQueue and ConcurrentStack, nothing. It sits there, waiting to be unblocked. This is bad if you're trying to maximise concurrency. Locks are slow Whereas most of the methods on the Interlocked class can be compiled down to a single CPU instruction, ensuring atomicity at the hardware level, taking out a lock requires some heavy lifting by the CLR and the operating system. There's quite a bit of work required to take out a lock, block other threads, and wake them up again. If locks are used heavily, this impacts performance. Deadlocks When using locks there's always the possibility of a deadlock - two threads, each holding a lock, each trying to aquire the other's lock. Fortunately, this can be avoided with careful programming and structured lock-taking, as we'll see. So, it's important to minimise where locks are used to maximise the concurrency and performance of the collection. Implementation As you might expect, ConcurrentDictionary is similar in basic implementation to the non-concurrent Dictionary, which I studied in a previous post. I'll be using some concepts introduced there, so I recommend you have a quick read of it. So, if you were implementing a thread-safe dictionary, what would you do? The naive implementation is to simply have a single lock around all methods accessing the dictionary. This would work, but doesn't allow much concurrency. Fortunately, the bucketing used by Dictionary allows a simple but effective improvement to this - one lock per bucket. This allows different threads modifying different buckets to do so in parallel. Any thread making changes to the contents of a bucket takes the lock for that bucket, ensuring those changes are thread-safe. The method that maps each bucket to a lock is the GetBucketAndLockNo method: private void GetBucketAndLockNo( int hashcode, out int bucketNo, out int lockNo, int bucketCount) { // the bucket number is the hashcode (without the initial sign bit) // modulo the number of buckets bucketNo = (hashcode & 0x7fffffff) % bucketCount; // and the lock number is the bucket number modulo the number of locks lockNo = bucketNo % m_locks.Length; } However, this does require some changes to how the buckets are implemented. The 'implicit' linked list within a single backing array used by the non-concurrent Dictionary adds a dependency between separate buckets, as every bucket uses the same backing array. Instead, ConcurrentDictionary uses a strict linked list on each bucket: This ensures that each bucket is entirely separate from all other buckets; adding or removing an item from a bucket is independent to any changes to other buckets. Modifying the dictionary All the operations on the dictionary follow the same basic pattern: void AlterBucket(TKey key, ...) { int bucketNo, lockNo; 1: GetBucketAndLockNo( key.GetHashCode(), out bucketNo, out lockNo, m_buckets.Length); 2: lock (m_locks[lockNo]) { 3: Node headNode = m_buckets[bucketNo]; 4: Mutate the node linked list as appropriate } } For example, when adding another entry to the dictionary, you would iterate through the linked list to check whether the key exists already, and add the new entry as the head node. When removing items, you would find the entry to remove (if it exists), and remove the node from the linked list. Adding, updating, and removing items all follow this pattern. Performance issues There is a problem we have to address at this point. If the number of buckets in the dictionary is fixed in the constructor, then the performance will degrade from O(1) to O(n) when a large number of items are added to the dictionary. As more and more items get added to the linked lists in each bucket, the lookup operations will spend most of their time traversing a linear linked list. To fix this, the buckets array has to be resized once the number of items in each bucket has gone over a certain limit. (In ConcurrentDictionary this limit is when the size of the largest bucket is greater than the number of buckets for each lock. This check is done at the end of the TryAddInternal method.) Resizing the bucket array and re-hashing everything affects every bucket in the collection. Therefore, this operation needs to take out every lock in the collection. Taking out mutiple locks at once inevitably summons the spectre of the deadlock; two threads each hold a lock, and each trying to acquire the other lock. How can we eliminate this? Simple - ensure that threads never try to 'swap' locks in this fashion. When taking out multiple locks, always take them out in the same order, and always take out all the locks you need before starting to release them. In ConcurrentDictionary, this is controlled by the AcquireLocks, AcquireAllLocks and ReleaseLocks methods. Locks are always taken out and released in the order they are in the m_locks array, and locks are all released right at the end of the method in a finally block. At this point, it's worth pointing out that the locks array is never re-assigned, even when the buckets array is increased in size. The number of locks is fixed in the constructor by the concurrencyLevel parameter. This simplifies programming the locks; you don't have to check if the locks array has changed or been re-assigned before taking out a lock object. And you can be sure that when a thread takes out a lock, another thread isn't going to re-assign the lock array. This would create a new series of lock objects, thus allowing another thread to ignore the existing locks (and any threads controlling them), breaking thread-safety. Consequences of growing the array Just because we're using locks doesn't mean that race conditions aren't a problem. We can see this by looking at the GrowTable method. The operation of this method can be boiled down to: private void GrowTable(Node[] buckets) { try { 1: Acquire first lock in the locks array // this causes any other thread trying to take out // all the locks to block because the first lock in the array // is always the one taken out first // check if another thread has already resized the buckets array // while we were waiting to acquire the first lock 2: if (buckets != m_buckets) return; 3: Calculate the new size of the backing array 4: Node[] array = new array[size]; 5: Acquire all the remaining locks 6: Re-hash the contents of the existing buckets into array 7: m_buckets = array; } finally { 8: Release all locks } } As you can see, there's already a check for a race condition at step 2, for the case when the GrowTable method is called twice in quick succession on two separate threads. One will successfully resize the buckets array (blocking the second in the meantime), when the second thread is unblocked it'll see that the array has already been resized & exit without doing anything. There is another case we need to consider; looking back at the AlterBucket method above, consider the following situation: Thread 1 calls AlterBucket; step 1 is executed to get the bucket and lock numbers. Thread 2 calls GrowTable and executes steps 1-5; thread 1 is blocked when it tries to take out the lock in step 2. Thread 2 re-hashes everything, re-assigns the buckets array, and releases all the locks (steps 6-8). Thread 1 is unblocked and continues executing, but the calculated bucket and lock numbers are no longer valid. Between calculating the correct bucket and lock number and taking out the lock, another thread has changed where everything is. Not exactly thread-safe. Well, a similar problem was solved in ConcurrentStack and ConcurrentQueue by storing a local copy of the state, doing the necessary calculations, then checking if that state is still valid. We can use a similar idea here: void AlterBucket(TKey key, ...) { while (true) { Node[] buckets = m_buckets; int bucketNo, lockNo; GetBucketAndLockNo( key.GetHashCode(), out bucketNo, out lockNo, buckets.Length); lock (m_locks[lockNo]) { // if the state has changed, go back to the start if (buckets != m_buckets) continue; Node headNode = m_buckets[bucketNo]; Mutate the node linked list as appropriate } break; } } TryGetValue and GetEnumerator And so, finally, we get onto TryGetValue and GetEnumerator. I've left these to the end because, well, they don't actually use any locks. How can this be? Whenever you change a bucket, you need to take out the corresponding lock, yes? Indeed you do. However, it is important to note that TryGetValue and GetEnumerator don't actually change anything. Just as immutable objects are, by definition, thread-safe, read-only operations don't need to take out a lock because they don't change anything. All lockless methods can happily iterate through the buckets and linked lists without worrying about locking anything. However, this does put restrictions on how the other methods operate. Because there could be another thread in the middle of reading the dictionary at any time (even if a lock is taken out), the dictionary has to be in a valid state at all times. Every change to state has to be made visible to other threads in a single atomic operation (all relevant variables are marked volatile to help with this). This restriction ensures that whatever the reading threads are doing, they never read the dictionary in an invalid state (eg items that should be in the collection temporarily removed from the linked list, or reading a node that has had it's key & value removed before the node itself has been removed from the linked list). Fortunately, all the operations needed to change the dictionary can be done in that way. Bucket resizes are made visible when the new array is assigned back to the m_buckets variable. Any additions or modifications to a node are done by creating a new node, then splicing it into the existing list using a single variable assignment. Node removals are simply done by re-assigning the node's m_next pointer. Because the dictionary can be changed by another thread during execution of the lockless methods, the GetEnumerator method is liable to return dirty reads - changes made to the dictionary after GetEnumerator was called, but before the enumeration got to that point in the dictionary. It's worth listing at this point which methods are lockless, and which take out all the locks in the dictionary to ensure they get a consistent view of the dictionary: Lockless: TryGetValue GetEnumerator The indexer getter ContainsKey Takes out every lock (lockfull?): Count IsEmpty Keys Values CopyTo ToArray Concurrent principles That covers the overall implementation of ConcurrentDictionary. I haven't even begun to scratch the surface of this sophisticated collection. That I leave to you. However, we've looked at enough to be able to extract some useful principles for concurrent programming: Partitioning When using locks, the work is partitioned into independant chunks, each with its own lock. Each partition can then be modified concurrently to other partitions. Ordered lock-taking When a method does need to control the entire collection, locks are taken and released in a fixed order to prevent deadlocks. Lockless reads Read operations that don't care about dirty reads don't take out any lock; the rest of the collection is implemented so that any reading thread always has a consistent view of the collection. That leads us to the final collection in this little series - ConcurrentBag. Lacking a non-concurrent analogy, it is quite different to any other collection in the class libraries. Prepare your thinking hats!

Read the article
How to reconcile my support of open-source software and need to feed and house myself?

- by Guzba

I have a bit of a dilemma and wanted to get some other developers' opinions on it and maybe some guidance. So I have created a 2D game for Android from the ground up, learning and re factoring as I went along. I did it for the experience, and have been proud of the results. I released it for free as ad supported with AdMob not really expecting much out of it, but curious to see what would happen. Its been a few of months now since release, and it has become very popular (250k downloads!). Additionally, the ad revenue is great and is driving me to make more good games and even allowing me to work less so that I can focus on my own works. When I originally began working on the game, I was pretty new to concurrency and completely new to Android (had Java experience though). The standard advice I got for starting an Android game was to look at the sample games from Google (Snake, Lunar Lander, ...) so I did. In my opinion, these Android sample games from Google are decent to see in general what your code should look like, but not actually all that great to follow. This is because some of their features don't work (saving game state), the concurrency is both unexplained and cumbersome (there is no real separation between the game thread and the UI thread since they sync lock each other out all the time and the UI thread runs game thread code). This made it difficult for me as a newbie to concurrency to understand how it was organized and what was really running what code. Here is my dilemma: After spending this past few months slowly improving my code, I feel that it could be very beneficial to developers who are in the same position that I was in when I started. (Since it is not a complex game, but clearly coded in my opinion.) I want to open up the source so that others can learn from it but don't want to lose my ad revenue stream, which, if I did open the source, I fear I would when people released versions with the ad stripped, or minor tweaks that would fragment my audience, etc. I am a CS undergrad major in college and this money is giving me the freedom to work less at summer job, thus giving me the time and will to work on more of my own projects and improving my own skills while still paying the bills. So what do I do? Open the source at personal sacrifice for the greater good, or keep it closed and be a sort of hypocritical supporter of open source?

Read the article
Does this prove a network bandwidth bottleneck?

- by Yuji Tomita

I've incorrectly assumed that my internal AB testing means my server can handle 1k concurrency @3k hits per second. My theory at at the moment is that the network is the bottleneck. The server can't send enough data fast enough. External testing from blitz.io at 1k concurrency shows my hits/s capping off at 180, with pages taking longer and longer to respond as the server is only able to return 180 per second. I've served a blank file from nginx and benched it: it scales 1:1 with concurrency. Now to rule out IO / memcached bottlenecks (nginx normally pulls from memcached), I serve up a static version of the cached page from the filesystem. The results are very similar to my original test; I'm capped at around 180 RPS. Splitting the HTML page in half gives me double the RPS, so it's definitely limited by the size of the page. If I internally ApacheBench from the local server, I get consistent results of around 4k RPS on both the Full Page and the Half Page, at high transfer rates. Transfer rate: 62586.14 [Kbytes/sec] received If I AB from an external server, I get around 180RPS - same as the blitz.io results. How do I know it's not intentional throttling? If I benchmark from multiple external servers, all results become poor which leads me to believe the problem is in MY servers outbound traffic, not a download speed issue with my benchmarking servers / blitz.io. So I'm back to my conclusion that my server can't send data fast enough. Am I right? Are there other ways to interpret this data? Is the solution/optimization to set up multiple servers + load balancing that can each serve 180 hits per second? I'm quite new to server optimization, so I'd appreciate any confirmation interpreting this data. Outbound traffic Here's more information about the outbound bandwidth: The network graph shows a maximum output of 16 Mb/s: 16 megabits per second. Doesn't sound like much at all. Due to a suggestion about throttling, I looked into this and found that linode has a 50mbps cap (which I'm not even close to hitting, apparently). I had it raised to 100mbps. Since linode caps my traffic, and I'm not even hitting it, does this mean that my server should indeed be capable of outputting up to 100mbps but is limited by some other internal bottleneck? I just don't understand how networks at this large of a scale work; can they literally send data as fast as they can read from the HDD? Is the network pipe that big? In conclusion 1: Based on the above, I'm thinking I can definitely raise my 180RPS by adding an nginx load balancer on top of a multi nginx server setup at exactly 180RPS per server behind the LB. 2: If linode has a 50/100mbit limit that I'm not hitting at all, there must be something I can do to hit that limit with my single server setup. If I can read / transmit data fast enough locally, and linode even bothers to have a 50mbit/100mbit cap, there must be an internal bottleneck that's not allowing me to hit those caps that I'm not sure how to detect. Correct? I realize the question is huge and vague now, but I'm not sure how to condense it. Any input is appreciated on any conclusion I've made.

Read the article
C# 4: The Curious ConcurrentDictionary

- by James Michael Hare

In my previous post (here) I did a comparison of the new ConcurrentQueue versus the old standard of a System.Collections.Generic Queue with simple locking. The results were exactly what I would have hoped, that the ConcurrentQueue was faster with multi-threading for most all situations. In addition, concurrent collections have the added benefit that you can enumerate them even if they're being modified. So I set out to see what the improvements would be for the ConcurrentDictionary, would it have the same performance benefits as the ConcurrentQueue did? Well, after running some tests and multiple tweaks and tunes, I have good and bad news. But first, let's look at the tests. Obviously there's many things we can do with a dictionary. One of the most notable uses, of course, in a multi-threaded environment is for a small, local in-memory cache. So I set about to do a very simple simulation of a cache where I would create a test class that I'll just call an Accessor. This accessor will attempt to look up a key in the dictionary, and if the key exists, it stops (i.e. a cache "hit"). However, if the lookup fails, it will then try to add the key and value to the dictionary (i.e. a cache "miss"). So here's the Accessor that will run the tests: 1: internal class Accessor 2: { 3: public int Hits { get; set; } 4: public int Misses { get; set; } 5: public Func<int, string> GetDelegate { get; set; } 6: public Action<int, string> AddDelegate { get; set; } 7: public int Iterations { get; set; } 8: public int MaxRange { get; set; } 9: public int Seed { get; set; } 10: 11: public void Access() 12: { 13: var randomGenerator = new Random(Seed); 14: 15: for (int i=0; i<Iterations; i++) 16: { 17: // give a wide spread so will have some duplicates and some unique 18: var target = randomGenerator.Next(1, MaxRange); 19: 20: // attempt to grab the item from the cache 21: var result = GetDelegate(target); 22: 23: // if the item doesn't exist, add it 24: if(result == null) 25: { 26: AddDelegate(target, target.ToString()); 27: Misses++; 28: } 29: else 30: { 31: Hits++; 32: } 33: } 34: } 35: } Note that so I could test different implementations, I defined a GetDelegate and AddDelegate that will call the appropriate dictionary methods to add or retrieve items in the cache using various techniques. So let's examine the three techniques I decided to test: Dictionary with mutex - Just your standard generic Dictionary with a simple lock construct on an internal object. Dictionary with ReaderWriterLockSlim - Same Dictionary, but now using a lock designed to let multiple readers access simultaneously and then locked when a writer needs access. ConcurrentDictionary - The new ConcurrentDictionary from System.Collections.Concurrent that is supposed to be optimized to allow multiple threads to access safely. So the approach to each of these is also fairly straight-forward. Let's look at the GetDelegate and AddDelegate implementations for the Dictionary with mutex lock: 1: var addDelegate = (key,val) => 2: { 3: lock (_mutex) 4: { 5: _dictionary[key] = val; 6: } 7: }; 8: var getDelegate = (key) => 9: { 10: lock (_mutex) 11: { 12: string val; 13: return _dictionary.TryGetValue(key, out val) ? val : null; 14: } 15: }; Nothing new or fancy here, just your basic lock on a private object and then query/insert into the Dictionary. Now, for the Dictionary with ReadWriteLockSlim it's a little more complex: 1: var addDelegate = (key,val) => 2: { 3: _readerWriterLock.EnterWriteLock(); 4: _dictionary[key] = val; 5: _readerWriterLock.ExitWriteLock(); 6: }; 7: var getDelegate = (key) => 8: { 9: string val; 10: _readerWriterLock.EnterReadLock(); 11: if(!_dictionary.TryGetValue(key, out val)) 12: { 13: val = null; 14: } 15: _readerWriterLock.ExitReadLock(); 16: return val; 17: }; And finally, the ConcurrentDictionary, which since it does all it's own concurrency control, is remarkably elegant and simple: 1: var addDelegate = (key,val) => 2: { 3: _concurrentDictionary[key] = val; 4: }; 5: var getDelegate = (key) => 6: { 7: string s; 8: return _concurrentDictionary.TryGetValue(key, out s) ? s : null; 9: }; Then, I set up a test harness that would simply ask the user for the number of concurrent Accessors to attempt to Access the cache (as specified in Accessor.Access() above) and then let them fly and see how long it took them all to complete. Each of these tests was run with 10,000,000 cache accesses divided among the available Accessor instances. All times are in milliseconds. 1: Dictionary with Mutex Locking 2: --------------------------------------------------- 3: Accessors Mostly Misses Mostly Hits 4: 1 7916 3285 5: 10 8293 3481 6: 100 8799 3532 7: 1000 8815 3584 8: 9: 10: Dictionary with ReaderWriterLockSlim Locking 11: --------------------------------------------------- 12: Accessors Mostly Misses Mostly Hits 13: 1 8445 3624 14: 10 11002 4119 15: 100 11076 3992 16: 1000 14794 4861 17: 18: 19: Concurrent Dictionary 20: --------------------------------------------------- 21: Accessors Mostly Misses Mostly Hits 22: 1 17443 3726 23: 10 14181 1897 24: 100 15141 1994 25: 1000 17209 2128 The first test I did across the board is the Mostly Misses category. The mostly misses (more adds because data requested was not in the dictionary) shows an interesting trend. In both cases the Dictionary with the simple mutex lock is much faster, and the ConcurrentDictionary is the slowest solution. But this got me thinking, and a little research seemed to confirm it, maybe the ConcurrentDictionary is more optimized to concurrent "gets" than "adds". So since the ratio of misses to hits were 2 to 1, I decided to reverse that and see the results. So I tweaked the data so that the number of keys were much smaller than the number of iterations to give me about a 2 to 1 ration of hits to misses (twice as likely to already find the item in the cache than to need to add it). And yes, indeed here we see that the ConcurrentDictionary is indeed faster than the standard Dictionary here. I have a strong feeling that as the ration of hits-to-misses gets higher and higher these number gets even better as well. This makes sense since the ConcurrentDictionary is read-optimized. Also note that I tried the tests with capacity and concurrency hints on the ConcurrentDictionary but saw very little improvement, I think this is largely because on the 10,000,000 hit test it quickly ramped up to the correct capacity and concurrency and thus the impact was limited to the first few milliseconds of the run. So what does this tell us? Well, as in all things, ConcurrentDictionary is not a panacea. It won't solve all your woes and it shouldn't be the only Dictionary you ever use. So when should we use each? Use System.Collections.Generic.Dictionary when: You need a single-threaded Dictionary (no locking needed). You need a multi-threaded Dictionary that is loaded only once at creation and never modified (no locking needed). You need a multi-threaded Dictionary to store items where writes are far more prevalent than reads (locking needed). And use System.Collections.Concurrent.ConcurrentDictionary when: You need a multi-threaded Dictionary where the writes are far more prevalent than reads. You need to be able to iterate over the collection without locking it even if its being modified. Both Dictionaries have their strong suits, I have a feeling this is just one where you need to know from design what you hope to use it for and make your decision based on that criteria.

Read the article
Good practices - database programming, unit testing

- by Piotr Rodak

Jason Brimhal wrote today on his blog that new book, Defensive Database Programming , written by Alex Kuznetsov ( blog ) is coming to bookstores. Alex writes about various techniques that make your code safer to run. SQL injection is not the only one vulnerability the code may be exposed to. Some other include inconsistent search patterns, unsupported character sets, locale settings, issues that may occur during high concurrency conditions, logic that breaks when certain conditions are not met. The...(read more)

Read the article
Structuring multi-threaded programs

- by davidk01

Are there any canonical sources for learning how to structure multi-threaded programs? Even with all the concurrency utility classes that Java provides I'm having a hard time properly structuring multi-threaded programs. Whenever threads are involved my code becomes very brittle, any little change can potentially break the program because the code that jumps back and forth between the threads tends to be very convoluted.

Read the article
Intel Experts Answer Your Parallel Programming and Multithreaded App Questions

If you have questions about application threading, memory management, synchronization, concurrency, or anything else related to parallel programming, who better to ask than the experts at Intel?

Read the article
Intel Experts Answer Your Parallel Programming and Multithreaded App Questions

If you have questions about application threading, memory management, synchronization, concurrency, or anything else related to parallel programming, who better to ask than the experts at Intel?

Read the article
How would you rank these programming skills in order of learning them? [closed]

- by mumtaz

As a general purpose programmer, what should you learn first and what should you learn later on? Here are some skills I wonder about... SQL Regular Expressions Multi-threading / Concurrency Functional Programming Graphics The mastery of your mother programming language's syntax/semantics/featureset The mastery of your base class framework libraries Version Control System Unit Testing XML Do you know other important ones? Please specify them... On which skills should I focus first?

Read the article
What is the diffference between "data hiding" and "encapsulation"?

- by john smith optional

I'm reading "Java concurrency in practice" and there is said: "Fortunately, the same object-oriented techniques that help you write well-organized, maintainable classes - such as encapsulation and data hiding -can also help you crate thread-safe classes." The problem #1 - I never heard about data hiding and don't know what it is. The problem #2 - I always thought that encapsulation is using private vs public, and is actually the data hiding. Can you please explain what data hiding is and how it differs from encapsulation?

Read the article
Do you need to know Java before trying Scala

- by gizgok

I'm interested in learning Scala. I've been reading a lot about it, but a lot of people value it because it has an actor model which is better for concurrency, it handles xml in a much better way, solves the problem of first class functions. My question is do you need to know Java to understand/appreciate the way things work in Scala? Is it better to first take a stab at Java and then try Scala or you can start Scala with absolutely no java backround?

Read the article
Is there a canonical resource on multi-tenancy web applications using ruby + rails

- by AlexC

Is there a canonical resource on multi-tenancy web applications using ruby + rails. There are a number of ways to develop rails apps using cloud capabilities with real elastic properties but there seems to be a lack of clarity with how to achieve multitenancy, specifically at the model / data level. Is there a canonical resource on options to developing multitenancy rails applications with the required characteristics of data seperation, security, concurrency and contention required by an enterprise level cloud application.

Read the article
parallel_for_each from amp.h – part 1

- by Daniel Moth

This posts assumes that you've read my other C++ AMP posts on index<N> and extent<N>, as well as about the restrict modifier. It also assumes you are familiar with C++ lambdas (if not, follow my links to C++ documentation). Basic structure and parameters Now we are ready for part 1 of the description of the new overload for the concurrency::parallel_for_each function. The basic new parallel_for_each method signature returns void and accepts two parameters: a grid<N> (think of it as an alias to extent) a restrict(direct3d) lambda, whose signature is such that it returns void and accepts an index of the same rank as the grid So it looks something like this (with generous returns for more palatable formatting) assuming we are dealing with a 2-dimensional space: // some_code_A parallel_for_each( g, // g is of type grid<2> [ ](index<2> idx) restrict(direct3d) { // kernel code } ); // some_code_B The parallel_for_each will execute the body of the lambda (which must have the restrict modifier), on the GPU. We also call the lambda body the "kernel". The kernel will be executed multiple times, once per scheduled GPU thread. The only difference in each execution is the value of the index object (aka as the GPU thread ID in this context) that gets passed to your kernel code. The number of GPU threads (and the values of each index) is determined by the grid object you pass, as described next. You know that grid is simply a wrapper on extent. In this context, one way to think about it is that the extent generates a number of index objects. So for the example above, if your grid was setup by some_code_A as follows: extent<2> e(2,3); grid<2> g(e); ...then given that: e.size()==6, e[0]==2, and e[1]=3 ...the six index<2> objects it generates (and hence the values that your lambda would receive) are: (0,0) (1,0) (0,1) (1,1) (0,2) (1,2) So what the above means is that the lambda body with the algorithm that you wrote will get executed 6 times and the index<2> object you receive each time will have one of the values just listed above (of course, each one will only appear once, the order is indeterminate, and they are likely to call your code at the same exact time). Obviously, in real GPU programming, you'd typically be scheduling thousands if not millions of threads, not just 6. If you've been following along you should be thinking: "that is all fine and makes sense, but what can I do in the kernel since I passed nothing else meaningful to it, and it is not returning any values out to me?" Passing data in and out It is a good question, and in data parallel algorithms indeed you typically want to pass some data in, perform some operation, and then typically return some results out. The way you pass data into the kernel, is by capturing variables in the lambda (again, if you are not familiar with them, follow the links about C++ lambdas), and the way you use data after the kernel is done executing is simply by using those same variables. In the example above, the lambda was written in a fairly useless way with an empty capture list: [ ](index<2> idx) restrict(direct3d), where the empty square brackets means that no variables were captured. If instead I write it like this [&](index<2> idx) restrict(direct3d), then all variables in the some_code_A region are made available to the lambda by reference, but as soon as I try to use any of those variables in the lambda, I will receive a compiler error. This has to do with one of the direct3d restrictions, where only one type can be capture by reference: objects of the new concurrency::array class that I'll introduce in the next post (suffice for now to think of it as a container of data). If I write the lambda line like this [=](index<2> idx) restrict(direct3d), all variables in the some_code_A region are made available to the lambda by value. This works for some types (e.g. an integer), but not for all, as per the restrictions for direct3d. In particular, no useful data classes work except for one new type we introduce with C++ AMP: objects of the new concurrency::array_view class, that I'll introduce in the post after next. Also note that if you capture some variable by value, you could use it as input to your algorithm, but you wouldn’t be able to observe changes to it after the parallel_for_each call (e.g. in some_code_B region since it was passed by value) – the exception to this rule is the array_view since (as we'll see in a future post) it is a wrapper for data, not a container. Finally, for completeness, you can write your lambda, e.g. like this [av, &ar](index<2> idx) restrict(direct3d) where av is a variable of type array_view and ar is a variable of type array - the point being you can be very specific about what variables you capture and how. So it looks like from a large data perspective you can only capture array and array_view objects in the lambda (that is how you pass data to your kernel) and then use the many threads that call your code (each with a unique index) to perform some operation. You can also capture some limited types by value, as input only. When the last thread completes execution of your lambda, the data in the array_view or array are ready to be used in the some_code_B region. We'll talk more about all this in future posts… (a)synchronous Please note that the parallel_for_each executes as if synchronous to the calling code, but in reality, it is asynchronous. I.e. once the parallel_for_each call is made and the kernel has been passed to the runtime, the some_code_B region continues to execute immediately by the CPU thread, while in parallel the kernel is executed by the GPU threads. However, if you try to access the (array or array_view) data that you captured in the lambda in the some_code_B region, your code will block until the results become available. Hence the correct statement: the parallel_for_each is as-if synchronous in terms of visible side-effects, but asynchronous in reality. That's all for now, we'll revisit the parallel_for_each description, once we introduce properly array and array_view – coming next. Comments about this post by Daniel Moth welcome at the original blog.

Read the article
What is the canonical resource on multi-tenancy web applications using ruby + rails

- by AlexC

What is the canonical resource on multi-tenancy web applications using ruby + rails. There are a number of ways to develop rails apps using cloud capabilities with real elastic properties but there seems to be a lack of clarity with how to achieve multitenancy, specifically at the model / data level. Is there a canonical resource on options to developing multitenancy rails applications with the required characteristics of data seperation, security, concurrency and contention required by an enterprise level cloud application.

Read the article
Are my actual worker threads exceeding the sp_configure 'max worker threads' value?

Tom Stringer (@SQLife) was working on some HADR testing for a customer to simulate many availability groups and introduce significant load into the system to measure overhead and such. In his quest to do that he was seeing behavior that he couldn’t really explain and so worked with him to uncover what was happening under the covers. Understand Locking, Blocking & Row VersioningRead Kalen Delaney's eBook to understand SQL Server concurrency, and use SQL Monitor to pinpoint excessive blocking and deadlocking. Download free resources.

Read the article
What is the difference between "data hiding" and "encapsulation"?

- by Software Engeneering Learner

I'm reading "Java concurrency in practice" and there is said: "Fortunately, the same object-oriented techniques that help you write well-organized, maintainable classes - such as encapsulation and data hiding -can also help you create thread-safe classes." The problem #1 - I never heard about data hiding and don't know what it is. The problem #2 - I always thought that encapsulation is using private vs public, and is actually the data hiding. Can you please explain what data hiding is and how it differs from encapsulation?

Read the article
Parallel Computing Features Tour in VS2010

Just realized that I have not linked from here to a screencast I recorded a couple weeks ago that shows the API, parallel debugger and concurrency visualizer in VS2010. Take a few minutes to watch the VS2010 Parallel Computing Features Tour. Comments about this post welcome at the original blog.

Read the article
The Java 7 Features Bound to Make Developers More Productive

The developer productivity features in the upcoming Java SE 7 release (object comparison, file system monitoring, and concurrency) are enough to make a Java veteran envy the Java newbies.

Read the article
The Java 7 Features Bound to Make Developers More Productive

The developer productivity features in the upcoming Java SE 7 release (object comparison, file system monitoring, and concurrency) are enough to make a Java veteran envy the Java newbies.

Read the article
NUMA-aware constructs for java.util.concurrent

- by Dave

The constructs in the java.util.concurrent JSR-166 "JUC" concurrency library are currently NUMA-oblivious. That's because we currently don't have the topology discovery infrastructure and underpinnings in place that would allow and enable NUMA-awareness. But some quick throw-away prototypes show that it's possible to write NUMA-aware library code. I happened to use JUC Exchanger as a research vehicle. Another interesting idea is to adapt fork-join work-stealing to favor stealing from queues associated with 'nearby' threads.

Read the article

Search Results

Search found 882 results on 36 pages for 'concurrency coordination'.

Page 19/36 | < Previous Page | 15 16 17 18 19 20 21 22 23 24 25 26 | Next Page >

- by Zenofo

- by user18983

- by jean-pierre.dijcks

- by jpurdy

- by koncurrency

- by Simon Cooper

- by Guzba

- by Yuji Tomita

- by James Michael Hare

- by Piotr Rodak

- by davidk01

- by mumtaz

- by john smith optional

- by gizgok

- by AlexC

- by Daniel Moth

- by AlexC

- by Software Engeneering Learner

- by Dave

< Previous Page | 15 16 17 18 19 20 21 22 23 24 25 26 | Next Page >