Search Results

Search found 2207 results on 89 pages for 'nick locking'.

Page 88/89 | < Previous Page | 84 85 86 87 88 89 | Next Page >

C#/.NET Little Wonders: ConcurrentBag and BlockingCollection

- by James Michael Hare

In the first week of concurrent collections, began with a general introduction and discussed the ConcurrentStack<T> and ConcurrentQueue<T>. The last post discussed the ConcurrentDictionary<T> . Finally this week, we shall close with a discussion of the ConcurrentBag<T> and BlockingCollection<T>. For more of the "Little Wonders" posts, see C#/.NET Little Wonders: A Redux. Recap As you'll recall from the previous posts, the original collections were object-based containers that accomplished synchronization through a Synchronized member. With the advent of .NET 2.0, the original collections were succeeded by the generic collections which are fully type-safe, but eschew automatic synchronization. With .NET 4.0, a new breed of collections was born in the System.Collections.Concurrent namespace. Of these, the final concurrent collection we will examine is the ConcurrentBag and a very useful wrapper class called the BlockingCollection. For some excellent information on the performance of the concurrent collections and how they perform compared to a traditional brute-force locking strategy, see this informative whitepaper by the Microsoft Parallel Computing Platform team here. ConcurrentBag<T> – Thread-safe unordered collection. Unlike the other concurrent collections, the ConcurrentBag<T> has no non-concurrent counterpart in the .NET collections libraries. Items can be added and removed from a bag just like any other collection, but unlike the other collections, the items are not maintained in any order. This makes the bag handy for those cases when all you care about is that the data be consumed eventually, without regard for order of consumption or even fairness – that is, it’s possible new items could be consumed before older items given the right circumstances for a period of time. So why would you ever want a container that can be unfair? Well, to look at it another way, you can use a ConcurrentQueue and get the fairness, but it comes at a cost in that the ordering rules and synchronization required to maintain that ordering can affect scalability a bit. Thus sometimes the bag is great when you want the fastest way to get the next item to process, and don’t care what item it is or how long its been waiting. The way that the ConcurrentBag works is to take advantage of the new ThreadLocal<T> type (new in System.Threading for .NET 4.0) so that each thread using the bag has a list local to just that thread. This means that adding or removing to a thread-local list requires very low synchronization. The problem comes in where a thread goes to consume an item but it’s local list is empty. In this case the bag performs “work-stealing” where it will rob an item from another thread that has items in its list. This requires a higher level of synchronization which adds a bit of overhead to the take operation. So, as you can imagine, this makes the ConcurrentBag good for situations where each thread both produces and consumes items from the bag, but it would be less-than-idea in situations where some threads are dedicated producers and the other threads are dedicated consumers because the work-stealing synchronization would outweigh the thread-local optimization for a thread taking its own items. Like the other concurrent collections, there are some curiosities to keep in mind: IsEmpty(), Count, ToArray(), and GetEnumerator() lock collection Each of these needs to take a snapshot of whole bag to determine if empty, thus they tend to be more expensive and cause Add() and Take() operations to block. ToArray() and GetEnumerator() are static snapshots Because it is based on a snapshot, will not show subsequent updates after snapshot. Add() is lightweight Since adding to the thread-local list, there is very little overhead on Add. TryTake() is lightweight if items in thread-local list As long as items are in the thread-local list, TryTake() is very lightweight, much more so than ConcurrentStack() and ConcurrentQueue(), however if the local thread list is empty, it must steal work from another thread, which is more expensive. Remember, a bag is not ideal for all situations, it is mainly ideal for situations where a process consumes an item and either decomposes it into more items to be processed, or handles the item partially and places it back to be processed again until some point when it will complete. The main point is that the bag works best when each thread both takes and adds items. For example, we could create a totally contrived example where perhaps we want to see the largest power of a number before it crosses a certain threshold. Yes, obviously we could easily do this with a log function, but bare with me while I use this contrived example for simplicity. So let’s say we have a work function that will take a Tuple out of a bag, this Tuple will contain two ints. The first int is the original number, and the second int is the last multiple of that number. So we could load our bag with the initial values (let’s say we want to know the last multiple of each of 2, 3, 5, and 7 under 100. 1: var bag = new ConcurrentBag<Tuple<int, int>> 2: { 3: Tuple.Create(2, 1), 4: Tuple.Create(3, 1), 5: Tuple.Create(5, 1), 6: Tuple.Create(7, 1) 7: }; Then we can create a method that given the bag, will take out an item, apply the multiplier again, 1: public static void FindHighestPowerUnder(ConcurrentBag<Tuple<int,int>> bag, int threshold) 2: { 3: Tuple<int,int> pair; 4: 5: // while there are items to take, this will prefer local first, then steal if no local 6: while (bag.TryTake(out pair)) 7: { 8: // look at next power 9: var result = Math.Pow(pair.Item1, pair.Item2 + 1); 10: 11: if (result < threshold) 12: { 13: // if smaller than threshold bump power by 1 14: bag.Add(Tuple.Create(pair.Item1, pair.Item2 + 1)); 15: } 16: else 17: { 18: // otherwise, we're done 19: Console.WriteLine("Highest power of {0} under {3} is {0}^{1} = {2}.", 20: pair.Item1, pair.Item2, Math.Pow(pair.Item1, pair.Item2), threshold); 21: } 22: } 23: } Now that we have this, we can load up this method as an Action into our Tasks and run it: 1: // create array of tasks, start all, wait for all 2: var tasks = new[] 3: { 4: new Task(() => FindHighestPowerUnder(bag, 100)), 5: new Task(() => FindHighestPowerUnder(bag, 100)), 6: }; 7: 8: Array.ForEach(tasks, t => t.Start()); 9: 10: Task.WaitAll(tasks); Totally contrived, I know, but keep in mind the main point! When you have a thread or task that operates on an item, and then puts it back for further consumption – or decomposes an item into further sub-items to be processed – you should consider a ConcurrentBag as the thread-local lists will allow for quick processing. However, if you need ordering or if your processes are dedicated producers or consumers, this collection is not ideal. As with anything, you should performance test as your mileage will vary depending on your situation! BlockingCollection<T> – A producers & consumers pattern collection The BlockingCollection<T> can be treated like a collection in its own right, but in reality it adds a producers and consumers paradigm to any collection that implements the interface IProducerConsumerCollection<T>. If you don’t specify one at the time of construction, it will use a ConcurrentQueue<T> as its underlying store. If you don’t want to use the ConcurrentQueue, the ConcurrentStack and ConcurrentBag also implement the interface (though ConcurrentDictionary does not). In addition, you are of course free to create your own implementation of the interface. So, for those who don’t remember the producers and consumers classical computer-science problem, the gist of it is that you have one (or more) processes that are creating items (producers) and one (or more) processes that are consuming these items (consumers). Now, the crux of the problem is that there is a bin (queue) where the produced items are placed, and typically that bin has a limited size. Thus if a producer creates an item, but there is no space to store it, it must wait until an item is consumed. Also if a consumer goes to consume an item and none exists, it must wait until an item is produced. The BlockingCollection makes it trivial to implement any standard producers/consumers process set by providing that “bin” where the items can be produced into and consumed from with the appropriate blocking operations. In addition, you can specify whether the bin should have a limited size or can be (theoretically) unbounded, and you can specify timeouts on the blocking operations. As far as your choice of “bin”, for the most part the ConcurrentQueue is the right choice because it is fairly light and maximizes fairness by ordering items so that they are consumed in the same order they are produced. You can use the concurrent bag or stack, of course, but your ordering would be random-ish in the case of the former and LIFO in the case of the latter. So let’s look at some of the methods of note in BlockingCollection: BoundedCapacity returns capacity of the “bin” If the bin is unbounded, the capacity is int.MaxValue. Count returns an internally-kept count of items This makes it O(1), but if you modify underlying collection directly (not recommended) it is unreliable. CompleteAdding() is used to cut off further adds. This sets IsAddingCompleted and begins to wind down consumers once empty. IsAddingCompleted is true when producers are “done”. Once you are done producing, should complete the add process to alert consumers. IsCompleted is true when producers are “done” and “bin” is empty. Once you mark the producers done, and all items removed, this will be true. Add() is a blocking add to collection. If bin is full, will wait till space frees up Take() is a blocking remove from collection. If bin is empty, will wait until item is produced or adding is completed. GetConsumingEnumerable() is used to iterate and consume items. Unlike the standard enumerator, this one consumes the items instead of iteration. TryAdd() attempts add but does not block completely If adding would block, returns false instead, can specify TimeSpan to wait before stopping. TryTake() attempts to take but does not block completely Like TryAdd(), if taking would block, returns false instead, can specify TimeSpan to wait. Note the use of CompleteAdding() to signal the BlockingCollection that nothing else should be added. This means that any attempts to TryAdd() or Add() after marked completed will throw an InvalidOperationException. In addition, once adding is complete you can still continue to TryTake() and Take() until the bin is empty, and then Take() will throw the InvalidOperationException and TryTake() will return false. So let’s create a simple program to try this out. Let’s say that you have one process that will be producing items, but a slower consumer process that handles them. This gives us a chance to peek inside what happens when the bin is bounded (by default, the bin is NOT bounded). 1: var bin = new BlockingCollection<int>(5); Now, we create a method to produce items: 1: public static void ProduceItems(BlockingCollection<int> bin, int numToProduce) 2: { 3: for (int i = 0; i < numToProduce; i++) 4: { 5: // try for 10 ms to add an item 6: while (!bin.TryAdd(i, TimeSpan.FromMilliseconds(10))) 7: { 8: Console.WriteLine("Bin is full, retrying..."); 9: } 10: } 11: 12: // once done producing, call CompleteAdding() 13: Console.WriteLine("Adding is completed."); 14: bin.CompleteAdding(); 15: } And one to consume them: 1: public static void ConsumeItems(BlockingCollection<int> bin) 2: { 3: // This will only be true if CompleteAdding() was called AND the bin is empty. 4: while (!bin.IsCompleted) 5: { 6: int item; 7: 8: if (!bin.TryTake(out item, TimeSpan.FromMilliseconds(10))) 9: { 10: Console.WriteLine("Bin is empty, retrying..."); 11: } 12: else 13: { 14: Console.WriteLine("Consuming item {0}.", item); 15: Thread.Sleep(TimeSpan.FromMilliseconds(20)); 16: } 17: } 18: } Then we can fire them off: 1: // create one producer and two consumers 2: var tasks = new[] 3: { 4: new Task(() => ProduceItems(bin, 20)), 5: new Task(() => ConsumeItems(bin)), 6: new Task(() => ConsumeItems(bin)), 7: }; 8: 9: Array.ForEach(tasks, t => t.Start()); 10: 11: Task.WaitAll(tasks); Notice that the producer is faster than the consumer, thus it should be hitting a full bin often and displaying the message after it times out on TryAdd(). 1: Consuming item 0. 2: Consuming item 1. 3: Bin is full, retrying... 4: Bin is full, retrying... 5: Consuming item 3. 6: Consuming item 2. 7: Bin is full, retrying... 8: Consuming item 4. 9: Consuming item 5. 10: Bin is full, retrying... 11: Consuming item 6. 12: Consuming item 7. 13: Bin is full, retrying... 14: Consuming item 8. 15: Consuming item 9. 16: Bin is full, retrying... 17: Consuming item 10. 18: Consuming item 11. 19: Bin is full, retrying... 20: Consuming item 12. 21: Consuming item 13. 22: Bin is full, retrying... 23: Bin is full, retrying... 24: Consuming item 14. 25: Adding is completed. 26: Consuming item 15. 27: Consuming item 16. 28: Consuming item 17. 29: Consuming item 19. 30: Consuming item 18. Also notice that once CompleteAdding() is called and the bin is empty, the IsCompleted property returns true, and the consumers will exit. Summary The ConcurrentBag is an interesting collection that can be used to optimize concurrency scenarios where tasks or threads both produce and consume items. In this way, it will choose to consume its own work if available, and then steal if not. However, in situations where you want fair consumption or ordering, or in situations where the producers and consumers are distinct processes, the bag is not optimal. The BlockingCollection is a great wrapper around all of the concurrent queue, stack, and bag that allows you to add producer and consumer semantics easily including waiting when the bin is full or empty. That’s the end of my dive into the concurrent collections. I’d also strongly recommend, once again, you read this excellent Microsoft white paper that goes into much greater detail on the efficiencies you can gain using these collections judiciously (here). Tweet Technorati Tags: C#,.NET,Concurrent Collections,Little Wonders

Read the article
Plan Caching and Query Memory Part II (Hash Match) – When not to use stored procedure - Most common performance mistake SQL Server developers make.

- by sqlworkshops

SQL Server estimates Memory requirement at compile time, when stored procedure or other plan caching mechanisms like sp_executesql or prepared statement are used, the memory requirement is estimated based on first set of execution parameters. This is a common reason for spill over tempdb and hence poor performance. Common memory allocating queries are that perform Sort and do Hash Match operations like Hash Join or Hash Aggregation or Hash Union. This article covers Hash Match operations with examples. It is recommended to read Plan Caching and Query Memory Part I before this article which covers an introduction and Query memory for Sort. In most cases it is cheaper to pay for the compilation cost of dynamic queries than huge cost for spill over tempdb, unless memory requirement for a query does not change significantly based on predicates. This article covers underestimation / overestimation of memory for Hash Match operation. Plan Caching and Query Memory Part I covers underestimation / overestimation for Sort. It is important to note that underestimation of memory for Sort and Hash Match operations lead to spill over tempdb and hence negatively impact performance. Overestimation of memory affects the memory needs of other concurrently executing queries. In addition, it is important to note, with Hash Match operations, overestimation of memory can actually lead to poor performance. To read additional articles I wrote click here. The best way to learn is to practice. To create the below tables and reproduce the behavior, join the mailing list by using this link: www.sqlworkshops.com/ml and I will send you the table creation script. Most of these concepts are also covered in our webcasts: www.sqlworkshops.com/webcasts Let’s create a Customer’s State table that has 99% of customers in NY and the rest 1% in WA.Customers table used in Part I of this article is also used here.To observe Hash Warning, enable 'Hash Warning' in SQL Profiler under Events 'Errors and Warnings'. --Example provided by www.sqlworkshops.com drop table CustomersState go create table CustomersState (CustomerID int primary key, Address char(200), State char(2)) go insert into CustomersState (CustomerID, Address) select CustomerID, 'Address' from Customers update CustomersState set State = 'NY' where CustomerID % 100 != 1 update CustomersState set State = 'WA' where CustomerID % 100 = 1 go update statistics CustomersState with fullscan go Let’s create a stored procedure that joins customers with CustomersState table with a predicate on State. --Example provided by www.sqlworkshops.com create proc CustomersByState @State char(2) as begin declare @CustomerID int select @CustomerID = e.CustomerID from Customers e inner join CustomersState es on (e.CustomerID = es.CustomerID) where es.State = @State option (maxdop 1) end go Let’s execute the stored procedure first with parameter value ‘WA’ – which will select 1% of data. set statistics time on go --Example provided by www.sqlworkshops.com exec CustomersByState 'WA' goThe stored procedure took 294 ms to complete. The stored procedure was granted 6704 KB based on 8000 rows being estimated. The estimated number of rows, 8000 is similar to actual number of rows 8000 and hence the memory estimation should be ok. There was no Hash Warning in SQL Profiler. To observe Hash Warning, enable 'Hash Warning' in SQL Profiler under Events 'Errors and Warnings'. Now let’s execute the stored procedure with parameter value ‘NY’ – which will select 99% of data. -Example provided by www.sqlworkshops.com exec CustomersByState 'NY' go The stored procedure took 2922 ms to complete. The stored procedure was granted 6704 KB based on 8000 rows being estimated. The estimated number of rows, 8000 is way different from the actual number of rows 792000 because the estimation is based on the first set of parameter value supplied to the stored procedure which is ‘WA’ in our case. This underestimation will lead to spill over tempdb, resulting in poor performance. There was Hash Warning (Recursion) in SQL Profiler. To observe Hash Warning, enable 'Hash Warning' in SQL Profiler under Events 'Errors and Warnings'. Let’s recompile the stored procedure and then let’s first execute the stored procedure with parameter value ‘NY’. In a production instance it is not advisable to use sp_recompile instead one should use DBCC FREEPROCCACHE (plan_handle). This is due to locking issues involved with sp_recompile, refer to our webcasts, www.sqlworkshops.com/webcasts for further details. exec sp_recompile CustomersByState go --Example provided by www.sqlworkshops.com exec CustomersByState 'NY' go Now the stored procedure took only 1046 ms instead of 2922 ms. The stored procedure was granted 146752 KB of memory. The estimated number of rows, 792000 is similar to actual number of rows of 792000. Better performance of this stored procedure execution is due to better estimation of memory and avoiding spill over tempdb. There was no Hash Warning in SQL Profiler. Now let’s execute the stored procedure with parameter value ‘WA’. --Example provided by www.sqlworkshops.com exec CustomersByState 'WA' go The stored procedure took 351 ms to complete, higher than the previous execution time of 294 ms. This stored procedure was granted more memory (146752 KB) than necessary (6704 KB) based on parameter value ‘NY’ for estimation (792000 rows) instead of parameter value ‘WA’ for estimation (8000 rows). This is because the estimation is based on the first set of parameter value supplied to the stored procedure which is ‘NY’ in this case. This overestimation leads to poor performance of this Hash Match operation, it might also affect the performance of other concurrently executing queries requiring memory and hence overestimation is not recommended. The estimated number of rows, 792000 is much more than the actual number of rows of 8000. Intermediate Summary: This issue can be avoided by not caching the plan for memory allocating queries. Other possibility is to use recompile hint or optimize for hint to allocate memory for predefined data range.Let’s recreate the stored procedure with recompile hint. --Example provided by www.sqlworkshops.com drop proc CustomersByState go create proc CustomersByState @State char(2) as begin declare @CustomerID int select @CustomerID = e.CustomerID from Customers e inner join CustomersState es on (e.CustomerID = es.CustomerID) where es.State = @State option (maxdop 1, recompile) end go Let’s execute the stored procedure initially with parameter value ‘WA’ and then with parameter value ‘NY’. --Example provided by www.sqlworkshops.com exec CustomersByState 'WA' go exec CustomersByState 'NY' go The stored procedure took 297 ms and 1102 ms in line with previous optimal execution times. The stored procedure with parameter value ‘WA’ has good estimation like before. Estimated number of rows of 8000 is similar to actual number of rows of 8000. The stored procedure with parameter value ‘NY’ also has good estimation and memory grant like before because the stored procedure was recompiled with current set of parameter values. Estimated number of rows of 792000 is similar to actual number of rows of 792000. The compilation time and compilation CPU of 1 ms is not expensive in this case compared to the performance benefit. There was no Hash Warning in SQL Profiler. Let’s recreate the stored procedure with optimize for hint of ‘NY’. --Example provided by www.sqlworkshops.com drop proc CustomersByState go create proc CustomersByState @State char(2) as begin declare @CustomerID int select @CustomerID = e.CustomerID from Customers e inner join CustomersState es on (e.CustomerID = es.CustomerID) where es.State = @State option (maxdop 1, optimize for (@State = 'NY')) end go Let’s execute the stored procedure initially with parameter value ‘WA’ and then with parameter value ‘NY’. --Example provided by www.sqlworkshops.com exec CustomersByState 'WA' go exec CustomersByState 'NY' go The stored procedure took 353 ms with parameter value ‘WA’, this is much slower than the optimal execution time of 294 ms we observed previously. This is because of overestimation of memory. The stored procedure with parameter value ‘NY’ has optimal execution time like before. The stored procedure with parameter value ‘WA’ has overestimation of rows because of optimize for hint value of ‘NY’. Unlike before, more memory was estimated to this stored procedure based on optimize for hint value ‘NY’. The stored procedure with parameter value ‘NY’ has good estimation because of optimize for hint value of ‘NY’. Estimated number of rows of 792000 is similar to actual number of rows of 792000. Optimal amount memory was estimated to this stored procedure based on optimize for hint value ‘NY’. There was no Hash Warning in SQL Profiler. This article covers underestimation / overestimation of memory for Hash Match operation. Plan Caching and Query Memory Part I covers underestimation / overestimation for Sort. It is important to note that underestimation of memory for Sort and Hash Match operations lead to spill over tempdb and hence negatively impact performance. Overestimation of memory affects the memory needs of other concurrently executing queries. In addition, it is important to note, with Hash Match operations, overestimation of memory can actually lead to poor performance. Summary: Cached plan might lead to underestimation or overestimation of memory because the memory is estimated based on first set of execution parameters. It is recommended not to cache the plan if the amount of memory required to execute the stored procedure has a wide range of possibilities. One can mitigate this by using recompile hint, but that will lead to compilation overhead. However, in most cases it might be ok to pay for compilation rather than spilling sort over tempdb which could be very expensive compared to compilation cost. The other possibility is to use optimize for hint, but in case one sorts more data than hinted by optimize for hint, this will still lead to spill. On the other side there is also the possibility of overestimation leading to unnecessary memory issues for other concurrently executing queries. In case of Hash Match operations, this overestimation of memory might lead to poor performance. When the values used in optimize for hint are archived from the database, the estimation will be wrong leading to worst performance, so one has to exercise caution before using optimize for hint, recompile hint is better in this case. I explain these concepts with detailed examples in my webcasts (www.sqlworkshops.com/webcasts), I recommend you to watch them. The best way to learn is to practice. To create the above tables and reproduce the behavior, join the mailing list at www.sqlworkshops.com/ml and I will send you the relevant SQL Scripts. Register for the upcoming 3 Day Level 400 Microsoft SQL Server 2008 and SQL Server 2005 Performance Monitoring & Tuning Hands-on Workshop in London, United Kingdom during March 15-17, 2011, click here to register / Microsoft UK TechNet.These are hands-on workshops with a maximum of 12 participants and not lectures. For consulting engagements click here. Disclaimer and copyright information:This article refers to organizations and products that may be the trademarks or registered trademarks of their various owners. Copyright of this article belongs to R Meyyappan / www.sqlworkshops.com. You may freely use the ideas and concepts discussed in this article with acknowledgement (www.sqlworkshops.com), but you may not claim any of it as your own work. This article is for informational purposes only; you use any of the suggestions given here entirely at your own risk. R Meyyappan [email protected] LinkedIn: http://at.linkedin.com/in/rmeyyappan

Read the article
What's up with OCFS2?

- by wcoekaer

On Linux there are many filesystem choices and even from Oracle we provide a number of filesystems, all with their own advantages and use cases. Customers often confuse ACFS with OCFS or OCFS2 which then causes assumptions to be made such as one replacing the other etc... I thought it would be good to write up a summary of how OCFS2 got to where it is, what we're up to still, how it is different from other options and how this really is a cool native Linux cluster filesystem that we worked on for many years and is still widely used. Work on a cluster filesystem at Oracle started many years ago, in the early 2000's when the Oracle Database Cluster development team wrote a cluster filesystem for Windows that was primarily focused on providing an alternative to raw disk devices and help customers with the deployment of Oracle Real Application Cluster (RAC). Oracle RAC is a cluster technology that lets us make a cluster of Oracle Database servers look like one big database. The RDBMS runs on many nodes and they all work on the same data. It's a Shared Disk database design. There are many advantages doing this but I will not go into detail as that is not the purpose of my write up. Suffice it to say that Oracle RAC expects all the database data to be visible in a consistent, coherent way, across all the nodes in the cluster. To do that, there were/are a few options : 1) use raw disk devices that are shared, through SCSI, FC, or iSCSI 2) use a network filesystem (NFS) 3) use a cluster filesystem(CFS) which basically gives you a filesystem that's coherent across all nodes using shared disks. It is sort of (but not quite) combining option 1 and 2 except that you don't do network access to the files, the files are effectively locally visible as if it was a local filesystem. So OCFS (Oracle Cluster FileSystem) on Windows was born. Since Linux was becoming a very important and popular platform, we decided that we would also make this available on Linux and thus the porting of OCFS/Windows started. The first version of OCFS was really primarily focused on replacing the use of Raw devices with a simple filesystem that lets you create files and provide direct IO to these files to get basically native raw disk performance. The filesystem was not designed to be fully POSIX compliant and it did not have any where near good/decent performance for regular file create/delete/access operations. Cache coherency was easy since it was basically always direct IO down to the disk device and this ensured that any time one issues a write() command it would go directly down to the disk, and not return until the write() was completed. Same for read() any sort of read from a datafile would be a read() operation that went all the way to disk and return. We did not cache any data when it came down to Oracle data files. So while OCFS worked well for that, since it did not have much of a normal filesystem feel, it was not something that could be submitted to the kernel mail list for inclusion into Linux as another native linux filesystem (setting aside the Windows porting code ...) it did its job well, it was very easy to configure, node membership was simple, locking was disk based (so very slow but it existed), you could create regular files and do regular filesystem operations to a certain extend but anything that was not database data file related was just not very useful in general. Logfiles ok, standard filesystem use, not so much. Up to this point, all the work was done, at Oracle, by Oracle developers. Once OCFS (1) was out for a while and there was a lot of use in the database RAC world, many customers wanted to do more and were asking for features that you'd expect in a normal native filesystem, a real "general purposes cluster filesystem". So the team sat down and basically started from scratch to implement what's now known as OCFS2 (Oracle Cluster FileSystem release 2). Some basic criteria were : Design it with a real Distributed Lock Manager and use the network for lock negotiation instead of the disk Make it a Linux native filesystem instead of a native shim layer and a portable core Support standard Posix compliancy and be fully cache coherent with all operations Support all the filesystem features Linux offers (ACL, extended Attributes, quotas, sparse files,...) Be modern, support large files, 32/64bit, journaling, data ordered journaling, endian neutral, we can mount on both endian /cross architecture,.. Needless to say, this was a huge development effort that took many years to complete. A few big milestones happened along the way... OCFS2 was development in the open, we did not have a private tree that we worked on without external code review from the Linux Filesystem maintainers, great folks like Christopher Hellwig reviewed the code regularly to make sure we were not doing anything out of line, we submitted the code for review on lkml a number of times to see if we were getting close for it to be included into the mainline kernel. Using this development model is standard practice for anyone that wants to write code that goes into the kernel and having any chance of doing so without a complete rewrite or.. shall I say flamefest when submitted. It saved us a tremendous amount of time by not having to re-fit code for it to be in a Linus acceptable state. Some other filesystems that were trying to get into the kernel that didn't follow an open development model had a lot harder time and a lot harsher criticism. March 2006, when Linus released 2.6.16, OCFS2 officially became part of the mainline kernel, it was accepted a little earlier in the release candidates but in 2.6.16. OCFS2 became officially part of the mainline Linux kernel tree as one of the many filesystems. It was the first cluster filesystem to make it into the kernel tree. Our hope was that it would then end up getting picked up by the distribution vendors to make it easy for everyone to have access to a CFS. Today the source code for OCFS2 is approximately 85000 lines of code. We made OCFS2 production with full support for customers that ran Oracle database on Linux, no extra or separate support contract needed. OCFS2 1.0.0 started being built for RHEL4 for x86, x86-64, ppc, s390x and ia64. For RHEL5 starting with OCFS2 1.2. SuSE was very interested in high availability and clustering and decided to build and include OCFS2 with SLES9 for their customers and was, next to Oracle, the main contributor to the filesystem for both new features and bug fixes. Source code was always available even prior to inclusion into mainline and as of 2.6.16, source code was just part of a Linux kernel download from kernel.org, which it still is, today. So the latest OCFS2 code is always the upstream mainline Linux kernel. OCFS2 is the cluster filesystem used in Oracle VM 2 and Oracle VM 3 as the virtual disk repository filesystem. Since the filesystem is in the Linux kernel it's released under the GPL v2 The release model has always been that new feature development happened in the mainline kernel and we then built consistent, well tested, snapshots that had versions, 1.2, 1.4, 1.6, 1.8. But these releases were effectively just snapshots in time that were tested for stability and release quality. OCFS2 is very easy to use, there's a simple text file that contains the node information (hostname, node number, cluster name) and a file that contains the cluster heartbeat timeouts. It is very small, and very efficient. As Sunil Mushran wrote in the manual : OCFS2 is an efficient, easily configured, quickly installed, fully integrated and compatible, feature-rich, architecture and endian neutral, cache coherent, ordered data journaling, POSIX-compliant, shared disk cluster file system. Here is a list of some of the important features that are included : Variable Block and Cluster sizes Supports block sizes ranging from 512 bytes to 4 KB and cluster sizes ranging from 4 KB to 1 MB (increments in power of 2). Extent-based Allocations Tracks the allocated space in ranges of clusters making it especially efficient for storing very large files. Optimized Allocations Supports sparse files, inline-data, unwritten extents, hole punching and allocation reservation for higher performance and efficient storage. File Cloning/snapshots REFLINK is a feature which introduces copy-on-write clones of files in a cluster coherent way. Indexed Directories Allows efficient access to millions of objects in a directory. Metadata Checksums Detects silent corruption in inodes and directories. Extended Attributes Supports attaching an unlimited number of name:value pairs to the file system objects like regular files, directories, symbolic links, etc. Advanced Security Supports POSIX ACLs and SELinux in addition to the traditional file access permission model. Quotas Supports user and group quotas. Journaling Supports both ordered and writeback data journaling modes to provide file system consistency in the event of power failure or system crash. Endian and Architecture neutral Supports a cluster of nodes with mixed architectures. Allows concurrent mounts on nodes running 32-bit and 64-bit, little-endian (x86, x86_64, ia64) and big-endian (ppc64) architectures. In-built Cluster-stack with DLM Includes an easy to configure, in-kernel cluster-stack with a distributed lock manager. Buffered, Direct, Asynchronous, Splice and Memory Mapped I/Os Supports all modes of I/Os for maximum flexibility and performance. Comprehensive Tools Support Provides a familiar EXT3-style tool-set that uses similar parameters for ease-of-use. The filesystem was distributed for Linux distributions in separate RPM form and this had to be built for every single kernel errata release or every updated kernel provided by the vendor. We provided builds from Oracle for Oracle Linux and all kernels released by Oracle and for Red Hat Enterprise Linux. SuSE provided the modules directly for every kernel they shipped. With the introduction of the Unbreakable Enterprise Kernel for Oracle Linux and our interest in reducing the overhead of building filesystem modules for every minor release, we decide to make OCFS2 available as part of UEK. There was no more need for separate kernel modules, everything was built-in and a kernel upgrade automatically updated the filesystem, as it should. UEK allowed us to not having to backport new upstream filesystem code into an older kernel version, backporting features into older versions introduces risk and requires extra testing because the code is basically partially rewritten. The UEK model works really well for continuing to provide OCFS2 without that extra overhead. Because the RHEL kernel did not contain OCFS2 as a kernel module (it is in the source tree but it is not built by the vendor in kernel module form) we stopped adding the extra packages to Oracle Linux and its RHEL compatible kernel and for RHEL. Oracle Linux customers/users obviously get OCFS2 included as part of the Unbreakable Enterprise Kernel, SuSE customers get it by SuSE distributed with SLES and Red Hat can decide to distribute OCFS2 to their customers if they chose to as it's just a matter of compiling the module and making it available. OCFS2 today, in the mainline kernel is pretty much feature complete in terms of integration with every filesystem feature Linux offers and it is still actively maintained with Joel Becker being the primary maintainer. Since we use OCFS2 as part of Oracle VM, we continue to look at interesting new functionality to add, REFLINK was a good example, and as such we continue to enhance the filesystem where it makes sense. Bugfixes and any sort of code that goes into the mainline Linux kernel that affects filesystems, automatically also modifies OCFS2 so it's in kernel, actively maintained but not a lot of new development happening at this time. We continue to fully support OCFS2 as part of Oracle Linux and the Unbreakable Enterprise Kernel and other vendors make their own decisions on support as it's really a Linux cluster filesystem now more than something that we provide to customers. It really just is part of Linux like EXT3 or BTRFS etc, the OS distribution vendors decide. Do not confuse OCFS2 with ACFS (ASM cluster Filesystem) also known as Oracle Cloud Filesystem. ACFS is a filesystem that's provided by Oracle on various OS platforms and really integrates into Oracle ASM (Automatic Storage Management). It's a very powerful Cluster Filesystem but it's not distributed as part of the Operating System, it's distributed with the Oracle Database product and installs with and lives inside Oracle ASM. ACFS obviously is fully supported on Linux (Oracle Linux, Red Hat Enterprise Linux) but OCFS2 independently as a native Linux filesystem is also, and continues to also be supported. ACFS is very much tied into the Oracle RDBMS, OCFS2 is just a standard native Linux filesystem with no ties into Oracle products. Customers running the Oracle database and ASM really should consider using ACFS as it also provides storage/clustered volume management. Customers wanting to use a simple, easy to use generic Linux cluster filesystem should consider using OCFS2. To learn more about OCFS2 in detail, you can find good documentation on http://oss.oracle.com/projects/ocfs2 in the Documentation area, or get the latest mainline kernel from http://kernel.org and read the source. One final, unrelated note - since I am not always able to publicly answer or respond to comments, I do not want to selectively publish comments from readers. Sometimes I forget to publish comments, sometime I publish them and sometimes I would publish them but if for some reason I cannot publicly comment on them, it becomes a very one-sided stream. So for now I am going to not publish comments from anyone, to be fair to all sides. You are always welcome to email me and I will do my best to respond to technical questions, questions about strategy or direction are sometimes not possible to answer for obvious reasons.

Read the article
Inside the Concurrent Collections: ConcurrentDictionary

- by Simon Cooper

Using locks to implement a thread-safe collection is rather like using a sledgehammer - unsubtle, easy to understand, and tends to make any other tool redundant. Unlike the previous two collections I looked at, ConcurrentStack and ConcurrentQueue, ConcurrentDictionary uses locks quite heavily. However, it is careful to wield locks only where necessary to ensure that concurrency is maximised. This will, by necessity, be a higher-level look than my other posts in this series, as there is quite a lot of code and logic in ConcurrentDictionary. Therefore, I do recommend that you have ConcurrentDictionary open in a decompiler to have a look at all the details that I skip over. The problem with locks There's several things to bear in mind when using locks, as encapsulated by the lock keyword in C# and the System.Threading.Monitor class in .NET (if you're unsure as to what lock does in C#, I briefly covered it in my first post in the series): Locks block threads The most obvious problem is that threads waiting on a lock can't do any work at all. No preparatory work, no 'optimistic' work like in ConcurrentQueue and ConcurrentStack, nothing. It sits there, waiting to be unblocked. This is bad if you're trying to maximise concurrency. Locks are slow Whereas most of the methods on the Interlocked class can be compiled down to a single CPU instruction, ensuring atomicity at the hardware level, taking out a lock requires some heavy lifting by the CLR and the operating system. There's quite a bit of work required to take out a lock, block other threads, and wake them up again. If locks are used heavily, this impacts performance. Deadlocks When using locks there's always the possibility of a deadlock - two threads, each holding a lock, each trying to aquire the other's lock. Fortunately, this can be avoided with careful programming and structured lock-taking, as we'll see. So, it's important to minimise where locks are used to maximise the concurrency and performance of the collection. Implementation As you might expect, ConcurrentDictionary is similar in basic implementation to the non-concurrent Dictionary, which I studied in a previous post. I'll be using some concepts introduced there, so I recommend you have a quick read of it. So, if you were implementing a thread-safe dictionary, what would you do? The naive implementation is to simply have a single lock around all methods accessing the dictionary. This would work, but doesn't allow much concurrency. Fortunately, the bucketing used by Dictionary allows a simple but effective improvement to this - one lock per bucket. This allows different threads modifying different buckets to do so in parallel. Any thread making changes to the contents of a bucket takes the lock for that bucket, ensuring those changes are thread-safe. The method that maps each bucket to a lock is the GetBucketAndLockNo method: private void GetBucketAndLockNo( int hashcode, out int bucketNo, out int lockNo, int bucketCount) { // the bucket number is the hashcode (without the initial sign bit) // modulo the number of buckets bucketNo = (hashcode & 0x7fffffff) % bucketCount; // and the lock number is the bucket number modulo the number of locks lockNo = bucketNo % m_locks.Length; } However, this does require some changes to how the buckets are implemented. The 'implicit' linked list within a single backing array used by the non-concurrent Dictionary adds a dependency between separate buckets, as every bucket uses the same backing array. Instead, ConcurrentDictionary uses a strict linked list on each bucket: This ensures that each bucket is entirely separate from all other buckets; adding or removing an item from a bucket is independent to any changes to other buckets. Modifying the dictionary All the operations on the dictionary follow the same basic pattern: void AlterBucket(TKey key, ...) { int bucketNo, lockNo; 1: GetBucketAndLockNo( key.GetHashCode(), out bucketNo, out lockNo, m_buckets.Length); 2: lock (m_locks[lockNo]) { 3: Node headNode = m_buckets[bucketNo]; 4: Mutate the node linked list as appropriate } } For example, when adding another entry to the dictionary, you would iterate through the linked list to check whether the key exists already, and add the new entry as the head node. When removing items, you would find the entry to remove (if it exists), and remove the node from the linked list. Adding, updating, and removing items all follow this pattern. Performance issues There is a problem we have to address at this point. If the number of buckets in the dictionary is fixed in the constructor, then the performance will degrade from O(1) to O(n) when a large number of items are added to the dictionary. As more and more items get added to the linked lists in each bucket, the lookup operations will spend most of their time traversing a linear linked list. To fix this, the buckets array has to be resized once the number of items in each bucket has gone over a certain limit. (In ConcurrentDictionary this limit is when the size of the largest bucket is greater than the number of buckets for each lock. This check is done at the end of the TryAddInternal method.) Resizing the bucket array and re-hashing everything affects every bucket in the collection. Therefore, this operation needs to take out every lock in the collection. Taking out mutiple locks at once inevitably summons the spectre of the deadlock; two threads each hold a lock, and each trying to acquire the other lock. How can we eliminate this? Simple - ensure that threads never try to 'swap' locks in this fashion. When taking out multiple locks, always take them out in the same order, and always take out all the locks you need before starting to release them. In ConcurrentDictionary, this is controlled by the AcquireLocks, AcquireAllLocks and ReleaseLocks methods. Locks are always taken out and released in the order they are in the m_locks array, and locks are all released right at the end of the method in a finally block. At this point, it's worth pointing out that the locks array is never re-assigned, even when the buckets array is increased in size. The number of locks is fixed in the constructor by the concurrencyLevel parameter. This simplifies programming the locks; you don't have to check if the locks array has changed or been re-assigned before taking out a lock object. And you can be sure that when a thread takes out a lock, another thread isn't going to re-assign the lock array. This would create a new series of lock objects, thus allowing another thread to ignore the existing locks (and any threads controlling them), breaking thread-safety. Consequences of growing the array Just because we're using locks doesn't mean that race conditions aren't a problem. We can see this by looking at the GrowTable method. The operation of this method can be boiled down to: private void GrowTable(Node[] buckets) { try { 1: Acquire first lock in the locks array // this causes any other thread trying to take out // all the locks to block because the first lock in the array // is always the one taken out first // check if another thread has already resized the buckets array // while we were waiting to acquire the first lock 2: if (buckets != m_buckets) return; 3: Calculate the new size of the backing array 4: Node[] array = new array[size]; 5: Acquire all the remaining locks 6: Re-hash the contents of the existing buckets into array 7: m_buckets = array; } finally { 8: Release all locks } } As you can see, there's already a check for a race condition at step 2, for the case when the GrowTable method is called twice in quick succession on two separate threads. One will successfully resize the buckets array (blocking the second in the meantime), when the second thread is unblocked it'll see that the array has already been resized & exit without doing anything. There is another case we need to consider; looking back at the AlterBucket method above, consider the following situation: Thread 1 calls AlterBucket; step 1 is executed to get the bucket and lock numbers. Thread 2 calls GrowTable and executes steps 1-5; thread 1 is blocked when it tries to take out the lock in step 2. Thread 2 re-hashes everything, re-assigns the buckets array, and releases all the locks (steps 6-8). Thread 1 is unblocked and continues executing, but the calculated bucket and lock numbers are no longer valid. Between calculating the correct bucket and lock number and taking out the lock, another thread has changed where everything is. Not exactly thread-safe. Well, a similar problem was solved in ConcurrentStack and ConcurrentQueue by storing a local copy of the state, doing the necessary calculations, then checking if that state is still valid. We can use a similar idea here: void AlterBucket(TKey key, ...) { while (true) { Node[] buckets = m_buckets; int bucketNo, lockNo; GetBucketAndLockNo( key.GetHashCode(), out bucketNo, out lockNo, buckets.Length); lock (m_locks[lockNo]) { // if the state has changed, go back to the start if (buckets != m_buckets) continue; Node headNode = m_buckets[bucketNo]; Mutate the node linked list as appropriate } break; } } TryGetValue and GetEnumerator And so, finally, we get onto TryGetValue and GetEnumerator. I've left these to the end because, well, they don't actually use any locks. How can this be? Whenever you change a bucket, you need to take out the corresponding lock, yes? Indeed you do. However, it is important to note that TryGetValue and GetEnumerator don't actually change anything. Just as immutable objects are, by definition, thread-safe, read-only operations don't need to take out a lock because they don't change anything. All lockless methods can happily iterate through the buckets and linked lists without worrying about locking anything. However, this does put restrictions on how the other methods operate. Because there could be another thread in the middle of reading the dictionary at any time (even if a lock is taken out), the dictionary has to be in a valid state at all times. Every change to state has to be made visible to other threads in a single atomic operation (all relevant variables are marked volatile to help with this). This restriction ensures that whatever the reading threads are doing, they never read the dictionary in an invalid state (eg items that should be in the collection temporarily removed from the linked list, or reading a node that has had it's key & value removed before the node itself has been removed from the linked list). Fortunately, all the operations needed to change the dictionary can be done in that way. Bucket resizes are made visible when the new array is assigned back to the m_buckets variable. Any additions or modifications to a node are done by creating a new node, then splicing it into the existing list using a single variable assignment. Node removals are simply done by re-assigning the node's m_next pointer. Because the dictionary can be changed by another thread during execution of the lockless methods, the GetEnumerator method is liable to return dirty reads - changes made to the dictionary after GetEnumerator was called, but before the enumeration got to that point in the dictionary. It's worth listing at this point which methods are lockless, and which take out all the locks in the dictionary to ensure they get a consistent view of the dictionary: Lockless: TryGetValue GetEnumerator The indexer getter ContainsKey Takes out every lock (lockfull?): Count IsEmpty Keys Values CopyTo ToArray Concurrent principles That covers the overall implementation of ConcurrentDictionary. I haven't even begun to scratch the surface of this sophisticated collection. That I leave to you. However, we've looked at enough to be able to extract some useful principles for concurrent programming: Partitioning When using locks, the work is partitioned into independant chunks, each with its own lock. Each partition can then be modified concurrently to other partitions. Ordered lock-taking When a method does need to control the entire collection, locks are taken and released in a fixed order to prevent deadlocks. Lockless reads Read operations that don't care about dirty reads don't take out any lock; the rest of the collection is implemented so that any reading thread always has a consistent view of the collection. That leads us to the final collection in this little series - ConcurrentBag. Lacking a non-concurrent analogy, it is quite different to any other collection in the class libraries. Prepare your thinking hats!

Read the article
CodePlex Daily Summary for Monday, June 11, 2012

CodePlex Daily Summary for Monday, June 11, 2012Popular ReleasesCasanova Language: Casanova IDE alpha release: This is the first release for the Casanova IDE. It features the major capabilities of the framework: support for rules, scripts, input management, and basic content management. The IDE is still under major development. Planned features include: multiplayer support 3D rendering syntax highlighting basic Intellisense slightly improved syntax for rules and scripts audio in-game menus Also, do not forget to download and install OpenAL: http://connect.creativelabs.com/openal/Download...Liberty: v3.2.1.0 Release 10th June 2012: Change Log -Added -Liberty is now digitally signed! If the certificate on Liberty.exe is missing, invalid, or does not state that it was developed by "Xbox Chaos, Open Source Developer," your copy of Liberty may have been altered in some (possibly malicious) way. -Reach Mass biped max health and shield changer -Fixed -H3/ODST Fixed all of the glitches that users kept reporting (also reverted the changes made in 3.2.0.2) -Reach Made some tag names clearer and more consistent between m...SVNUG.CodePlex: Cloud Development with Windows Azure: This release contains the slides for the Cloud Development with Windows Azure presentation.WCF Data Service (OData) Regression & Load Testing Tool: Latest: This is latest stable releaseSHA-1 Hash Checker: SHA-1 Hash Checker (for Windows): Fixed major bugs. Removed false negatives.AutoUpdaterdotNET: AutoUpdater.NET 1.0: Everything seems perfect if you find any problem you can report to http://www.rbsoft.org/contact.htmlMedia Companion: Media Companion 3.503b: It has been a while, so it's about time we release another build! Major effort has been for fixing trailer downloads, plus a little bit of work for episode guide tag in TV show NFOs.Microsoft SQL Server Product Samples: Database: AdventureWorks Sample Reports 2008 R2: AdventureWorks Sample Reports 2008 R2.zip contains several reports include Sales Reason Comparisons SQL2008R2.rdl which uses Adventure Works DW 2008R2 as a data source reference. For more information, go to Sales Reason Comparisons report.Json.NET: Json.NET 4.5 Release 7: Fix - Fixed Metro build to pass Windows Application Certification Kit on Windows 8 Release Preview Fix - Fixed Metro build error caused by an anonymous type Fix - Fixed ItemConverter not being used when serializing dictionaries Fix - Fixed an incorrect object being passed to the Error event when serializing dictionaries Fix - Fixed decimal properties not being correctly ignored with DefaultValueHandlingLINQ Extensions Library: 1.0.3.0: New to release 1.0.3.0:Combinatronics: Combinations (unique) Combinations (with repetition) Permutations (unique) Permutations (with repetition) Convert jagged arrays to fixed multidimensional arrays Convert fixed multidimensional arrays to jagged arrays ElementAtMax ElementAtMin ElementAtAverage New set of array extension (1.0.2.8):Rotate Flip Resize (maintaing data) Split Fuse Replace Append and Prepend extensions (1.0.2.7) IndexOf extensions (1.0.2.7) Ne...????????API for .Net SDK: SDK for .Net ??? Release 1: 6?11????? ??? - ?Entities???????????EntityBase，???ToString()???????json???，??????4.0???????。2.0?3.5???！ ??? - Request????????AccessToken??????source=appkey?????。????，????????，???????public_timeline?????????。 ?? - ???ClinetLogin??????????RefreshToken???????false???。 ?? - ???RepostTimeline????Statuses???null???。 ?? - Utility?BuildPostData?，?WeiboParameter??value?NULL????????。 ??????? ??? - ??.Net 2.0/3.5/4.0????。??????VS2010??????????。VS2008????????，??????????。 ??? - ??.Net 4.0???SDK...Audio Pitch & Shift: Audio Pitch And Shift 4.5.0: Added Instruments tab for modules Open folder content feature Some bug fixesPython Tools for Visual Studio: 1.5 Beta 1: We’re pleased to announce the release of Python Tools for Visual Studio 1.5 Beta. Python Tools for Visual Studio (PTVS) is an open-source plug-in for Visual Studio which supports programming with the Python language. PTVS supports a broad range of features including: • Supports CPython, IronPython, Jython and PyPy • Python editor with advanced member, signature intellisense and refactoring • Code navigation: “Find all refs”, goto definition, and object browser • Local and remote debugging •...Circuit Diagram: Circuit Diagram 2.0 Beta 1: New in this release: Automatically flip components when placing Delete components using keyboard delete key Resize document Document properties window Print document Recent files list Confirm when exiting with unsaved changes Thumbnail previews in Windows Explorer for CDDX files Show shortcut keys in toolbox Highlight selected item in toolbox Zoom using mouse scroll wheel while holding down ctrl key Plugin support for: Custom export formats Custom import formats Open...Umbraco CMS: Umbraco CMS 5.2 Beta: The future of Umbracov5 represents the future architecture of Umbraco, so please be aware that while it's technically superior to v4 it's not yet on a par feature or performance-wise. What's new? For full details see our http://progress.umbraco.org task tracking page showing all items complete for 5.2. In a nutshellPackage Builder Starter Kits Dynamic Extension Methods Querying / IsHelpers Friendly alt template URLs Localization Various bug fixes / performance enhancements Gett...JayData - The cross-platform HTML5 data-management library for JavaScript: JayData 1.0.5: JayData is a unified data access library for JavaScript developers to query and update data from different sources like WebSQL, IndexedDB, OData, Facebook or YQL. See it in action in this 6 minutes video New features in JayData 1.0.5http://jaydata.org/blog/jaydata-1.0.5-is-here-with-authentication-support-and-more http://jaydata.org/blog/release-notes Sencha Touch 2 module (read-only)This module can be used to bind data retrieved by JayData to Sencha Touch 2 generated user interface. (exam...32feet.NET: 3.5: This version changes the 32feet.NET library (both desktop and NETCF) to use .NET Framework version 3.5. Previously we compiled for .NET v2.0. There are no code changes from our version 3.4. See the 3.4 release for more information. Changes due to compiling for .NET 3.5Applications should be changed to use NET/NETCF v3.5. Removal of class InTheHand.Net.Bluetooth.AsyncCompletedEventArgs, which we provided on NETCF. We now just use the standard .NET System.ComponentModel.AsyncCompletedEvent...Application Architecture Guidelines: Application Architecture Guidelines 3.0.7: 3.0.7Jolt Environment: Jolt v2 Stable: Many new features. Follow development here for more information: http://www.rune-server.org/runescape-development/rs-503-client-server/projects/298763-jolt-environment-v2.html Setup instructions in downloadSharePoint Euro 2012 - UEFA European Football Predictor: havivi.euro2012.wsp (1.5): New fetures:Multilingual Support Max users property in Standings Web Part Games time zone change (UTC +1) bug fix - Version 1.4 locking problem http://euro2012.codeplex.com/discussions/358262 bug fix - Field Title not found (v.1.3) German SP http://euro2012.codeplex.com/discussions/358189#post844228 Bug fix - Access is denied.for users with contribute rights Bug fix - Installing on non-English version of SharePoint Bug fix - Title Rules Installing SharePoint Euro 2012 PredictorSharePoint E...New Projects2D map editor for Game Tool Development class: This project contains a basic 2D map editor, which can read a tileset (or chipset) to create a custom map. The user can then load and save maps previously created (file format .map). ArtifexCore: ArtifexCore - a compilation of unique, original, and revamped RunUO/OrbSA projects.ASP.NET MVC 4 - Sports Store using Visual Studio 2011 Beta: This is a the output of "Sports Store" exercise in Pro ASP.NET MVC 3 Framework by Adam Freeman and Steven Sanderson. Instead of MVC 3 as recommended in the book, I have used MVC 4.clabinet: clabinet is a cloud based file cabinetCopy File Location - Explorer Shortcut: This Explorer Extension adds a shortcut menu to all files and folders to copy the full location to the Clipboard.CSSSEVER: ???????????DoodleLabyrinthLogic: A starter '100 rogues' type logic library for rogue-like buildersEasyFlash Cart Builder: EasyFlash Cart Builder is a tool for linking files together into an EasyFlash cartridge. Generic enumeration: Provides string representation of C# enumerations.Hacker Typer for WP7: This is a hackertyper.net Windows Phone based application HTML Batch Logger: This is a simple Log class that can be used in any .NET C# Console Application. You can use it to log into an HTML file, console window, or database. kinect ????????????: ???、??????????????????????、????????????????。???、??????????????????。?????????、????????????????、??????????????????。laskjdfqewr131231: example omes projectnlite web libraray: Lite Web Framework,????Page，??Ndf???WebApiNMemory - an in-memory relational database for .NET: NMemory is a lightweight in-memory relational database engine that can be hosted by .NET applications. It supports traditional database features like indexes, foreign key relations, transaction handling and isolation.NoManaComponets: A attempt at a reusable library for common tasks in XNA like avatar management, text rendering and shading. Currently abandoned.NP: network client appObject Viewer: A UserControl that can display any object.PowerPoint Graph Creator: The main purpose of this PowerPoint add-in is to help people who sometimes need to draw graphs into the PPT and then for example add some animations or want to load to existing graphs into PPT but don't have a lot of time to re-draw every thing.Project Blue Tigris: This Project aims at enabling users to control and interact with Windows without the need to touch the keyboard or the mouse, through a new user interface using gestures and voice commands. This project will utilize webcam and microphones connected to the computer when installed. This is our vision. It would be great if you join us.QTP FT Uninstaller: KnowledgeInbox QTP/FT Uninstaller is a tool designed for uninstalling HP's QuickTest Professional or Functional Testing products in one click. The tool should only be used in cases where QTP was working fine earlier and after some update or installation it stopped working. The tool scans the system registry for all files associated with QTP and deletes them. Note: Using this uninstaller may impact tools like HP Sprinter, Quality Center. You should re-install those tools also after using t...Radius Client for Microsoft® .NET Micro Framework: Client for Remote Authentication Dial In User Service (RADIUS)Regression Suite: RegressionSuite is a software test suite that incorporates measurement of the startup lag, measurement of accurate execution times, generating execution statistics, customized input distributions, and processable regression specific details as part of the regular unit tests. Essentially, RegressionSuite provides the frame-work around which the individual unit regressors are invoked (and details and statistics collected). Unit regressors are grouped into named regressor sets (or modules), a...Sern: Nothing yet.SharePoint List Number to Text Custom Column: Number to text custom column in SharePoint is a project that is useful in any financial SharePoint implementation to automatically generate the corresponding number representation in textual format, this feature is designed to be easy extended to any language and it’s initially in Arabic and English languages.you can find all related source code in download section SharpDND: A attempt to create a open source DLL for the openSRD. Designed with customization in mind if completed the tools included could build out pathfinder and other RPG rulesets.SmartSense: SmartSense is a wearable holographic gesture and voice controlled intelligent system. Human-computer interaction (HCI) is a heavily researched area in today’s technology driven world. Most people experience HCI using a mouse and keyboard as input devices. We wanted a more natural way to interact with the computer that also allows instant access to information. We are developing a real-time system that is always on and available to collect data at any moment but also is accessible “on-the-fly...Sumzlib: sumzlib is a set of class library that provides useful algorithms in static method. this project is written in vb.netWCF Data Service (OData) Regression & Load Testing Tool: This is a tool that is especially being developed to regress OData Services. Current release only support few test like ($select, Top, etc ), more test will be added over the time. It is a Multithreaded Regression Test Tool that can generate result in Excel format including diagnostic error data and performance data like turnaround time as well. It can be used for bulk testing of several services at the same time It can be quite useful to for those who are developing several service...X.Web.Microdata: This project is intended to represent an mitcrodata entitie in the .NET Framework. (Particularly in ASP.NET) The X.Web.Microdata represent the http://www.data-vocabulary.org/ (and Google) microdata notation And X.Web.Microdata.SchemaOrg represent http://schema.org/ microdata notatio

Read the article
.NET Code Evolution

- by Alois Kraus

Originally posted on: http://geekswithblogs.net/akraus1/archive/2013/07/24/153504.aspxAt my day job I do look at a lot of code written by other people. Most of the code is quite good and some is even a masterpiece. And there is also code which makes you think WTF… oh it was written by me. Hm not so bad after all. There are many excuses reasons for bad code. Most often it is time pressure followed by not enough ambition (who cares) or insufficient training. Normally I do care about code quality quite a lot which makes me a (perceived) slow worker who does write many tests and refines the code quite a lot because of the design deficiencies. Most of the deficiencies I do find by putting my design under stress while checking for invariants. It does also help a lot to step into the code with a debugger (sometimes also Windbg). I do this much more often when my tests are red. That way I do get a much better understanding what my code really does and not what I think it should be doing. This time I do want to show you how code can evolve over the years with different .NET Framework versions. Once there was time where .NET 1.1 was new and many C++ programmers did switch over to get rid of not initialized pointers and memory leaks. There were also nice new data structures available such as the Hashtable which is fast lookup table with O(1) time complexity. All was good and much code was written since then. At 2005 a new version of the .NET Framework did arrive which did bring many new things like generics and new data structures. The “old” fashioned way of Hashtable were coming to an end and everyone used the new Dictionary<xx,xx> type instead which was type safe and faster because the object to type conversion (aka boxing) was no longer necessary. I think 95% of all Hashtables and dictionaries use string as key. Often it is convenient to ignore casing to make it easy to look up values which the user did enter. An often followed route is to convert the string to upper case before putting it into the Hashtable. Hashtable Table = new Hashtable(); void Add(string key, string value) { Table.Add(key.ToUpper(), value); } This is valid and working code but it has problems. First we can pass to the Hashtable a custom IEqualityComparer to do the string matching case insensitive. Second we can switch over to the now also old Dictionary type to become a little faster and we can keep the the original keys (not upper cased) in the dictionary. Dictionary<string, string> DictTable = new Dictionary<string, string>(StringComparer.OrdinalIgnoreCase); void AddDict(string key, string value) { DictTable.Add(key, value); } Many people do not user the other ctors of Dictionary because they do shy away from the overhead of writing their own comparer. They do not know that .NET has for strings already predefined comparers at hand which you can directly use. Today in the many core area we do use threads all over the place. Sometimes things break in subtle ways but most of the time it is sufficient to place a lock around the offender. Threading has become so mainstream that it may sound weird that in the year 2000 some guy got a huge incentive for the idea to reduce the time to process calibration data from 12 hours to 6 hours by using two threads on a dual core machine. Threading does make it easy to become faster at the expense of correctness. Correct and scalable multithreading can be arbitrarily hard to achieve depending on the problem you are trying to solve. Lets suppose we want to process millions of items with two threads and count the processed items processed by all threads. A typical beginners code might look like this: int Counter; void IJustLearnedToUseThreads() { var t1 = new Thread(ThreadWorkMethod); t1.Start(); var t2 = new Thread(ThreadWorkMethod); t2.Start(); t1.Join(); t2.Join(); if (Counter != 2 * Increments) throw new Exception("Hmm " + Counter + " != " + 2 * Increments); } const int Increments = 10 * 1000 * 1000; void ThreadWorkMethod() { for (int i = 0; i < Increments; i++) { Counter++; } } It does throw an exception with the message e.g. “Hmm 10.222.287 != 20.000.000” and does never finish. The code does fail because the assumption that Counter++ is an atomic operation is wrong. The ++ operator is just a shortcut for Counter = Counter + 1 This does involve reading the counter from a memory location into the CPU, incrementing value on the CPU and writing the new value back to the memory location. When we do look at the generated assembly code we will see only inc dword ptr [ecx+10h] which is only one instruction. Yes it is one instruction but it is not atomic. All modern CPUs have several layers of caches (L1,L2,L3) which try to hide the fact how slow actual main memory accesses are. Since cache is just another word for redundant copy it can happen that one CPU does read a value from main memory into the cache, modifies it and write it back to the main memory. The problem is that at least the L1 cache is not shared between CPUs so it can happen that one CPU does make changes to values which did change in meantime in the main memory. From the exception you can see we did increment the value 20 million times but half of the changes were lost because we did overwrite the already changed value from the other thread. This is a very common case and people do learn to protect their data with proper locking. void Intermediate() { var time = Stopwatch.StartNew(); Action acc = ThreadWorkMethod_Intermediate; var ar1 = acc.BeginInvoke(null, null); var ar2 = acc.BeginInvoke(null, null); ar1.AsyncWaitHandle.WaitOne(); ar2.AsyncWaitHandle.WaitOne(); if (Counter != 2 * Increments) throw new Exception(String.Format("Hmm {0:N0} != {1:N0}", Counter, 2 * Increments)); Console.WriteLine("Intermediate did take: {0:F1}s", time.Elapsed.TotalSeconds); } void ThreadWorkMethod_Intermediate() { for (int i = 0; i < Increments; i++) { lock (this) { Counter++; } } } This is better and does use the .NET Threadpool to get rid of manual thread management. It does give the expected result but it can result in deadlocks because you do lock on this. This is in general a bad idea since it can lead to deadlocks when other threads use your class instance as lock object. It is therefore recommended to create a private object as lock object to ensure that nobody else can lock your lock object. When you read more about threading you will read about lock free algorithms. They are nice and can improve performance quite a lot but you need to pay close attention to the CLR memory model. It does make quite weak guarantees in general but it can still work because your CPU architecture does give you more invariants than the CLR memory model. For a simple counter there is an easy lock free alternative present with the Interlocked class in .NET. As a general rule you should not try to write lock free algos since most likely you will fail to get it right on all CPU architectures. void Experienced() { var time = Stopwatch.StartNew(); Task t1 = Task.Factory.StartNew(ThreadWorkMethod_Experienced); Task t2 = Task.Factory.StartNew(ThreadWorkMethod_Experienced); t1.Wait(); t2.Wait(); if (Counter != 2 * Increments) throw new Exception(String.Format("Hmm {0:N0} != {1:N0}", Counter, 2 * Increments)); Console.WriteLine("Experienced did take: {0:F1}s", time.Elapsed.TotalSeconds); } void ThreadWorkMethod_Experienced() { for (int i = 0; i < Increments; i++) { Interlocked.Increment(ref Counter); } } Since time does move forward we do not use threads explicitly anymore but the much nicer Task abstraction which was introduced with .NET 4 at 2010. It is educational to look at the generated assembly code. The Interlocked.Increment method must be called which does wondrous things right? Lets see: lock inc dword ptr [eax] The first thing to note that there is no method call at all. Why? Because the JIT compiler does know very well about CPU intrinsic functions. Atomic operations which do lock the memory bus to prevent other processors to read stale values are such things. Second: This is the same increment call prefixed with a lock instruction. The only reason for the existence of the Interlocked class is that the JIT compiler can compile it to the matching CPU intrinsic functions which can not only increment by one but can also do an add, exchange and a combined compare and exchange operation. But be warned that the correct usage of its methods can be tricky. If you try to be clever and look a the generated IL code and try to reason about its efficiency you will fail. Only the generated machine code counts. Is this the best code we can write? Perhaps. It is nice and clean. But can we make it any faster? Lets see how good we are doing currently. Level Time in s IJustLearnedToUseThreads Flawed Code Intermediate 1,5 (lock) Experienced 0,3 (Interlocked.Increment) Master 0,1 (1,0 for int[2]) That lock free thing is really a nice thing. But if you read more about CPU cache, cache coherency, false sharing you can do even better. int[] Counters = new int[12]; // Cache line size is 64 bytes on my machine with an 8 way associative cache try for yourself e.g. 64 on more modern CPUs void Master() { var time = Stopwatch.StartNew(); Task t1 = Task.Factory.StartNew(ThreadWorkMethod_Master, 0); Task t2 = Task.Factory.StartNew(ThreadWorkMethod_Master, Counters.Length - 1); t1.Wait(); t2.Wait(); Counter = Counters[0] + Counters[Counters.Length - 1]; if (Counter != 2 * Increments) throw new Exception(String.Format("Hmm {0:N0} != {1:N0}", Counter, 2 * Increments)); Console.WriteLine("Master did take: {0:F1}s", time.Elapsed.TotalSeconds); } void ThreadWorkMethod_Master(object number) { int index = (int) number; for (int i = 0; i < Increments; i++) { Counters[index]++; } } The key insight here is to use for each core its own value. But if you simply use simply an integer array of two items, one for each core and add the items at the end you will be much slower than the lock free version (factor 3). Each CPU core has its own cache line size which is something in the range of 16-256 bytes. When you do access a value from one location the CPU does not only fetch one value from main memory but a complete cache line (e.g. 16 bytes). This means that you do not pay for the next 15 bytes when you access them. This can lead to dramatic performance improvements and non obvious code which is faster although it does have many more memory reads than another algorithm. So what have we done here? We have started with correct code but it was lacking knowledge how to use the .NET Base Class Libraries optimally. Then we did try to get fancy and used threads for the first time and failed. Our next try was better but it still had non obvious issues (lock object exposed to the outside). Knowledge has increased further and we have found a lock free version of our counter which is a nice and clean way which is a perfectly valid solution. The last example is only here to show you how you can get most out of threading by paying close attention to your used data structures and CPU cache coherency. Although we are working in a virtual execution environment in a high level language with automatic memory management it does pay off to know the details down to the assembly level. Only if you continue to learn and to dig deeper you can come up with solutions no one else was even considering. I have studied particle physics which does help at the digging deeper part. Have you ever tried to solve Quantum Chromodynamics equations? Compared to that the rest must be easy ;-). Although I am no longer working in the Science field I take pride in discovering non obvious things. This can be a very hard to find bug or a new way to restructure data to make something 10 times faster. Now I need to get some sleep ….

Read the article
mysql: Bind on unix socket: Permission denied

- by Alex

Can't start mysql with: sudo /usr/bin/mysqld_safe --datadir=/srv/mysql/myDB --log-error=/srv/mysql/logs/mysqld-myDB.log --pid-file=/srv/mysql/pids/mysqld-myDB.pid --user=mysql --socket=/srv/mysql/sockets/mysql-myDB.sock --port=3700 120222 13:40:48 mysqld_safe Starting mysqld daemon with databases from /srv/mysql/myDB 120222 13:40:54 mysqld_safe mysqld from pid file /srv/mysql/pids/mysqld-myDB.pid ended /srv/mysql/logs/mysqld-myDB.log: 120222 13:43:53 mysqld_safe Starting mysqld daemon with databases from /srv/mysql/myDB 120222 13:43:53 [Note] Plugin 'FEDERATED' is disabled. /usr/sbin/mysqld: Table 'plugin' is read only 120222 13:43:53 [ERROR] Can't open the mysql.plugin table. Please run mysql_upgrade to create it. 120222 13:43:53 InnoDB: Completed initialization of buffer pool 120222 13:43:53 InnoDB: Started; log sequence number 32 4232720908 120222 13:43:53 [ERROR] Can't start server : Bind on unix socket: Permission denied 120222 13:43:53 [ERROR] Do you already have another mysqld server running on socket: /srv/mysql/sockets/mysql-myDB.sock ? 120222 13:43:53 [ERROR] Aborting 120222 13:43:53 InnoDB: Starting shutdown... One instance mysqld is running: $ ps aux | grep mysql mysql 1093 0.0 0.2 169972 18700 ? Ssl 11:50 0:02 /usr/sbin/mysqld $ Port 3700 is available: $ netstat -a | grep 3700 $ Directory with sockets is empty: $ ls /srv/mysql/sockets/ $ There are all permissions: $ ls -l /srv/mysql/ total 20 drwxrwxrwx 2 mysql mysql 4096 2012-02-22 13:28 logs drwxrwxrwx 13 mysql mysql 4096 2012-02-22 13:44 myDB drwxrwxrwx 2 mysql mysql 4096 2012-02-22 12:55 pids drwxrwxrwx 2 mysql mysql 4096 2012-02-22 12:55 sockets drwxrwxrwx 2 mysql mysql 4096 2012-02-22 13:25 version Apparmor config: $cat /etc/apparmor.d/usr.sbin.mysqld # vim:syntax=apparmor # Last Modified: Tue Jun 19 17:37:30 2007 #include <tunables/global> /usr/sbin/mysqld flags=(complain) { #include <abstractions/base> #include <abstractions/nameservice> #include <abstractions/user-tmp> #include <abstractions/mysql> #include <abstractions/winbind> capability dac_override, capability sys_resource, capability setgid, capability setuid, network tcp, /etc/hosts.allow r, /etc/hosts.deny r, /etc/mysql/*.pem r, /etc/mysql/conf.d/ r, /etc/mysql/conf.d/* r, /etc/mysql/*.cnf r, /usr/lib/mysql/plugin/ r, /usr/lib/mysql/plugin/*.so* mr, /usr/sbin/mysqld mr, /usr/share/mysql/** r, /var/log/mysql.log rw, /var/log/mysql.err rw, /var/lib/mysql/ r, /var/lib/mysql/** rwk, /var/log/mysql/ r, /var/log/mysql/* rw, /{,var/}run/mysqld/mysqld.pid w, /{,var/}run/mysqld/mysqld.sock w, /srv/mysql/ r, /srv/mysql/** rwk, /sys/devices/system/cpu/ r, # Site-specific additions and overrides. See local/README for details. #include <local/usr.sbin.mysqld> } Any suggestions? UPD1: $ touch /srv/mysql/sockets/mysql-myDB.sock $ sudo chown mysql:mysql /srv/mysql/sockets/mysql-myDB.sock $ ls -l /srv/mysql/sockets/mysql-myDB.sock -rw-rw-r-- 1 mysql mysql 0 2012-02-22 14:29 /srv/mysql/sockets/mysql-myDB.sock $ sudo /usr/bin/mysqld_safe --datadir=/srv/mysql/myDB --log-error=/srv/mysql/logs/mysqld-myDB.log --pid-file=/srv/mysql/pids/mysqld-myDB.pid --user=mysql --socket=/srv/mysql/sockets/mysql-myDB.sock --port=3700 120222 14:30:18 mysqld_safe Can't log to error log and syslog at the same time. Remove all --log-error configuration options for --syslog to take effect. 120222 14:30:18 mysqld_safe Logging to '/srv/mysql/logs/mysqld-myDB.log'. 120222 14:30:18 mysqld_safe Starting mysqld daemon with databases from /srv/mysqlmyDB 120222 14:30:24 mysqld_safe mysqld from pid file /srv/mysql/pids/mysqld-myDB.pid ended $ ls -l /srv/mysql/sockets/mysql-myDB.sock ls: cannot access /srv/mysql/sockets/mysql-myDB.sock: No such file or directory $ UPD2: $ sudo netstat -lnp | grep mysql tcp 0 0 0.0.0.0:3306 0.0.0.0:* LISTEN 1093/mysqld unix 2 [ ACC ] STREAM LISTENING 5912 1093/mysqld /var/run/mysqld/mysqld.sock $ sudo lsof | grep /srv/mysql/sockets/mysql-myDB.sock lsof: WARNING: can't stat() fuse.gvfs-fuse-daemon file system /home/sears/.gvfs Output information may be incomplete. UPD3: $ cat /etc/mysql/my.cnf # # The MySQL database server configuration file. # # You can copy this to one of: # - "/etc/mysql/my.cnf" to set global options, # - "~/.my.cnf" to set user-specific options. # # One can use all long options that the program supports. # Run program with --help to get a list of available options and with # --print-defaults to see which it would actually understand and use. # # For explanations see # http://dev.mysql.com/doc/mysql/en/server-system-variables.html # This will be passed to all mysql clients # It has been reported that passwords should be enclosed with ticks/quotes # escpecially if they contain "#" chars... # Remember to edit /etc/mysql/debian.cnf when changing the socket location. [client] port = 3306 socket = /var/run/mysqld/mysqld.sock # Here is entries for some specific programs # The following values assume you have at least 32M ram # This was formally known as [safe_mysqld]. Both versions are currently parsed. [mysqld_safe] socket = /var/run/mysqld/mysqld.sock nice = 0 [mysqld] # # * Basic Settings # # # * IMPORTANT # If you make changes to these settings and your system uses apparmor, you may # also need to also adjust /etc/apparmor.d/usr.sbin.mysqld. # user = mysql socket = /var/run/mysqld/mysqld.sock port = 3306 basedir = /usr datadir = /var/lib/mysql tmpdir = /tmp skip-external-locking # # Instead of skip-networking the default is now to listen only on # localhost which is more compatible and is not less secure. #bind-address = 127.0.0.1 # # * Fine Tuning # key_buffer = 16M max_allowed_packet = 16M thread_stack = 192K thread_cache_size = 8 # This replaces the startup script and checks MyISAM tables if needed # the first time they are touched myisam-recover = BACKUP #max_connections = 100 #table_cache = 64 #thread_concurrency = 10 # # * Query Cache Configuration # query_cache_limit = 1M query_cache_size = 16M # # * Logging and Replication # # Both location gets rotated by the cronjob. # Be aware that this log type is a performance killer. # As of 5.1 you can enable the log at runtime! #general_log_file = /var/log/mysql/mysql.log #general_log = 1 log_error = /var/log/mysql/error.log # Here you can see queries with especially long duration #log_slow_queries = /var/log/mysql/mysql-slow.log #long_query_time = 2 #log-queries-not-using-indexes # # The following can be used as easy to replay backup logs or for replication. # note: if you are setting up a replication slave, see README.Debian about # other settings you may need to change. #server-id = 1 #log_bin = /var/log/mysql/mysql-bin.log expire_logs_days = 10 max_binlog_size = 100M #binlog_do_db = include_database_name #binlog_ignore_db = include_database_name # # * InnoDB # # InnoDB is enabled by default with a 10MB datafile in /var/lib/mysql/. # Read the manual for more InnoDB related options. There are many! # # * Security Features # # Read the manual, too, if you want chroot! # chroot = /var/lib/mysql/ # # For generating SSL certificates I recommend the OpenSSL GUI "tinyca". # # ssl-ca=/etc/mysql/cacert.pem # ssl-cert=/etc/mysql/server-cert.pem # ssl-key=/etc/mysql/server-key.pem [mysqldump] quick quote-names max_allowed_packet = 16M [mysql] #no-auto-rehash # faster start of mysql but no tab completition [isamchk] key_buffer = 16M # # * IMPORTANT: Additional settings that can override those from this file! # The files must end with '.cnf', otherwise they'll be ignored. # !includedir /etc/mysql/conf.d/

Read the article
Backing up my data causes my server to crash using Symantec Backup Exec 12, or How I Came to Loathe

- by Kyle Noland

I have a Dell PowerEdge 2850 running Windows Server 2003. It is the primary file server for one of my clients. I have another server also running Windows Server 2003 that acts as the core media server for Symantec Backup Exec 12. I recently upgraded from Backup Exec 11d to 12. This upgrade was necessary because we also just upgraded from Exchange 2003 to Exchange 2007. After the upgrade I had to push-install the new version 12 Backup Exec Remote Agents to each of the servers I am backing up (about 6 total). 5 of my servers are doing just fine, faithfully completing backups every night. My file server routinely crashes. Observations: When the server crashes, it does not blue screen, it just locks up completely. Even the mouse is unresponsive. If you leave the server locked up long enough, it will eventually reboot itself and hang on the Windows splash screen. There is absolutely zero useful Event Viewer evidence of a problem. The logs go from routine logging to an Unexplained Shutdown Event the next morning when I have to hard reset the server to get it to boot. 90% of the time the server does not boot cleanly, it hangs on the Windows splash screen. I don't have any light to shed here. When the server hangs all I can do is hard reset it and try again. Even after a successful boot and chkdsk /r operation, if you reboot the machine, you have a 90% chance it won't back up again cleanly. The back story: This server started crashing during nightly backups about a month ago. I tried everything I could think of to troubleshoot the problem and eventually had to give up because I could not keep coming to the office at 4 AM to try to get the server back online. One Friday I got lucky and the server stayed up for its entire full backup. I took this opportunity to restore the full backup to a temporary server I set up and switched all my users to the temporary. Then I reloaded the ailing file server. I kept all my users on the temporary file server for about 3 weeks. I installed the same Backup Exec Remote Agent and Trend Micro A/V client on the temporary server that I was using on the regular file server. During this time, I had absolutely no problems backing up the temporary server. I tested the reloaded file server extensively. I rebooted the server once an hour every day for 3 weeks trying to make it fail. It never did. I felt confident that the reload was the answer to my problems. I moved all of the data from the temporary server back to the regular server. I got 3 nightly backups out of it before it locked up again and started the familiar failure to boot cleanly behavior. This weekend I decided to monitor the file server through the entire backup job. I RDPd into the file server and also into the server running Backup Exec. On the file server I opened the Task Manager so I could view the processes and watch CPU and memory usage. Everything was running smoothly for about 60GB worth of backup. Then I noticed that the byte count of the backup job in Backup Exec had stopped progressing. I looked back over at my RDP session into the file server, and I was getting real time updates about CPU and memory usage still - both nearly 0%, which is unusual. Backups usually hover around 40% usage for the duration of the backup job. Let me reiterate this point: The screen was refreshing and I was getting real time Task Manager updates - until I clicked on the Start menu. The screen went black and the server locked up. In truth, I think the server had already locked up, the video card just hadn't figured it out yet. I went back into my bag of trick: driving to the office and hard reseting the server over and over again when it hangs up at the Windows splash screen. I did this for 2 hours without getting a successful boot. I started panicking because I did not have a decent backup to use to get everything back onto the working temporary file server. Once I exhausted everything I knew to do, I took a deep breath, booted to the Windows Server 2003 CD and performed a repair installation of Windows. The server came back up fine, with all of my data intact. I can now reboot the server at will and it will come back up cleanly. The problem is that I'm afraid as soon as I try to back that data up again I will back at square one. So let me sum things up: Here is what I've done so far to troubleshoot this server: Deleted and recreated the RAID 5 sets. Initialized the drives. Reloaded the server with a fresh Server 2003 install. Confirmed with Dell that I have installed the latest, Dell approved BIOS and NIC drivers. Uninstalled / reinstalled the Backup Exec Remote Agent. Uninstalled the Trend Micro A/V client. Configured the server not to reboot itself after a blue screen so I can see any stop error. I used to think the server was blue screening, but since I enabled this setting I now know that the server just completely locks up. Run chkdsk /r from the Windows Recovery Console. Several errors were found and corrected, but did not help my problem. Help confirm or deny the following assumptions: There are two problems at work here. Why the server is locking up in the first place, and why the server won't boot cleanly after a lockup. This is ultimately a software problem. The server works fine and can be rebooted cleanly all day long - until the first lockup - following a fresh OS load or even a Repair installation. This is not a problem with Backup Exec in general. All of my other servers back up just fine. For the record, all of the other servers run Server 2003, and some of them house more data than the file server in question here. Any help is appreciated. The irony is almost too much to bear. Backing up my data is what is jeopardizing it.

Read the article
Varnish "FetchError no backend connection" error

- by clueless-anon

Varnishlog: 0 CLI - Rd ping 0 CLI - Wr 200 19 PONG 1340829925 1.0 12 SessionOpen c 79.124.74.11 3063 :80 12 SessionClose c EOF 12 StatSess c 79.124.74.11 3063 0 1 0 0 0 0 0 0 0 CLI - Rd ping 0 CLI - Wr 200 19 PONG 1340829928 1.0 0 CLI - Rd ping 0 CLI - Wr 200 19 PONG 1340829931 1.0 12 SessionOpen c 108.62.115.226 46211 :80 12 ReqStart c 108.62.115.226 46211 467185881 12 RxRequest c GET 12 RxURL c / 12 RxProtocol c HTTP/1.0 12 RxHeader c User-Agent: Pingdom.com_bot_version_1.4_(http://www.pingdom.com/) 12 RxHeader c Host: www.mysite.com 12 VCL_call c recv lookup 12 VCL_call c hash 12 Hash c / 12 Hash c www.mysite.com 12 VCL_return c hash 12 VCL_call c miss fetch 12 FetchError c no backend connection 12 VCL_call c error deliver 12 VCL_call c deliver deliver 12 TxProtocol c HTTP/1.1 12 TxStatus c 503 12 TxResponse c Service Unavailable 12 TxHeader c Server: Varnish 12 TxHeader c Content-Type: text/html; charset=utf-8 12 TxHeader c Retry-After: 5 12 TxHeader c Content-Length: 418 12 TxHeader c Accept-Ranges: bytes 12 TxHeader c Date: Wed, 27 Jun 2012 20:45:31 GMT 12 TxHeader c X-Varnish: 467185881 12 TxHeader c Age: 1 12 TxHeader c Via: 1.1 varnish 12 TxHeader c Connection: close 12 Length c 418 12 ReqEnd c 467185881 1340829931.192433119 1340829931.891024113 0.000051022 0.698516846 0.000074035 12 SessionClose c error 12 StatSess c 108.62.115.226 46211 1 1 1 0 0 0 256 418 0 CLI - Rd ping 0 CLI - Wr 200 19 PONG 1340829934 1.0 0 CLI - Rd ping 0 CLI - Wr 200 19 PONG 1340829937 1.0 netstat -tlnp Active Internet connections (only servers) Proto Recv-Q Send-Q Local Address Foreign Address State PID/Program name tcp 0 0 0.0.0.0:8080 0.0.0.0:* LISTEN 3086/nginx tcp 0 0 0.0.0.0:80 0.0.0.0:* LISTEN 1915/varnishd tcp 0 0 0.0.0.0:22 0.0.0.0:* LISTEN 1279/sshd tcp 0 0 127.0.0.2:25 0.0.0.0:* LISTEN 3195/sendmail: MTA: tcp 0 0 127.0.0.2:6082 0.0.0.0:* LISTEN 1914/varnishd tcp 0 0 127.0.0.2:9000 0.0.0.0:* LISTEN 1317/php-fpm.conf) tcp 0 0 127.0.0.2:3306 0.0.0.0:* LISTEN 1192/mysqld tcp 0 0 127.0.0.2:587 0.0.0.0:* LISTEN 3195/sendmail: MTA: tcp 0 0 127.0.0.2:11211 0.0.0.0:* LISTEN 3072/memcached tcp6 0 0 :::8080 :::* LISTEN 3086/nginx tcp6 0 0 :::80 :::* LISTEN 1915/varnishd tcp6 0 0 :::22 :::* LISTEN 1279/sshd /etc/nginx/site-enabled/default server { listen 8080; ## listen for ipv4; this line is default and implied listen [::]:8080 default ipv6only=on; ## listen for ipv6 root /usr/share/nginx/www; index index.html index.htm index.php; # Make site accessible from http://localhost/ server_name localhost; location / { # First attempt to serve request as file, then # as directory, then fall back to index.html try_files $uri $uri/ /index.html; } location /doc { root /usr/share; autoindex on; allow 127.0.0.2; deny all; } location /images { root /usr/share; autoindex off; } #error_page 404 /404.html; # redirect server error pages to the static page /50x.html # #error_page 500 502 503 504 /50x.html; #location = /50x.html { # root /usr/share/nginx/www; #} # proxy the PHP scripts to Apache listening on 127.0.0.1:80 # #location ~ \.php$ { # proxy_pass http://127.0.0.1; #} # pass the PHP scripts to FastCGI server listening on 127.0.0.1:9000 # location ~ \.php$ { fastcgi_pass 127.0.0.2:9000; fastcgi_index index.php; include fastcgi_params; } # deny access to .htaccess files, if Apache's document root # concurs with nginx's one # #location ~ /\.ht { # deny all; #} } /etc/nginx/sites-enabled/www.mysite.com.vhost server { listen 8080; server_name www.mysite.com mysite.com.net; root /var/www/www.mysite.com/web; if ($http_host != "www.mysite.com") { rewrite ^ http://www.mysite.com$request_uri permanent; } index index.php index.html; location = /favicon.ico { log_not_found off; access_log off; } location = /robots.txt { allow all; log_not_found off; access_log off; } # Deny all attempts to access hidden files such as .htaccess, .htpasswd, .DS_Store (Mac). location ~ /\. { deny all; access_log off; log_not_found off; } location / { try_files $uri $uri/ /index.php?$args; } # Add trailing slash to */wp-admin requests. rewrite /wp-admin$ $scheme://$host$uri/ permanent; location ~* \.(jpg|jpeg|png|gif|css|js|ico)$ { expires max; log_not_found off; } location ~ \.php$ { try_files $uri =404; include /etc/nginx/fastcgi_params; fastcgi_pass 127.0.0.2:9000; fastcgi_param SCRIPT_FILENAME $document_root$fastcgi_script_name; } include /var/www/www.mysite.com/web/nginx.conf; location ~ /nginx.conf { deny all; access_log off; log_not_found off; } } /etc/varnish/default.vcl # This is a basic VCL configuration file for varnish. See the vcl(7) # man page for details on VCL syntax and semantics. # # Default backend definition. Set this to point to your content # server. # backend default { .host = "127.0.0.2"; .port = "8080"; # .connect_timeout = 600s; #.first_byte_timeout = 600s; # .between_bytes_timeout = 600s; # .max_connections = 800; Note: uncommenting the last four options at default.vcl made no difference. cat /etc/default/varnish # Configuration file for varnish # # /etc/init.d/varnish expects the variables $DAEMON_OPTS, $NFILES and $MEMLOCK # to be set from this shell script fragment. # # Should we start varnishd at boot? Set to "yes" to enable. START=yes # Maximum number of open files (for ulimit -n) NFILES=131072 # Maximum locked memory size (for ulimit -l) # Used for locking the shared memory log in memory. If you increase log size, # you need to increase this number as well MEMLOCK=82000 # Default varnish instance name is the local nodename. Can be overridden with # the -n switch, to have more instances on a single server. INSTANCE=$(uname -n) # This file contains 4 alternatives, please use only one. ## Alternative 1, Minimal configuration, no VCL # # Listen on port 6081, administration on localhost:6082, and forward to # content server on localhost:8080. Use a 1GB fixed-size cache file. # # DAEMON_OPTS="-a :6081 \ # -T localhost:6082 \ # -b localhost:8080 \ # -u varnish -g varnish \ # -S /etc/varnish/secret \ # -s file,/var/lib/varnish/$INSTANCE/varnish_storage.bin,1G" ## Alternative 2, Configuration with VCL # # Listen on port 6081, administration on localhost:6082, and forward to # one content server selected by the vcl file, based on the request. Use a 1GB # fixed-size cache file. # DAEMON_OPTS="-a :80 \ -T 127.0.0.2:6082 \ -f /etc/varnish/default.vcl \ -S /etc/varnish/secret \ -s file,/var/lib/varnish/$INSTANCE/varnish_storage.bin,1G" If you need any other info let me know. I am all out of clue as to whats the problem.

Read the article
Backing up my data causes my server to crash using Symantec Backup Exec 12, or How I Came to Loathe Irony

- by Kyle Noland

I have a Dell PowerEdge 2850 running Windows Server 2003. It is the primary file server for one of my clients. I have another server also running Windows Server 2003 that acts as the core media server for Symantec Backup Exec 12. I recently upgraded from Backup Exec 11d to 12. This upgrade was necessary because we also just upgraded from Exchange 2003 to Exchange 2007. After the upgrade I had to push-install the new version 12 Backup Exec Remote Agents to each of the servers I am backing up (about 6 total). 5 of my servers are doing just fine, faithfully completing backups every night. My file server routinely crashes. Observations: When the server crashes, it does not blue screen, it just locks up completely. Even the mouse is unresponsive. If you leave the server locked up long enough, it will eventually reboot itself and hang on the Windows splash screen. There is absolutely zero useful Event Viewer evidence of a problem. The logs go from routine logging to an Unexplained Shutdown Event the next morning when I have to hard reset the server to get it to boot. 90% of the time the server does not boot cleanly, it hangs on the Windows splash screen. I don't have any light to shed here. When the server hangs all I can do is hard reset it and try again. Even after a successful boot and chkdsk /r operation, if you reboot the machine, you have a 90% chance it won't back up again cleanly. The back story: This server started crashing during nightly backups about a month ago. I tried everything I could think of to troubleshoot the problem and eventually had to give up because I could not keep coming to the office at 4 AM to try to get the server back online. One Friday I got lucky and the server stayed up for its entire full backup. I took this opportunity to restore the full backup to a temporary server I set up and switched all my users to the temporary. Then I reloaded the ailing file server. I kept all my users on the temporary file server for about 3 weeks. I installed the same Backup Exec Remote Agent and Trend Micro A/V client on the temporary server that I was using on the regular file server. During this time, I had absolutely no problems backing up the temporary server. I tested the reloaded file server extensively. I rebooted the server once an hour every day for 3 weeks trying to make it fail. It never did. I felt confident that the reload was the answer to my problems. I moved all of the data from the temporary server back to the regular server. I got 3 nightly backups out of it before it locked up again and started the familiar failure to boot cleanly behavior. This weekend I decided to monitor the file server through the entire backup job. I RDPd into the file server and also into the server running Backup Exec. On the file server I opened the Task Manager so I could view the processes and watch CPU and memory usage. Everything was running smoothly for about 60GB worth of backup. Then I noticed that the byte count of the backup job in Backup Exec had stopped progressing. I looked back over at my RDP session into the file server, and I was getting real time updates about CPU and memory usage still - both nearly 0%, which is unusual. Backups usually hover around 40% usage for the duration of the backup job. Let me reiterate this point: The screen was refreshing and I was getting real time Task Manager updates - until I clicked on the Start menu. The screen went black and the server locked up. In truth, I think the server had already locked up, the video card just hadn't figured it out yet. I went back into my bag of trick: driving to the office and hard reseting the server over and over again when it hangs up at the Windows splash screen. I did this for 2 hours without getting a successful boot. I started panicking because I did not have a decent backup to use to get everything back onto the working temporary file server. Once I exhausted everything I knew to do, I took a deep breath, booted to the Windows Server 2003 CD and performed a repair installation of Windows. The server came back up fine, with all of my data intact. I can now reboot the server at will and it will come back up cleanly. The problem is that I'm afraid as soon as I try to back that data up again I will back at square one. So let me sum things up: Here is what I've done so far to troubleshoot this server: Deleted and recreated the RAID 5 sets. Initialized the drives. Reloaded the server with a fresh Server 2003 install. Confirmed with Dell that I have installed the latest, Dell approved BIOS and NIC drivers. Uninstalled / reinstalled the Backup Exec Remote Agent. Uninstalled the Trend Micro A/V client. Configured the server not to reboot itself after a blue screen so I can see any stop error. I used to think the server was blue screening, but since I enabled this setting I now know that the server just completely locks up. Run chkdsk /r from the Windows Recovery Console. Several errors were found and corrected, but did not help my problem. Help confirm or deny the following assumptions: There are two problems at work here. Why the server is locking up in the first place, and why the server won't boot cleanly after a lockup. This is ultimately a software problem. The server works fine and can be rebooted cleanly all day long - until the first lockup - following a fresh OS load or even a Repair installation. This is not a problem with Backup Exec in general. All of my other servers back up just fine. For the record, all of the other servers run Server 2003, and some of them house more data than the file server in question here. Any help is appreciated. The irony is almost too much to bear. Backing up my data is what is jeopardizing it.

Read the article
Help with Design for Vacation Tracking System (C#/.NET/Access/WebServices/SOA/Excel) [closed]

- by Aaronaught

I have been tasked with developing a system for tracking our company's paid time-off (vacation, sick days, etc.) At the moment we are using an Excel spreadsheet on a shared network drive, and it works pretty well, but we are concerned that we won't be able to "trust" employees forever and sometimes we run into locking issues when two people try to open the spreadsheet at once. So we are trying to build something a little more robust. I would like some input on this design in terms of maintainability, scalability, extensibility, etc. It's a pretty simple workflow we need to represent right now: I started with a basic MS Access schema like this: Employees (EmpID int, EmpName varchar(50), AllowedDays int) Vacations (VacationID int, EmpID int, BeginDate datetime, EndDate datetime) But we don't want to spend a lot of time building a schema and database like this and have to change it later, so I think I am going to go with something that will be easier to expand through configuration. Right now the vacation table has this schema: Vacations (VacationID int, PropName varchar(50), PropValue varchar(50)) And the table will be populated with data like this: VacationID | PropName | PropValue -----------+--------------+------------------ 1 | EmpID | 4 1 | EmpName | James Jones 1 | Reason | Vacation 1 | BeginDate | 2/24/2010 1 | EndDate | 2/30/2010 1 | Destination | Spectate Swamp 2 | ... | ... I think this is a pretty good, extensible design, we can easily add new properties to the vacation like the destination or maybe approval status, etc. I wasn't too sure how to go about managing the database of valid properties, I thought of putting them in a separate PropNames table but it gets complicated to manage all the different data types and people say that you shouldn't put CLR type names into a SQL database, so I decided to use XML instead, here is the schema: <VacationProperties> <PropertyNames>EmpID,EmpName,Reason,BeginDate,EndDate,Destination</PropertyNames> <PropertyTypes>System.Int32,System.String,System.String,System.DateTime,System.DateTime,System.String</PropertyTypes> <PropertiesRequired>true,true,false,true,true,false</PropertiesRequired> </VacationProperties> I might need more fields than that, I'm not completely sure. I'm parsing the XML like this (would like some feedback on the parsing code): string xml = File.ReadAllText("properties.xml"); Match m = Regex.Match(xml, "<(PropertyNames)>(.*?)</PropertyNames>"; string[] pn = m.Value.Split(','); // do the same for PropertyTypes, PropertiesRequired Then I use the following code to persist configuration changes to the database: string sql = "DROP TABLE VacationProperties"; sql = sql + " CREATE TABLE VacationProperties "; sql = sql + "(PropertyName varchar(100), PropertyType varchar(100) "; sql = sql + "IsRequired varchar(100))"; for (int i = 0; i < pn.Length; i++) { sql = sql + " INSERT VacationProperties VALUES (" + pn[i] + "," + pt[i] + "," + pv[i] + ")"; } // GlobalConnection is a singleton new SqlCommand(sql, GlobalConnection.Instance).ExecuteReader(); So far so good, but after a few days of this I then realized that a lot of this was just a more specific kind of a generic workflow which could be further abstracted, and instead of writing all of this boilerplate plumbing code I could just come up with a workflow and plug it into a workflow engine like Windows Workflow Foundation and have the users configure it: In order to support routing these configurations throw the workflow system, it seemed natural to implement generic XML Web Services for this instead of just using an XML file as above. I've used this code to implement the Web Services: public class VacationConfigurationService : WebService { [WebMethod] public void UpdateConfiguration(string xml) { // Above code goes here } } Which was pretty easy, although I'm still working on a way to validate that XML against some kind of schema as there's no error-checking yet. I also created a few different services for other operations like VacationSubmissionService, VacationReportService, VacationDataService, VacationAuthenticationService, etc. The whole Service Oriented Architecture looks like this: And because the workflow itself might change, I have been working on a way to integrate the WF workflow system with MS Visio, which everybody at the office already knows how to use so they could make changes pretty easily. We have a diagram that looks like the following (it's kind of hard to read but the main items are Activities, Authenticators, Validators, Transformers, Processors, and Data Connections, they're all analogous to the services in the SOA diagram above). The requirements for this system are: (Note - I don't control these, they were given to me by management) Main workflow must interface with Excel spreadsheet, probably through VBA macros (to ease the transition to the new system) Alerts should integrate with MS Outlook, Lotus Notes, and SMS (text messages). We also want to interface it with the company Voice Mail system but that is not a "hard" requirement. Performance requirements: Must handle 250,000 Transactions Per Second Should be able to handle up to 20,000 employees (right now we have 3) 99.99% uptime ("four nines") expected Must be secure against outside hacking, but users cannot be required to enter a username/password. Platforms: Must support Windows XP/Vista/7, Linux, iPhone, Blackberry, DOS 2.0, VAX, IRIX, PDP-11, Apple IIc. Time to complete: 6 to 8 weeks. My questions are: Is this a good design for the system so far? Am I using all of the recommended best practices for these technologies? How do I integrate the Visio diagram above with the Windows Workflow Foundation to call the ConfigurationService and persist workflow changes? Am I missing any important components? Will this be extensible enough to support any scenario via end-user configuration? Will the system scale to the above performance requirements? Will we need any expensive hardware to run it? Are there any "gotchas" I should know about with respect to cross-platform compatibility? For example would it be difficult to convert this to an iPhone app? How long would you expect this to take? (We've dedicated 1 week for testing so I'm thinking maybe 5 weeks?) Many thanks for your advices, Aaron

Read the article
Design for Vacation Tracking System

- by Aaronaught

I have been tasked with developing a system for tracking our company's paid time-off (vacation, sick days, etc.) At the moment we are using an Excel spreadsheet on a shared network drive, and it works pretty well, but we are concerned that we won't be able to "trust" employees forever and sometimes we run into locking issues when two people try to open the spreadsheet at once. So we are trying to build something a little more robust. I would like some input on this design in terms of maintainability, scalability, extensibility, etc. It's a pretty simple workflow we need to represent right now: I started with a basic MS Access schema like this: Employees (EmpID int, EmpName varchar(50), AllowedDays int) Vacations (VacationID int, EmpID int, BeginDate datetime, EndDate datetime) But we don't want to spend a lot of time building a schema and database like this and have to change it later, so I think I am going to go with something that will be easier to expand through configuration. Right now the vacation table has this schema: Vacations (VacationID int, PropName varchar(50), PropValue varchar(50)) And the table will be populated with data like this: VacationID | PropName | PropValue -----------+--------------+------------------ 1 | EmpID | 4 1 | EmpName | James Jones 1 | Reason | Vacation 1 | BeginDate | 2/24/2010 1 | EndDate | 2/30/2010 1 | Destination | Spectate Swamp 2 | ... | ... I think this is a pretty good, extensible design, we can easily add new properties to the vacation like the destination or maybe approval status, etc. I wasn't too sure how to go about managing the database of valid properties, I thought of putting them in a separate PropNames table but it gets complicated to manage all the different data types and people say that you shouldn't put CLR type names into a SQL database, so I decided to use XML instead, here is the schema: <VacationProperties> <PropertyNames>EmpID,EmpName,Reason,BeginDate,EndDate,Destination</PropertyNames> <PropertyTypes>System.Int32,System.String,System.String,System.DateTime,System.DateTime,System.String</PropertyTypes> <PropertiesRequired>true,true,false,true,true,false</PropertiesRequired> </VacationProperties> I might need more fields than that, I'm not completely sure. I'm parsing the XML like this (would like some feedback on the parsing code): string xml = File.ReadAllText("properties.xml"); Match m = Regex.Match(xml, "<(PropertyNames)>(.*?)</PropertyNames>"; string[] pn = m.Value.Split(','); // do the same for PropertyTypes, PropertiesRequired Then I use the following code to persist configuration changes to the database: string sql = "DROP TABLE VacationProperties"; sql = sql + " CREATE TABLE VacationProperties "; sql = sql + "(PropertyName varchar(100), PropertyType varchar(100) "; sql = sql + "IsRequired varchar(100))"; for (int i = 0; i < pn.Length; i++) { sql = sql + " INSERT VacationProperties VALUES (" + pn[i] + "," + pt[i] + "," + pv[i] + ")"; } // GlobalConnection is a singleton new SqlCommand(sql, GlobalConnection.Instance).ExecuteReader(); So far so good, but after a few days of this I then realized that a lot of this was just a more specific kind of a generic workflow which could be further abstracted, and instead of writing all of this boilerplate plumbing code I could just come up with a workflow and plug it into a workflow engine like Windows Workflow Foundation and have the users configure it: In order to support routing these configurations throw the workflow system, it seemed natural to implement generic XML Web Services for this instead of just using an XML file as above. I've used this code to implement the Web Services: public class VacationConfigurationService : WebService { [WebMethod] public void UpdateConfiguration(string xml) { // Above code goes here } } Which was pretty easy, although I'm still working on a way to validate that XML against some kind of schema as there's no error-checking yet. I also created a few different services for other operations like VacationSubmissionService, VacationReportService, VacationDataService, VacationAuthenticationService, etc. The whole Service Oriented Architecture looks like this: And because the workflow itself might change, I have been working on a way to integrate the WF workflow system with MS Visio, which everybody at the office already knows how to use so they could make changes pretty easily. We have a diagram that looks like the following (it's kind of hard to read but the main items are Activities, Authenticators, Validators, Transformers, Processors, and Data Connections, they're all analogous to the services in the SOA diagram above). The requirements for this system are: (Note - I don't control these, they were given to me by management) Main workflow must interface with Excel spreadsheet, probably through VBA macros (to ease the transition to the new system) Alerts should integrate with MS Outlook, Lotus Notes, and SMS (text messages). We also want to interface it with the company Voice Mail system but that is not a "hard" requirement. Performance requirements: Must handle 250,000 Transactions Per Second Should be able to handle up to 20,000 employees (right now we have 3) 99.99% uptime ("four nines") expected Must be secure against outside hacking, but users cannot be required to enter a username/password. Platforms: Must support Windows XP/Vista/7, Linux, iPhone, Blackberry, DOS 2.0, VAX, IRIX, PDP-11, Apple IIc. Time to complete: 6 to 8 weeks. My questions are: Is this a good design for the system so far? Am I using all of the recommended best practices for these technologies? How do I integrate the Visio diagram above with the Windows Workflow Foundation to call the ConfigurationService and persist workflow changes? Am I missing any important components? Will this be extensible enough to support any scenario via end-user configuration? Will the system scale to the above performance requirements? Will we need any expensive hardware to run it? Are there any "gotchas" I should know about with respect to cross-platform compatibility? For example would it be difficult to convert this to an iPhone app? How long would you expect this to take? (We've dedicated 1 week for testing so I'm thinking maybe 5 weeks?)

Read the article
What is the right tool to detect VMT or heap corruption in Delphi ?

- by Roland Bengtsson

I'm a member in a team that use Delphi 2007 for a larger application and we suspect heap corruption because sometimes there are strange bugs that have no other explanation. I believe that the Rangechecking option for the compiler is only for arrays. I want a tool that give an exception or log when there is a write on a memory address that is not allocated by the application. Regards EDIT: The error is of type: Error: Access violation at address 00404E78 in module 'BoatLogisticsAMCAttracsServer.exe'. Read of address FFFFFFDD EDIT2: Thanks for all suggestions. Unfortunately I think that the solution is deeper than that. We use a patched version of Bold for Delphi as we own the source. Probably there are some errors introduced in the Bold framwork. Yes we have a log with callstacks that are handled by JCL and also trace messages. So a callstack with the exception can lock like this: 20091210 16:02:29 (2356) [EXCEPTION] Raised EBold: Failed to derive ServerSession.mayDropSession: Boolean OCL expression: not active and not idle and timeout and (ApplicationKernel.allinstances->first.CurrentSession <> self) Error: Access violation at address 00404E78 in module 'BoatLogisticsAMCAttracsServer.exe'. Read of address FFFFFFDD. At Location BoldSystem.TBoldMember.CalculateDerivedMemberWithExpression (BoldSystem.pas:4016) Inner Exception Raised EBold: Failed to derive ServerSession.mayDropSession: Boolean OCL expression: not active and not idle and timeout and (ApplicationKernel.allinstances->first.CurrentSession <> self) Error: Access violation at address 00404E78 in module 'BoatLogisticsAMCAttracsServer.exe'. Read of address FFFFFFDD. At Location BoldSystem.TBoldMember.CalculateDerivedMemberWithExpression (BoldSystem.pas:4016) Inner Exception Call Stack: [00] System.TObject.InheritsFrom (sys\system.pas:9237) Call Stack: [00] BoldSystem.TBoldMember.CalculateDerivedMemberWithExpression (BoldSystem.pas:4016) [01] BoldSystem.TBoldMember.DeriveMember (BoldSystem.pas:3846) [02] BoldSystem.TBoldMemberDeriver.DoDeriveAndSubscribe (BoldSystem.pas:7491) [03] BoldDeriver.TBoldAbstractDeriver.DeriveAndSubscribe (BoldDeriver.pas:180) [04] BoldDeriver.TBoldAbstractDeriver.SetDeriverState (BoldDeriver.pas:262) [05] BoldDeriver.TBoldAbstractDeriver.Derive (BoldDeriver.pas:117) [06] BoldDeriver.TBoldAbstractDeriver.EnsureCurrent (BoldDeriver.pas:196) [07] BoldSystem.TBoldMember.EnsureContentsCurrent (BoldSystem.pas:4245) [08] BoldSystem.TBoldAttribute.EnsureNotNull (BoldSystem.pas:4813) [09] BoldAttributes.TBABoolean.GetAsBoolean (BoldAttributes.pas:3069) [10] BusinessClasses.TLogonSession._GetMayDropSession (code\BusinessClasses.pas:31854) [11] DMAttracsTimers.TAttracsTimerDataModule.RemoveDanglingLogonSessions (code\DMAttracsTimers.pas:237) [12] DMAttracsTimers.TAttracsTimerDataModule.UpdateServerTimeOnTimerTrig (code\DMAttracsTimers.pas:482) [13] DMAttracsTimers.TAttracsTimerDataModule.TimerKernelWork (code\DMAttracsTimers.pas:551) [14] DMAttracsTimers.TAttracsTimerDataModule.AttracsTimerTimer (code\DMAttracsTimers.pas:600) [15] ExtCtrls.TTimer.Timer (ExtCtrls.pas:2281) [16] Classes.StdWndProc (common\Classes.pas:11583) The inner exception part is the callstack at the moment an exception is reraised. EDIT3: The theory right now is that the Virtual Memory Table (VMT) is somehow broken. When this happen there is no indication of it. Only when a method is called an exception is raised (ALWAYS on address FFFFFFDD, -35 decimal) but then it is too late. You don't know the real cause for the error. Any hint of how to catch a bug like this is really appreciated!!! We have tried with SafeMM, but the problem is that the memory consumption is too high even when the 3 GB flag is used. So now I try to give a bounty to the SO community :) EDIT4: One hint is that according the log there is often (or even always) another exception before this. It can be for example optimistic locking in the database. We have tried to raise exceptions by force but in test environment it just works fine. EDIT5: Story continues... I did a search on the logs for the last 30 days now. The result: "Read of address FFFFFFDB" 0 "Read of address FFFFFFDC" 24 "Read of address FFFFFFDD" 270 "Read of address FFFFFFDE" 22 "Read of address FFFFFFDF" 7 "Read of address FFFFFFE0" 20 "Read of address FFFFFFE1" 0 So the current theory is that an enum (there is a lots in Bold) overwrite a pointer. I got 5 hits with different address above. It could mean that the enum holds 5 values where the second one is most used. If there is an exception a rollback should occur for the database and Boldobjects should be destroyed. Maybe there is a chance that not everything is destroyed and a enum still can write to an address location. If this is true maybe it is possible to search the code by a regexpr for an enum with 5 values ? EDIT6: To summarize, no there is no solution to the problem yet. I realize that I may mislead you a bit with the callstack. Yes there are a timer in that but there are other callstacks without a timer. Sorry for that. But there are 2 common factors. An exception with Read of address FFFFFFxx. Top of callstack is System.TObject.InheritsFrom (sys\system.pas:9237) This convince me that VilleK best describe the problem. I'm also convinced that the problem is somewhere in the Bold framework. But the BIG question is, how can problems like this be solved ? It is not enough to have an Assert like VilleK suggest as the damage has already happened and the callstack is gone at that moment. So to describe my view of what may cause the error: Somewhere a pointer is assigned a bad value 1, but it can be also 0, 2, 3 etc. An object is assigned to that pointer. There is method call in the objects baseclass. This cause method TObject.InheritsForm to be called and an exception appear on address FFFFFFDD. Those 3 events can be together in the code but they may also be used much later. I think this is true for the last method call. EDIT7: We work closely with the the author of Bold Jan Norden and he recently found a bug in the OCL-evaluator in Bold framework. When this was fixed these kinds of exceptions decreased a lot but they still occasionally come. But it is a big relief that this is almost solved.

Read the article
Lock statement vs Monitor.Enter method.

- by Vokinneberg

I suppose it is an interesting code example. We have a class, let's call it Test with Finalize method. In Main method here is two code blocks where i am using lock statement and Monitor.Enter call. Also i have two instances of class Test here. The experiment is pretty simple - nulling Test variable within locking block and try to collect it manually with GC.Collect method call. So, to see the Finilaze call i am calling GC.WaitForPendingFinalizers method. Everything is very simple as you can see. By defenition of lock statement it's opens by compiler to try{...}finally{..} block with Minitor.Enter call inside of try block and Monitor.Exit in finally block. I've tryed to implement try-finally block manually. I've expected the same behaviour in both cases. in case of using lock and in case of unsing Monitor.Enter. But, surprize, surprize - it is different as you can see below. public class Test : IDisposable { private string name; public Test(string name) { this.name = name; } ~Test() { Console.WriteLine(string.Format("Finalizing class name {0}.", name)); } } class Program { static void Main(string[] args) { var test1 = new Test("Test1"); var test2 = new Test("Tesst2"); lock (test1) { test1 = null; Console.WriteLine("Manual collect 1."); GC.Collect(); GC.WaitForPendingFinalizers(); Console.WriteLine("Manual collect 2."); GC.Collect(); } var lockTaken = false; System.Threading.Monitor.Enter(test2, ref lockTaken); try { test2 = null; Console.WriteLine("Manual collect 3."); GC.Collect(); GC.WaitForPendingFinalizers(); Console.WriteLine("Manual collect 4."); GC.Collect(); } finally { System.Threading.Monitor.Exit(test2); } Console.ReadLine(); } } Output of this example is Manual collect 1. Manual collect 2. Manual collect 3. Finalizing class name Test2. Manual collect 4. And null reference exception in last finally block because test2 is null reference. I've was surprised and disasembly my code into IL. So, here is IL dump of Main method. .entrypoint .maxstack 2 .locals init ( [0] class ConsoleApplication2.Test test1, [1] class ConsoleApplication2.Test test2, [2] bool lockTaken, [3] bool <>s__LockTaken0, [4] class ConsoleApplication2.Test CS$2$0000, [5] bool CS$4$0001) L_0000: nop L_0001: ldstr "Test1" L_0006: newobj instance void ConsoleApplication2.Test::.ctor(string) L_000b: stloc.0 L_000c: ldstr "Tesst2" L_0011: newobj instance void ConsoleApplication2.Test::.ctor(string) L_0016: stloc.1 L_0017: ldc.i4.0 L_0018: stloc.3 L_0019: ldloc.0 L_001a: dup L_001b: stloc.s CS$2$0000 L_001d: ldloca.s <>s__LockTaken0 L_001f: call void [mscorlib]System.Threading.Monitor::Enter(object, bool&) L_0024: nop L_0025: nop L_0026: ldnull L_0027: stloc.0 L_0028: ldstr "Manual collect." L_002d: call void [mscorlib]System.Console::WriteLine(string) L_0032: nop L_0033: call void [mscorlib]System.GC::Collect() L_0038: nop L_0039: call void [mscorlib]System.GC::WaitForPendingFinalizers() L_003e: nop L_003f: ldstr "Manual collect." L_0044: call void [mscorlib]System.Console::WriteLine(string) L_0049: nop L_004a: call void [mscorlib]System.GC::Collect() L_004f: nop L_0050: nop L_0051: leave.s L_0066 L_0053: ldloc.3 L_0054: ldc.i4.0 L_0055: ceq L_0057: stloc.s CS$4$0001 L_0059: ldloc.s CS$4$0001 L_005b: brtrue.s L_0065 L_005d: ldloc.s CS$2$0000 L_005f: call void [mscorlib]System.Threading.Monitor::Exit(object) L_0064: nop L_0065: endfinally L_0066: nop L_0067: ldc.i4.0 L_0068: stloc.2 L_0069: ldloc.1 L_006a: ldloca.s lockTaken L_006c: call void [mscorlib]System.Threading.Monitor::Enter(object, bool&) L_0071: nop L_0072: nop L_0073: ldnull L_0074: stloc.1 L_0075: ldstr "Manual collect." L_007a: call void [mscorlib]System.Console::WriteLine(string) L_007f: nop L_0080: call void [mscorlib]System.GC::Collect() L_0085: nop L_0086: call void [mscorlib]System.GC::WaitForPendingFinalizers() L_008b: nop L_008c: ldstr "Manual collect." L_0091: call void [mscorlib]System.Console::WriteLine(string) L_0096: nop L_0097: call void [mscorlib]System.GC::Collect() L_009c: nop L_009d: nop L_009e: leave.s L_00aa L_00a0: nop L_00a1: ldloc.1 L_00a2: call void [mscorlib]System.Threading.Monitor::Exit(object) L_00a7: nop L_00a8: nop L_00a9: endfinally L_00aa: nop L_00ab: call string [mscorlib]System.Console::ReadLine() L_00b0: pop L_00b1: ret .try L_0019 to L_0053 finally handler L_0053 to L_0066 .try L_0072 to L_00a0 finally handler L_00a0 to L_00aa I does not see any difference between lock statement and Monitor.Enter call. So, why i steel have a reference to the instance of test1 in case of lock, and object is not collected by GC, but in case of using Monitor.Enter it is collected and finilized?

Read the article
Why are my Opteron cores running at only 75% capacity each? (25% CPU idle)

- by Tim Cooper

We've just taken delivery of a powerful 32-core AMD Opteron server with 128Gb. We have 2 x 6272 CPU's with 16 cores each. We are running a big long-running java task on 30 threads. We have the NUMA optimisations for Linux and java turned on. Our Java threads are mainly using objects that are private to that thread, sometimes reading memory that other threads will be reading, and very very occasionally writing or locking shared objects. We can't explain why the CPU cores are 25% idle. Below is a dump of "top": top - 23:06:38 up 1 day, 23 min, 3 users, load average: 10.84, 10.27, 9.62 Tasks: 676 total, 1 running, 675 sleeping, 0 stopped, 0 zombie Cpu(s): 64.5%us, 1.3%sy, 0.0%ni, 32.9%id, 1.3%wa, 0.0%hi, 0.0%si, 0.0%st Mem: 132138168k total, 131652664k used, 485504k free, 92340k buffers Swap: 5701624k total, 230252k used, 5471372k free, 13444344k cached ... top - 22:37:39 up 23:54, 3 users, load average: 7.83, 8.70, 9.27 Tasks: 678 total, 1 running, 677 sleeping, 0 stopped, 0 zombie Cpu0 : 75.8%us, 2.0%sy, 0.0%ni, 22.2%id, 0.0%wa, 0.0%hi, 0.0%si, 0.0%st Cpu1 : 77.2%us, 1.3%sy, 0.0%ni, 21.5%id, 0.0%wa, 0.0%hi, 0.0%si, 0.0%st Cpu2 : 77.3%us, 1.0%sy, 0.0%ni, 21.7%id, 0.0%wa, 0.0%hi, 0.0%si, 0.0%st Cpu3 : 77.8%us, 1.0%sy, 0.0%ni, 21.2%id, 0.0%wa, 0.0%hi, 0.0%si, 0.0%st Cpu4 : 76.9%us, 2.0%sy, 0.0%ni, 21.1%id, 0.0%wa, 0.0%hi, 0.0%si, 0.0%st Cpu5 : 76.3%us, 2.0%sy, 0.0%ni, 21.7%id, 0.0%wa, 0.0%hi, 0.0%si, 0.0%st Cpu6 : 12.6%us, 3.0%sy, 0.0%ni, 84.4%id, 0.0%wa, 0.0%hi, 0.0%si, 0.0%st Cpu7 : 8.6%us, 2.0%sy, 0.0%ni, 89.4%id, 0.0%wa, 0.0%hi, 0.0%si, 0.0%st Cpu8 : 77.0%us, 2.0%sy, 0.0%ni, 21.1%id, 0.0%wa, 0.0%hi, 0.0%si, 0.0%st Cpu9 : 77.0%us, 2.0%sy, 0.0%ni, 21.1%id, 0.0%wa, 0.0%hi, 0.0%si, 0.0%st Cpu10 : 77.6%us, 1.7%sy, 0.0%ni, 20.8%id, 0.0%wa, 0.0%hi, 0.0%si, 0.0%st Cpu11 : 75.7%us, 2.0%sy, 0.0%ni, 21.4%id, 1.0%wa, 0.0%hi, 0.0%si, 0.0%st Cpu12 : 76.6%us, 2.3%sy, 0.0%ni, 21.1%id, 0.0%wa, 0.0%hi, 0.0%si, 0.0%st Cpu13 : 76.6%us, 2.3%sy, 0.0%ni, 21.1%id, 0.0%wa, 0.0%hi, 0.0%si, 0.0%st Cpu14 : 76.2%us, 2.6%sy, 0.0%ni, 15.9%id, 5.3%wa, 0.0%hi, 0.0%si, 0.0%st Cpu15 : 76.6%us, 2.0%sy, 0.0%ni, 21.5%id, 0.0%wa, 0.0%hi, 0.0%si, 0.0%st Cpu16 : 73.6%us, 2.6%sy, 0.0%ni, 23.8%id, 0.0%wa, 0.0%hi, 0.0%si, 0.0%st Cpu17 : 74.5%us, 2.3%sy, 0.0%ni, 23.2%id, 0.0%wa, 0.0%hi, 0.0%si, 0.0%st Cpu18 : 73.9%us, 2.3%sy, 0.0%ni, 23.8%id, 0.0%wa, 0.0%hi, 0.0%si, 0.0%st Cpu19 : 72.9%us, 2.6%sy, 0.0%ni, 24.4%id, 0.0%wa, 0.0%hi, 0.0%si, 0.0%st Cpu20 : 72.8%us, 2.6%sy, 0.0%ni, 24.5%id, 0.0%wa, 0.0%hi, 0.0%si, 0.0%st Cpu21 : 72.7%us, 2.3%sy, 0.0%ni, 25.0%id, 0.0%wa, 0.0%hi, 0.0%si, 0.0%st Cpu22 : 72.5%us, 2.6%sy, 0.0%ni, 24.8%id, 0.0%wa, 0.0%hi, 0.0%si, 0.0%st Cpu23 : 73.0%us, 2.3%sy, 0.0%ni, 24.7%id, 0.0%wa, 0.0%hi, 0.0%si, 0.0%st Cpu24 : 74.7%us, 2.7%sy, 0.0%ni, 22.7%id, 0.0%wa, 0.0%hi, 0.0%si, 0.0%st Cpu25 : 74.5%us, 2.6%sy, 0.0%ni, 22.8%id, 0.0%wa, 0.0%hi, 0.0%si, 0.0%st Cpu26 : 73.7%us, 2.0%sy, 0.0%ni, 24.3%id, 0.0%wa, 0.0%hi, 0.0%si, 0.0%st Cpu27 : 74.1%us, 2.3%sy, 0.0%ni, 23.6%id, 0.0%wa, 0.0%hi, 0.0%si, 0.0%st Cpu28 : 74.1%us, 2.3%sy, 0.0%ni, 23.6%id, 0.0%wa, 0.0%hi, 0.0%si, 0.0%st Cpu29 : 74.0%us, 2.0%sy, 0.0%ni, 24.0%id, 0.0%wa, 0.0%hi, 0.0%si, 0.0%st Cpu30 : 73.2%us, 2.3%sy, 0.0%ni, 24.5%id, 0.0%wa, 0.0%hi, 0.0%si, 0.0%st Cpu31 : 73.1%us, 2.0%sy, 0.0%ni, 24.9%id, 0.0%wa, 0.0%hi, 0.0%si, 0.0%st Mem: 132138168k total, 131711704k used, 426464k free, 88336k buffers Swap: 5701624k total, 229572k used, 5472052k free, 13745596k cached PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND 13865 root 20 0 122g 112g 3.1g S 2334.3 89.6 20726:49 java 27139 jayen 20 0 15428 1728 952 S 2.6 0.0 0:04.21 top 27161 sysadmin 20 0 15428 1712 940 R 1.0 0.0 0:00.28 top 33 root 20 0 0 0 0 S 0.3 0.0 0:06.24 ksoftirqd/7 131 root 20 0 0 0 0 S 0.3 0.0 0:09.52 events/0 1858 root 20 0 0 0 0 S 0.3 0.0 1:35.14 kondemand/0 A dump of the java stack confirms that none of the threads are anywhere near the few places where locks are used, nor are they anywhere near any disk or network i/o. I had trouble finding a clear explanation of what 'top' means by "idle" versus "wait", but I get the impression that "idle" means "no more threads that need to be run" but this doesn't make sense in our case. We're using a "Executors.newFixedThreadPool(30)". There are a large number of tasks pending and each task lasts for 10 seconds or so. I suspect that the explanation requires a good understanding of NUMA. Is the "idle" state what you see when a CPU is waiting for a non-local access? If not, then what is the explanation?

Read the article
In a SQL XDL File, how do I read the waitresource attribute on process nodes which are deadlocking?

- by skimania

On SQL Server 2005, I'm getting a deadlock when updating two different keys in the same table. note from below that these two waitresources have the same beginning part, but different ending parts. waitresource="KEY: 6:72057594090487808 (d900ed5a6cc6)" and waitresource="KEY: 6:72057594090487808 (d900fb5261bb)" These two keys are locking, and I need to figure out why. The question: If the values in parenthesis are different, why are the first half of the key's the same? <deadlock-list> <deadlock victim="processffffffff8f5863e8"> <process-list> <process id="processaf02f8" taskpriority="0" logused="0" waitresource="KEY: 6:72057594090487808 (d900fb5261bb)" waittime="2281" ownerId="1370264705" transactionname="user_transaction" lasttranstarted="2010-06-17T00:35:25.483" XDES="0x69453a70" lockMode="U" schedulerid="3" kpid="7624" status="suspended" spid="339" sbid="0" ecid="0" priority="0" transcount="2" lastbatchstarted="2010-06-17T00:35:25.483" lastbatchcompleted="2010-06-17T00:35:25.483" clientapp=".Net SqlClient Data Provider" hostname="RISKBBG_VM" hostpid="5848" loginname="RiskOpt" isolationlevel="read committed (2)" xactid="1370264705" currentdb="6" lockTimeout="4294967295" clientoption1="671088672" clientoption2="128056"> <executionStack> <frame procname="MKP_RISKDB.dbo.MarketDataCurrentRtUpload" line="14" stmtstart="840" stmtend="1220" sqlhandle="0x03000600005f9d24c8878f00849d00000100000000000000"> UPDATE c WITH (ROWLOCK) SET LastUpdate = t.LastUpdate, Value = t.Value, Source = t.Source FROM MarketDataCurrent c INNER JOIN #TEMPTABLE2 t ON c.MDID = t.mdid; -- Insert new MDID </frame> <frame procname="adhoc" line="1" sqlhandle="0x010006004a58132228bf8d73000000000000000000000000"> MarketDataCurrentBlbgRtUpload </frame> </executionStack> <inputbuf> MarketDataCurrentBlbgRtUpload </inputbuf> </process> <process id="processffffffff8f5863e8" taskpriority="0" logused="0" waitresource="KEY: 6:72057594090487808 (d900ed5a6cc6)" waittime="2281" ownerId="1370264646" transactionname="user_transaction" lasttranstarted="2010-06-17T00:35:25.450" XDES="0x1cb72be8" lockMode="U" schedulerid="5" kpid="1880" status="suspended" spid="287" sbid="0" ecid="0" priority="0" transcount="2" lastbatchstarted="2010-06-17T00:35:25.450" lastbatchcompleted="2010-06-17T00:35:25.450" clientapp=".Net SqlClient Data Provider" hostname="RISKAPPS_VM" hostpid="1424" loginname="RiskOpt" isolationlevel="read committed (2)" xactid="1370264646" currentdb="6" lockTimeout="4294967295" clientoption1="671088672" clientoption2="128056"> <executionStack> <frame procname="MKP_RISKDB.dbo.MarketDataCurrent_BulkUpload" line="28" stmtstart="1062" stmtend="1720" sqlhandle="0x03000600a28e5e4ef4fd8e00849d00000100000000000000"> UPDATE c WITH (ROWLOCK) SET LastUpdate = getdate(), Value = t.Value, Source = @source FROM MarketDataCurrent c INNER JOIN #MDTUP t ON c.MDID = t.mdid WHERE c.lastUpdate < @updateTime and c.mdid not in (select mdid from MarketData where BloombergTicker is not null and PriceSource like 'Live.%') and c.value <> t.value </frame> <frame procname="adhoc" line="1" stmtstart="88" sqlhandle="0x01000600c1653d0598706ca7000000000000000000000000"> exec MarketDataCurrent_BulkUpload @clearBefore, @source </frame> <frame procname="unknown" line="1" sqlhandle="0x000000000000000000000000000000000000000000000000"> unknown </frame> </executionStack> <inputbuf> (@clearBefore datetime,@source nvarchar(10))exec MarketDataCurrent_BulkUpload @clearBefore, @source </inputbuf> </process> </process-list> <resource-list> <keylock hobtid="72057594090487808" dbid="6" objectname="MKP_RISKDB.dbo.MarketDataCurrent" indexname="PK_MarketDataCurrent" id="lock64ac7940" mode="U" associatedObjectId="72057594090487808"> <owner-list> <owner id="processffffffff8f5863e8" mode="U"/> </owner-list> <waiter-list> <waiter id="processaf02f8" mode="U" requestType="wait"/> </waiter-list> </keylock> <keylock hobtid="72057594090487808" dbid="6" objectname="MKP_RISKDB.dbo.MarketDataCurrent" indexname="PK_MarketDataCurrent" id="lockffffffffb8d2dd40" mode="U" associatedObjectId="72057594090487808"> <owner-list> <owner id="processaf02f8" mode="U"/> </owner-list> <waiter-list> <waiter id="processffffffff8f5863e8" mode="U" requestType="wait"/> </waiter-list> </keylock> </resource-list> </deadlock> </deadlock-list>

Read the article
Plan Caching and Query Memory Part I – When not to use stored procedure or other plan caching mechanisms like sp_executesql or prepared statement

- by sqlworkshops

The most common performance mistake SQL Server developers make: SQL Server estimates memory requirement for queries at compilation time. This mechanism is fine for dynamic queries that need memory, but not for queries that cache the plan. With dynamic queries the plan is not reused for different set of parameters values / predicates and hence different amount of memory can be estimated based on different set of parameter values / predicates. Common memory allocating queries are that perform Sort and do Hash Match operations like Hash Join or Hash Aggregation or Hash Union. This article covers Sort with examples. It is recommended to read Plan Caching and Query Memory Part II after this article which covers Hash Match operations. When the plan is cached by using stored procedure or other plan caching mechanisms like sp_executesql or prepared statement, SQL Server estimates memory requirement based on first set of execution parameters. Later when the same stored procedure is called with different set of parameter values, the same amount of memory is used to execute the stored procedure. This might lead to underestimation / overestimation of memory on plan reuse, overestimation of memory might not be a noticeable issue for Sort operations, but underestimation of memory will lead to spill over tempdb resulting in poor performance. This article covers underestimation / overestimation of memory for Sort. Plan Caching and Query Memory Part II covers underestimation / overestimation for Hash Match operation. It is important to note that underestimation of memory for Sort and Hash Match operations lead to spill over tempdb and hence negatively impact performance. Overestimation of memory affects the memory needs of other concurrently executing queries. In addition, it is important to note, with Hash Match operations, overestimation of memory can actually lead to poor performance. To read additional articles I wrote click here. In most cases it is cheaper to pay for the compilation cost of dynamic queries than huge cost for spill over tempdb, unless memory requirement for a stored procedure does not change significantly based on predicates. The best way to learn is to practice. To create the below tables and reproduce the behavior, join the mailing list by using this link: www.sqlworkshops.com/ml and I will send you the table creation script. Most of these concepts are also covered in our webcasts: www.sqlworkshops.com/webcasts Enough theory, let’s see an example where we sort initially 1 month of data and then use the stored procedure to sort 6 months of data. Let’s create a stored procedure that sorts customers by name within certain date range. --Example provided by www.sqlworkshops.com create proc CustomersByCreationDate @CreationDateFrom datetime, @CreationDateTo datetime as begin declare @CustomerID int, @CustomerName varchar(48), @CreationDate datetime select @CustomerName = c.CustomerName, @CreationDate = c.CreationDate from Customers c where c.CreationDate between @CreationDateFrom and @CreationDateTo order by c.CustomerName option (maxdop 1) end go Let’s execute the stored procedure initially with 1 month date range. set statistics time on go --Example provided by www.sqlworkshops.com exec CustomersByCreationDate '2001-01-01', '2001-01-31' go The stored procedure took 48 ms to complete. The stored procedure was granted 6656 KB based on 43199.9 rows being estimated. The estimated number of rows, 43199.9 is similar to actual number of rows 43200 and hence the memory estimation should be ok. There was no Sort Warnings in SQL Profiler. Now let’s execute the stored procedure with 6 month date range. --Example provided by www.sqlworkshops.com exec CustomersByCreationDate '2001-01-01', '2001-06-30' go The stored procedure took 679 ms to complete. The stored procedure was granted 6656 KB based on 43199.9 rows being estimated. The estimated number of rows, 43199.9 is way different from the actual number of rows 259200 because the estimation is based on the first set of parameter value supplied to the stored procedure which is 1 month in our case. This underestimation will lead to sort spill over tempdb, resulting in poor performance. There was Sort Warnings in SQL Profiler. To monitor the amount of data written and read from tempdb, one can execute select num_of_bytes_written, num_of_bytes_read from sys.dm_io_virtual_file_stats(2, NULL) before and after the stored procedure execution, for additional information refer to the webcast: www.sqlworkshops.com/webcasts. Let’s recompile the stored procedure and then let’s first execute the stored procedure with 6 month date range. In a production instance it is not advisable to use sp_recompile instead one should use DBCC FREEPROCCACHE (plan_handle). This is due to locking issues involved with sp_recompile, refer to our webcasts for further details. exec sp_recompile CustomersByCreationDate go --Example provided by www.sqlworkshops.com exec CustomersByCreationDate '2001-01-01', '2001-06-30' go Now the stored procedure took only 294 ms instead of 679 ms. The stored procedure was granted 26832 KB of memory. The estimated number of rows, 259200 is similar to actual number of rows of 259200. Better performance of this stored procedure is due to better estimation of memory and avoiding sort spill over tempdb. There was no Sort Warnings in SQL Profiler. Now let’s execute the stored procedure with 1 month date range. --Example provided by www.sqlworkshops.com exec CustomersByCreationDate '2001-01-01', '2001-01-31' go The stored procedure took 49 ms to complete, similar to our very first stored procedure execution. This stored procedure was granted more memory (26832 KB) than necessary memory (6656 KB) based on 6 months of data estimation (259200 rows) instead of 1 month of data estimation (43199.9 rows). This is because the estimation is based on the first set of parameter value supplied to the stored procedure which is 6 months in this case. This overestimation did not affect performance, but it might affect performance of other concurrent queries requiring memory and hence overestimation is not recommended. This overestimation might affect performance Hash Match operations, refer to article Plan Caching and Query Memory Part II for further details. Let’s recompile the stored procedure and then let’s first execute the stored procedure with 2 day date range. exec sp_recompile CustomersByCreationDate go --Example provided by www.sqlworkshops.com exec CustomersByCreationDate '2001-01-01', '2001-01-02' go The stored procedure took 1 ms. The stored procedure was granted 1024 KB based on 1440 rows being estimated. There was no Sort Warnings in SQL Profiler. Now let’s execute the stored procedure with 6 month date range. --Example provided by www.sqlworkshops.com exec CustomersByCreationDate '2001-01-01', '2001-06-30' go The stored procedure took 955 ms to complete, way higher than 679 ms or 294ms we noticed before. The stored procedure was granted 1024 KB based on 1440 rows being estimated. But we noticed in the past this stored procedure with 6 month date range needed 26832 KB of memory to execute optimally without spill over tempdb. This is clear underestimation of memory and the reason for the very poor performance. There was Sort Warnings in SQL Profiler. Unlike before this was a Multiple pass sort instead of Single pass sort. This occurs when granted memory is too low. Intermediate Summary: This issue can be avoided by not caching the plan for memory allocating queries. Other possibility is to use recompile hint or optimize for hint to allocate memory for predefined date range. Let’s recreate the stored procedure with recompile hint. --Example provided by www.sqlworkshops.com drop proc CustomersByCreationDate go create proc CustomersByCreationDate @CreationDateFrom datetime, @CreationDateTo datetime as begin declare @CustomerID int, @CustomerName varchar(48), @CreationDate datetime select @CustomerName = c.CustomerName, @CreationDate = c.CreationDate from Customers c where c.CreationDate between @CreationDateFrom and @CreationDateTo order by c.CustomerName option (maxdop 1, recompile) end go Let’s execute the stored procedure initially with 1 month date range and then with 6 month date range. --Example provided by www.sqlworkshops.com exec CustomersByCreationDate '2001-01-01', '2001-01-30' exec CustomersByCreationDate '2001-01-01', '2001-06-30' go The stored procedure took 48ms and 291 ms in line with previous optimal execution times. The stored procedure with 1 month date range has good estimation like before. The stored procedure with 6 month date range also has good estimation and memory grant like before because the query was recompiled with current set of parameter values. The compilation time and compilation CPU of 1 ms is not expensive in this case compared to the performance benefit. Let’s recreate the stored procedure with optimize for hint of 6 month date range. --Example provided by www.sqlworkshops.com drop proc CustomersByCreationDate go create proc CustomersByCreationDate @CreationDateFrom datetime, @CreationDateTo datetime as begin declare @CustomerID int, @CustomerName varchar(48), @CreationDate datetime select @CustomerName = c.CustomerName, @CreationDate = c.CreationDate from Customers c where c.CreationDate between @CreationDateFrom and @CreationDateTo order by c.CustomerName option (maxdop 1, optimize for (@CreationDateFrom = '2001-01-01', @CreationDateTo ='2001-06-30')) end go Let’s execute the stored procedure initially with 1 month date range and then with 6 month date range. --Example provided by www.sqlworkshops.com exec CustomersByCreationDate '2001-01-01', '2001-01-30' exec CustomersByCreationDate '2001-01-01', '2001-06-30' go The stored procedure took 48ms and 291 ms in line with previous optimal execution times. The stored procedure with 1 month date range has overestimation of rows and memory. This is because we provided hint to optimize for 6 months of data. The stored procedure with 6 month date range has good estimation and memory grant because we provided hint to optimize for 6 months of data. Let’s execute the stored procedure with 12 month date range using the currently cashed plan for 6 month date range. --Example provided by www.sqlworkshops.com exec CustomersByCreationDate '2001-01-01', '2001-12-31' go The stored procedure took 1138 ms to complete. 2592000 rows were estimated based on optimize for hint value for 6 month date range. Actual number of rows is 524160 due to 12 month date range. The stored procedure was granted enough memory to sort 6 month date range and not 12 month date range, so there will be spill over tempdb. There was Sort Warnings in SQL Profiler. As we see above, optimize for hint cannot guarantee enough memory and optimal performance compared to recompile hint. This article covers underestimation / overestimation of memory for Sort. Plan Caching and Query Memory Part II covers underestimation / overestimation for Hash Match operation. It is important to note that underestimation of memory for Sort and Hash Match operations lead to spill over tempdb and hence negatively impact performance. Overestimation of memory affects the memory needs of other concurrently executing queries. In addition, it is important to note, with Hash Match operations, overestimation of memory can actually lead to poor performance. Summary: Cached plan might lead to underestimation or overestimation of memory because the memory is estimated based on first set of execution parameters. It is recommended not to cache the plan if the amount of memory required to execute the stored procedure has a wide range of possibilities. One can mitigate this by using recompile hint, but that will lead to compilation overhead. However, in most cases it might be ok to pay for compilation rather than spilling sort over tempdb which could be very expensive compared to compilation cost. The other possibility is to use optimize for hint, but in case one sorts more data than hinted by optimize for hint, this will still lead to spill. On the other side there is also the possibility of overestimation leading to unnecessary memory issues for other concurrently executing queries. In case of Hash Match operations, this overestimation of memory might lead to poor performance. When the values used in optimize for hint are archived from the database, the estimation will be wrong leading to worst performance, so one has to exercise caution before using optimize for hint, recompile hint is better in this case. I explain these concepts with detailed examples in my webcasts (www.sqlworkshops.com/webcasts), I recommend you to watch them. The best way to learn is to practice. To create the above tables and reproduce the behavior, join the mailing list at www.sqlworkshops.com/ml and I will send you the relevant SQL Scripts. Register for the upcoming 3 Day Level 400 Microsoft SQL Server 2008 and SQL Server 2005 Performance Monitoring & Tuning Hands-on Workshop in London, United Kingdom during March 15-17, 2011, click here to register / Microsoft UK TechNet.These are hands-on workshops with a maximum of 12 participants and not lectures. For consulting engagements click here. Disclaimer and copyright information:This article refers to organizations and products that may be the trademarks or registered trademarks of their various owners. Copyright of this article belongs to R Meyyappan / www.sqlworkshops.com. You may freely use the ideas and concepts discussed in this article with acknowledgement (www.sqlworkshops.com), but you may not claim any of it as your own work. This article is for informational purposes only; you use any of the suggestions given here entirely at your own risk. R Meyyappan [email protected] LinkedIn: http://at.linkedin.com/in/rmeyyappan

Read the article
PTLQueue : a scalable bounded-capacity MPMC queue

- by Dave

Title: Fast concurrent MPMC queue -- I've used the following concurrent queue algorithm enough that it warrants a blog entry. I'll sketch out the design of a fast and scalable multiple-producer multiple-consumer (MPSC) concurrent queue called PTLQueue. The queue has bounded capacity and is implemented via a circular array. Bounded capacity can be a useful property if there's a mismatch between producer rates and consumer rates where an unbounded queue might otherwise result in excessive memory consumption by virtue of the container nodes that -- in some queue implementations -- are used to hold values. A bounded-capacity queue can provide flow control between components. Beware, however, that bounded collections can also result in resource deadlock if abused. The put() and take() operators are partial and wait for the collection to become non-full or non-empty, respectively. Put() and take() do not allocate memory, and are not vulnerable to the ABA pathologies. The PTLQueue algorithm can be implemented equally well in C/C++ and Java. Partial operators are often more convenient than total methods. In many use cases if the preconditions aren't met, there's nothing else useful the thread can do, so it may as well wait via a partial method. An exception is in the case of work-stealing queues where a thief might scan a set of queues from which it could potentially steal. Total methods return ASAP with a success-failure indication. (It's tempting to describe a queue or API as blocking or non-blocking instead of partial or total, but non-blocking is already an overloaded concurrency term. Perhaps waiting/non-waiting or patient/impatient might be better terms). It's also trivial to construct partial operators by busy-waiting via total operators, but such constructs may be less efficient than an operator explicitly and intentionally designed to wait. A PTLQueue instance contains an array of slots, where each slot has volatile Turn and MailBox fields. The array has power-of-two length allowing mod/div operations to be replaced by masking. We assume sensible padding and alignment to reduce the impact of false sharing. (On x86 I recommend 128-byte alignment and padding because of the adjacent-sector prefetch facility). Each queue also has PutCursor and TakeCursor cursor variables, each of which should be sequestered as the sole occupant of a cache line or sector. You can opt to use 64-bit integers if concerned about wrap-around aliasing in the cursor variables. Put(null) is considered illegal, but the caller or implementation can easily check for and convert null to a distinguished non-null proxy value if null happens to be a value you'd like to pass. Take() will accordingly convert the proxy value back to null. An advantage of PTLQueue is that you can use atomic fetch-and-increment for the partial methods. We initialize each slot at index I with (Turn=I, MailBox=null). Both cursors are initially 0. All shared variables are considered "volatile" and atomics such as CAS and AtomicFetchAndIncrement are presumed to have bidirectional fence semantics. Finally T is the templated type. I've sketched out a total tryTake() method below that allows the caller to poll the queue. tryPut() has an analogous construction. Zebra stripping : alternating row colors for nice-looking code listings. See also google code "prettify" : https://code.google.com/p/google-code-prettify/ Prettify is a javascript module that yields the HTML/CSS/JS equivalent of pretty-print. -- pre:nth-child(odd) { background-color:#ff0000; } pre:nth-child(even) { background-color:#0000ff; } border-left: 11px solid #ccc; margin: 1.7em 0 1.7em 0.3em; background-color:#BFB; font-size:12px; line-height:65%; " // PTLQueue : Put(v) : // producer : partial method - waits as necessary assert v != null assert Mask = 1 && (Mask & (Mask+1)) == 0 // Document invariants // doorway step // Obtain a sequence number -- ticket // As a practical concern the ticket value is temporally unique // The ticket also identifies and selects a slot auto tkt = AtomicFetchIncrement (&PutCursor, 1) slot * s = &Slots[tkt & Mask] // waiting phase : // wait for slot's generation to match the tkt value assigned to this put() invocation. // The "generation" is implicitly encoded as the upper bits in the cursor // above those used to specify the index : tkt div (Mask+1) // The generation serves as an epoch number to identify a cohort of threads // accessing disjoint slots while s-Turn != tkt : Pause assert s-MailBox == null s-MailBox = v // deposit and pass message Take() : // consumer : partial method - waits as necessary auto tkt = AtomicFetchIncrement (&TakeCursor,1) slot * s = &Slots[tkt & Mask] // 2-stage waiting : // First wait for turn for our generation // Acquire exclusive "take" access to slot's MailBox field // Then wait for the slot to become occupied while s-Turn != tkt : Pause // Concurrency in this section of code is now reduced to just 1 producer thread // vs 1 consumer thread. // For a given queue and slot, there will be most one Take() operation running // in this section. // Consumer waits for producer to arrive and make slot non-empty // Extract message; clear mailbox; advance Turn indicator // We have an obvious happens-before relation : // Put(m) happens-before corresponding Take() that returns that same "m" for T v = s-MailBox if v != null : s-MailBox = null ST-ST barrier s-Turn = tkt + Mask + 1 // unlock slot to admit next producer and consumer return v Pause tryTake() : // total method - returns ASAP with failure indication for auto tkt = TakeCursor slot * s = &Slots[tkt & Mask] if s-Turn != tkt : return null T v = s-MailBox // presumptive return value if v == null : return null // ratify tkt and v values and commit by advancing cursor if CAS (&TakeCursor, tkt, tkt+1) != tkt : continue s-MailBox = null ST-ST barrier s-Turn = tkt + Mask + 1 return v The basic idea derives from the Partitioned Ticket Lock "PTL" (US20120240126-A1) and the MultiLane Concurrent Bag (US8689237). The latter is essentially a circular ring-buffer where the elements themselves are queues or concurrent collections. You can think of the PTLQueue as a partitioned ticket lock "PTL" augmented to pass values from lock to unlock via the slots. Alternatively, you could conceptualize of PTLQueue as a degenerate MultiLane bag where each slot or "lane" consists of a simple single-word MailBox instead of a general queue. Each lane in PTLQueue also has a private Turn field which acts like the Turn (Grant) variables found in PTL. Turn enforces strict FIFO ordering and restricts concurrency on the slot mailbox field to at most one simultaneous put() and take() operation. PTL uses a single "ticket" variable and per-slot Turn (grant) fields while MultiLane has distinct PutCursor and TakeCursor cursors and abstract per-slot sub-queues. Both PTL and MultiLane advance their cursor and ticket variables with atomic fetch-and-increment. PTLQueue borrows from both PTL and MultiLane and has distinct put and take cursors and per-slot Turn fields. Instead of a per-slot queues, PTLQueue uses a simple single-word MailBox field. PutCursor and TakeCursor act like a pair of ticket locks, conferring "put" and "take" access to a given slot. PutCursor, for instance, assigns an incoming put() request to a slot and serves as a PTL "Ticket" to acquire "put" permission to that slot's MailBox field. To better explain the operation of PTLQueue we deconstruct the operation of put() and take() as follows. Put() first increments PutCursor obtaining a new unique ticket. That ticket value also identifies a slot. Put() next waits for that slot's Turn field to match that ticket value. This is tantamount to using a PTL to acquire "put" permission on the slot's MailBox field. Finally, having obtained exclusive "put" permission on the slot, put() stores the message value into the slot's MailBox. Take() similarly advances TakeCursor, identifying a slot, and then acquires and secures "take" permission on a slot by waiting for Turn. Take() then waits for the slot's MailBox to become non-empty, extracts the message, and clears MailBox. Finally, take() advances the slot's Turn field, which releases both "put" and "take" access to the slot's MailBox. Note the asymmetry : put() acquires "put" access to the slot, but take() releases that lock. At any given time, for a given slot in a PTLQueue, at most one thread has "put" access and at most one thread has "take" access. This restricts concurrency from general MPMC to 1-vs-1. We have 2 ticket locks -- one for put() and one for take() -- each with its own "ticket" variable in the form of the corresponding cursor, but they share a single "Grant" egress variable in the form of the slot's Turn variable. Advancing the PutCursor, for instance, serves two purposes. First, we obtain a unique ticket which identifies a slot. Second, incrementing the cursor is the doorway protocol step to acquire the per-slot mutual exclusion "put" lock. The cursors and operations to increment those cursors serve double-duty : slot-selection and ticket assignment for locking the slot's MailBox field. At any given time a slot MailBox field can be in one of the following states: empty with no pending operations -- neutral state; empty with one or more waiting take() operations pending -- deficit; occupied with no pending operations; occupied with one or more waiting put() operations -- surplus; empty with a pending put() or pending put() and take() operations -- transitional; or occupied with a pending take() or pending put() and take() operations -- transitional. The partial put() and take() operators can be implemented with an atomic fetch-and-increment operation, which may confer a performance advantage over a CAS-based loop. In addition we have independent PutCursor and TakeCursor cursors. Critically, a put() operation modifies PutCursor but does not access the TakeCursor and a take() operation modifies the TakeCursor cursor but does not access the PutCursor. This acts to reduce coherence traffic relative to some other queue designs. It's worth noting that slow threads or obstruction in one slot (or "lane") does not impede or obstruct operations in other slots -- this gives us some degree of obstruction isolation. PTLQueue is not lock-free, however. The implementation above is expressed with polite busy-waiting (Pause) but it's trivial to implement per-slot parking and unparking to deschedule waiting threads. It's also easy to convert the queue to a more general deque by replacing the PutCursor and TakeCursor cursors with Left/Front and Right/Back cursors that can move either direction. Specifically, to push and pop from the "left" side of the deque we would decrement and increment the Left cursor, respectively, and to push and pop from the "right" side of the deque we would increment and decrement the Right cursor, respectively. We used a variation of PTLQueue for message passing in our recent OPODIS 2013 paper. ul { list-style:none; padding-left:0; padding:0; margin:0; margin-left:0; } ul#myTagID { padding: 0px; margin: 0px; list-style:none; margin-left:0;} -- -- There's quite a bit of related literature in this area. I'll call out a few relevant references: Wilson's NYU Courant Institute UltraComputer dissertation from 1988 is classic and the canonical starting point : Operating System Data Structures for Shared-Memory MIMD Machines with Fetch-and-Add. Regarding provenance and priority, I think PTLQueue or queues effectively equivalent to PTLQueue have been independently rediscovered a number of times. See CB-Queue and BNPBV, below, for instance. But Wilson's dissertation anticipates the basic idea and seems to predate all the others. Gottlieb et al : Basic Techniques for the Efficient Coordination of Very Large Numbers of Cooperating Sequential Processors Orozco et al : CB-Queue in Toward high-throughput algorithms on many-core architectures which appeared in TACO 2012. Meneghin et al : BNPVB family in Performance evaluation of inter-thread communication mechanisms on multicore/multithreaded architecture Dmitry Vyukov : bounded MPMC queue (highly recommended) Alex Otenko : US8607249 (highly related). John Mellor-Crummey : Concurrent queues: Practical fetch-and-phi algorithms. Technical Report 229, Department of Computer Science, University of Rochester Thomasson : FIFO Distributed Bakery Algorithm (very similar to PTLQueue). Scott and Scherer : Dual Data Structures I'll propose an optimization left as an exercise for the reader. Say we wanted to reduce memory usage by eliminating inter-slot padding. Such padding is usually "dark" memory and otherwise unused and wasted. But eliminating the padding leaves us at risk of increased false sharing. Furthermore lets say it was usually the case that the PutCursor and TakeCursor were numerically close to each other. (That's true in some use cases). We might still reduce false sharing by incrementing the cursors by some value other than 1 that is not trivially small and is coprime with the number of slots. Alternatively, we might increment the cursor by one and mask as usual, resulting in a logical index. We then use that logical index value to index into a permutation table, yielding an effective index for use in the slot array. The permutation table would be constructed so that nearby logical indices would map to more distant effective indices. (Open question: what should that permutation look like? Possibly some perversion of a Gray code or De Bruijn sequence might be suitable). As an aside, say we need to busy-wait for some condition as follows : "while C == 0 : Pause". Lets say that C is usually non-zero, so we typically don't wait. But when C happens to be 0 we'll have to spin for some period, possibly brief. We can arrange for the code to be more machine-friendly with respect to the branch predictors by transforming the loop into : "if C == 0 : for { Pause; if C != 0 : break; }". Critically, we want to restructure the loop so there's one branch that controls entry and another that controls loop exit. A concern is that your compiler or JIT might be clever enough to transform this back to "while C == 0 : Pause". You can sometimes avoid this by inserting a call to a some type of very cheap "opaque" method that the compiler can't elide or reorder. On Solaris, for instance, you could use :"if C == 0 : { gethrtime(); for { Pause; if C != 0 : break; }}". It's worth noting the obvious duality between locks and queues. If you have strict FIFO lock implementation with local spinning and succession by direct handoff such as MCS or CLH,then you can usually transform that lock into a queue. Hidden commentary and annotations - invisible : * And of course there's a well-known duality between queues and locks, but I'll leave that topic for another blog post. * Compare and contrast : PTLQ vs PTL and MultiLane * Equivalent : Turn; seq; sequence; pos; position; ticket * Put = Lock; Deposit Take = identify and reserve slot; wait; extract & clear; unlock * conceptualize : Distinct PutLock and TakeLock implemented as ticket lock or PTL Distinct arrival cursors but share per-slot "Turn" variable provides exclusive role-based access to slot's mailbox field put() acquires exclusive access to a slot for purposes of "deposit" assigns slot round-robin and then acquires deposit access rights/perms to that slot take() acquires exclusive access to slot for purposes of "withdrawal" assigns slot round-robin and then acquires withdrawal access rights/perms to that slot At any given time, only one thread can have withdrawal access to a slot at any given time, only one thread can have deposit access to a slot Permissible for T1 to have deposit access and T2 to simultaneously have withdrawal access * round-robin for the purposes of; role-based; access mode; access role mailslot; mailbox; allocate/assign/identify slot rights; permission; license; access permission; * PTL/Ticket hybrid Asymmetric usage ; owner oblivious lock-unlock pairing K-exclusion add Grant cursor pass message m from lock to unlock via Slots[] array Cursor performs 2 functions : + PTL ticket + Assigns request to slot in round-robin fashion Deconstruct protocol : explication put() : allocate slot in round-robin fashion acquire PTL for "put" access store message into slot associated with PTL index take() : Acquire PTL for "take" access // doorway step seq = fetchAdd (&Grant, 1) s = &Slots[seq & Mask] // waiting phase while s-Turn != seq : pause Extract : wait for s-mailbox to be full v = s-mailbox s-mailbox = null Release PTL for both "put" and "take" access s-Turn = seq + Mask + 1 * Slot round-robin assignment and lock "doorway" protocol leverage the same cursor and FetchAdd operation on that cursor FetchAdd (&Cursor,1) + round-robin slot assignment and dispersal + PTL/ticket lock "doorway" step waiting phase is via "Turn" field in slot * PTLQueue uses 2 cursors -- put and take. Acquire "put" access to slot via PTL-like lock Acquire "take" access to slot via PTL-like lock 2 locks : put and take -- at most one thread can access slot's mailbox Both locks use same "turn" field Like multilane : 2 cursors : put and take slot is simple 1-capacity mailbox instead of queue Borrow per-slot turn/grant from PTL Provides strict FIFO Lock slot : put-vs-put take-vs-take at most one put accesses slot at any one time at most one put accesses take at any one time reduction to 1-vs-1 instead of N-vs-M concurrency Per slot locks for put/take Release put/take by advancing turn * is instrumental in ... * P-V Semaphore vs lock vs K-exclusion * See also : FastQueues-excerpt.java dice-etc/queue-mpmc-bounded-blocking-circular-xadd/ * PTLQueue is the same as PTLQB - identical * Expedient return; ASAP; prompt; immediately * Lamport's Bakery algorithm : doorway step then waiting phase Threads arriving at doorway obtain a unique ticket number Threads enter in ticket order * In the terminology of Reed and Kanodia a ticket lock corresponds to the busy-wait implementation of a semaphore using an eventcount and a sequencer It can also be thought of as an optimization of Lamport's bakery lock was designed for fault-tolerance rather than performance Instead of spinning on the release counter, processors using a bakery lock repeatedly examine the tickets of their peers --

Read the article
Error 2013: Lost connection to MySQL server during query when executing CHECK TABLE FOR UPGRADE

- by Dean Richardson

I just upgraded Ubuntu from 11.10 to 12.04. My rails app now returns the (passenger) error "Can't connect to local MySQL server through socket '/var/run/mysqld/mysqld.sock' (111) (Mysql2::Error)". I get a similar error when I try to access mysql at the command line on my Ubuntu server using mysql -u root -p. I have mysql-server 5.5 installed. I've checked and mysql is not running. When I try to restart it, it fails. Here are some key lines from the tail of /var/log/syslog after an attempted restart: dean@dgwjasonfried:/etc/mysql$ tail -f /var/log/syslog Mar 7 08:55:27 dgwjasonfried /etc/mysql/debian-start[5107]: Looking for 'mysqlcheck' as: /usr/bin/mysqlcheck Mar 7 08:55:27 dgwjasonfried /etc/mysql/debian-start[5107]: Running 'mysqlcheck' with connection arguments: '--port=3306' '--socket=/var/run/mysqld/mysqld.sock' '--host=localhost' '--socket=/var/run/mysqld/mysqld.sock' '--host=localhost' '--socket=/var/run/mysqld/mysqld.sock' Mar 7 08:55:27 dgwjasonfried /etc/mysql/debian-start[5107]: Running 'mysqlcheck' with connection arguments: '--port=3306' '--socket=/var/run/mysqld/mysqld.sock' '--host=localhost' '--socket=/var/run/mysqld/mysqld.sock' '--host=localhost' '--socket=/var/run/mysqld/mysqld.sock' Mar 7 08:55:27 dgwjasonfried /etc/mysql/debian-start[5107]: /usr/bin/mysqlcheck: Got error: 2013: Lost connection to MySQL server during query when executing 'CHECK TABLE ... FOR UPGRADE' Mar 7 08:55:27 dgwjasonfried /etc/mysql/debian-start[5107]: FATAL ERROR: Upgrade failed Mar 7 08:55:27 dgwjasonfried /etc/mysql/debian-start[5107]: molex_app_development.assets OK Mar 7 08:55:27 dgwjasonfried /etc/mysql/debian-start[5107]: molex_app_development.ecd_types OK Mar 7 08:55:27 dgwjasonfried /etc/mysql/debian-start[5124]: Checking for insecure root accounts. Mar 7 08:55:27 dgwjasonfried kernel: [ 7551.769657] init: mysql main process (5064) terminated with status 1 Mar 7 08:55:27 dgwjasonfried kernel: [ 7551.769697] init: mysql respawning too fast, stopped Here is most of /etc/mysql/my.cnf: Remember to edit /etc/mysql/debian.cnf when changing the socket location. [client] port = 3306 socket = /var/run/mysqld/mysqld.sock Here is entries for some specific programs The following values assume you have at least 32M ram This was formally known as [safe_mysqld]. Both versions are currently parsed. [mysqld_safe] socket = /var/run/mysqld/mysqld.sock nice = 0 [mysqld] Basic Settings user = mysql pid-file = /var/run/mysqld/mysqld.pid socket = /var/run/mysqld/mysqld.sock port = 3306 basedir = /usr datadir = /var/lib/mysql tmpdir = /tmp lc-messages-dir = /usr/share/mysql skip-external-locking Instead of skip-networking the default is now to listen only on localhost which is more compatible and is not less secure. bind-address = 127.0.0.1 And here are permissions for var/run/mysqld/mysqld.sock: srwxrwxrwx 1 mysql mysql 0 Mar 7 09:18 mysqld.sock I'd be grateful for any suggestions the community might have. I reviewed the related questions here and attempted some of the fixes offered but to no avail. Thanks! Dean Richardson Update: Thanks to quanta's suggestion, I looked at the /var/log/mysql/error.log file. I found error messages relating to pointers, fatal signals, and more stuff that I really couldn't make much sense of. I also found mysql man page references, however. One suggested that I try starting mysqld with the --innodb_force_recovery=# option, then attempt to dump (or drop) the offending/corrupted database or table. I worked through the escalating option levels one-by-one (innodb_force_recovery=1, innodb_force_recovery=2, etc.) This allowed me to successfully run mysql -u root -p from the command line and execute several commands. I was able to run queries on my production database, but any attempt to query, dump, or even drop my development database raised an error and led to me losing the connection to mysql. So I've made progress, but until I'm somehow able to drop or repair my development db I'm still unable to get my app to load. Any further advice or suggestions? Thanks! Dean Update: Right after running sudo mysqld --innodb_force_recover=1 from the command line, the error.log contains this: Right after retrying sudo mysqld --innodb_force_recover=1, The error.log file shows this: 130308 4:55:39 [Note] Plugin 'FEDERATED' is disabled. 130308 4:55:39 InnoDB: The InnoDB memory heap is disabled 130308 4:55:39 InnoDB: Mutexes and rw_locks use GCC atomic builtins 130308 4:55:39 InnoDB: Compressed tables use zlib 1.2.3.4 130308 4:55:39 InnoDB: Initializing buffer pool, size = 128.0M 130308 4:55:39 InnoDB: Completed initialization of buffer pool 130308 4:55:39 InnoDB: highest supported file format is Barracuda. InnoDB: The log sequence number in ibdata files does not match InnoDB: the log sequence number in the ib_logfiles! 130308 4:55:39 InnoDB: Database was not shut down normally! InnoDB: Starting crash recovery. InnoDB: Reading tablespace information from the .ibd files... InnoDB: Restoring possible half-written data pages from the doublewrite InnoDB: buffer... 130308 4:55:40 InnoDB: Waiting for the background threads to start 130308 4:55:41 InnoDB: 1.1.8 started; log sequence number 10259220 130308 4:55:41 InnoDB: !!! innodb_force_recovery is set to 1 !!! 130308 4:55:41 [Note] Server hostname (bind-address): '127.0.0.1'; port: 3306 130308 4:55:41 [Note] - '127.0.0.1' resolves to '127.0.0.1'; 130308 4:55:41 [Note] Server socket created on IP: '127.0.0.1'. 130308 4:55:41 [Note] Event Scheduler: Loaded 0 events 130308 4:55:41 [Note] mysqld: ready for connections. Version: '5.5.29-0ubuntu0.12.04.2' socket: '/var/run/mysqld/mysqld.sock' port: 3306 (Ubuntu) Then after mysql -u root -p and mysql> drop database molex_app_development; ERROR 2013 (HY000): Lost connection to MySQL server during query mysql> the error.log contains: dean@dgwjasonfried:/var/log/mysql$ tail -f error.log /lib/x86_64-linux-gnu/libc.so.6(clone+0x6d)[0x7f6a3ff9ecbd] Trying to get some variables. Some pointers may be invalid and cause the dump to abort. Query (7f6a1c004bd8): is an invalid pointer Connection ID (thread ID): 1 Status: NOT_KILLED The manual page at http://dev.mysql.com/doc/mysql/en/crashing.html contains information that should help you find out what is causing the crash. 130308 4:55:39 [Note] Plugin 'FEDERATED' is disabled. 130308 4:55:39 InnoDB: The InnoDB memory heap is disabled 130308 4:55:39 InnoDB: Mutexes and rw_locks use GCC atomic builtins 130308 4:55:39 InnoDB: Compressed tables use zlib 1.2.3.4 130308 4:55:39 InnoDB: Initializing buffer pool, size = 128.0M 130308 4:55:39 InnoDB: Completed initialization of buffer pool 130308 4:55:39 InnoDB: highest supported file format is Barracuda. InnoDB: The log sequence number in ibdata files does not match InnoDB: the log sequence number in the ib_logfiles! 130308 4:55:39 InnoDB: Database was not shut down normally! InnoDB: Starting crash recovery. InnoDB: Reading tablespace information from the .ibd files... InnoDB: Restoring possible half-written data pages from the doublewrite InnoDB: buffer... 130308 4:55:40 InnoDB: Waiting for the background threads to start 130308 4:55:41 InnoDB: 1.1.8 started; log sequence number 10259220 130308 4:55:41 InnoDB: !!! innodb_force_recovery is set to 1 !!! 130308 4:55:41 [Note] Server hostname (bind-address): '127.0.0.1'; port: 3306 130308 4:55:41 [Note] - '127.0.0.1' resolves to '127.0.0.1'; 130308 4:55:41 [Note] Server socket created on IP: '127.0.0.1'. 130308 4:55:41 [Note] Event Scheduler: Loaded 0 events 130308 4:55:41 [Note] mysqld: ready for connections. Version: '5.5.29-0ubuntu0.12.04.2' socket: '/var/run/mysqld/mysqld.sock' port: 3306 (Ubuntu) 130308 4:58:23 [ERROR] Incorrect definition of table mysql.proc: expected column 'comment' at position 15 to have type text, found type char(64). 130308 4:58:23 InnoDB: Assertion failure in thread 140168992810752 in file fsp0fsp.c line 3639 InnoDB: We intentionally generate a memory trap. InnoDB: Submit a detailed bug report to http://bugs.mysql.com. InnoDB: If you get repeated assertion failures or crashes, even InnoDB: immediately after the mysqld startup, there may be InnoDB: corruption in the InnoDB tablespace. Please refer to InnoDB: http://dev.mysql.com/doc/refman/5.5/en/forcing-innodb-recovery.html InnoDB: about forcing recovery. 10:58:23 UTC - mysqld got signal 6 ; This could be because you hit a bug. It is also possible that this binary or one of the libraries it was linked against is corrupt, improperly built, or misconfigured. This error can also be caused by malfunctioning hardware. We will try our best to scrape up some info that will hopefully help diagnose the problem, but since we have already crashed, something is definitely wrong and this may fail. key_buffer_size=16777216 read_buffer_size=131072 max_used_connections=1 max_threads=151 thread_count=1 connection_count=1 It is possible that mysqld could use up to key_buffer_size + (read_buffer_size + sort_buffer_size)*max_threads = 346681 K bytes of memory Hope that's ok; if not, decrease some variables in the equation. Thread pointer: 0x7f7ba4f6c2f0 Attempting backtrace. You can use the following information to find out where mysqld died. If you see no messages after this, something went terribly wrong... stack_bottom = 7f7ba3065e60 thread_stack 0x30000 mysqld(my_print_stacktrace+0x29)[0x7f7ba3609039] mysqld(handle_fatal_signal+0x483)[0x7f7ba34cf9c3] /lib/x86_64-linux-gnu/libpthread.so.0(+0xfcb0)[0x7f7ba2220cb0] /lib/x86_64-linux-gnu/libc.so.6(gsignal+0x35)[0x7f7ba188c425] /lib/x86_64-linux-gnu/libc.so.6(abort+0x17b)[0x7f7ba188fb8b] mysqld(+0x65e0fc)[0x7f7ba37160fc] mysqld(+0x602be6)[0x7f7ba36babe6] mysqld(+0x635006)[0x7f7ba36ed006] mysqld(+0x5d7072)[0x7f7ba368f072] mysqld(+0x5d7b9c)[0x7f7ba368fb9c] mysqld(+0x6a3348)[0x7f7ba375b348] mysqld(+0x6a3887)[0x7f7ba375b887] mysqld(+0x5c6a86)[0x7f7ba367ea86] mysqld(+0x5ae3a7)[0x7f7ba36663a7] mysqld(_Z15ha_delete_tableP3THDP10handlertonPKcS4_S4_b+0x16d)[0x7f7ba34d3ffd] mysqld(_Z23mysql_rm_table_no_locksP3THDP10TABLE_LISTbbbb+0x568)[0x7f7ba3417f78] mysqld(_Z11mysql_rm_dbP3THDPcbb+0x8aa)[0x7f7ba339780a] mysqld(_Z21mysql_execute_commandP3THD+0x394c)[0x7f7ba33b886c] mysqld(_Z11mysql_parseP3THDPcjP12Parser_state+0x10f)[0x7f7ba33bb28f] mysqld(_Z16dispatch_command19enum_server_commandP3THDPcj+0x1380)[0x7f7ba33bc6e0] mysqld(_Z24do_handle_one_connectionP3THD+0x1bd)[0x7f7ba346119d] mysqld(handle_one_connection+0x50)[0x7f7ba3461200] /lib/x86_64-linux-gnu/libpthread.so.0(+0x7e9a)[0x7f7ba2218e9a] /lib/x86_64-linux-gnu/libc.so.6(clone+0x6d)[0x7f7ba1949cbd] Trying to get some variables. Some pointers may be invalid and cause the dump to abort. Query (7f7b7c004b60): is an invalid pointer Connection ID (thread ID): 1 Status: NOT_KILLED The manual page at http://dev.mysql.com/doc/mysql/en/crashing.html contains information that should help you find out what is causing the crash. --Dean

Read the article
Silverlight 4 + WCF RIA - Data Service Design Best Practices

- by Chadd Nervig

Hey all. I realize this is a rather long question, but I'd really appreciate any help from anyone experienced with RIA services. Thanks! I'm working on a Silverlight 4 app that views data from the server. I'm relatively inexperienced with RIA Services, so have been working through the tasks of getting the data I need down to the client, but every new piece I add to the puzzle seems to be more and more problematic. I feel like I'm missing some basic concepts here, and it seems like I'm just 'hacking' pieces on, in time-consuming ways, each one breaking the previous ones as I try to add them. I'd love to get the feedback of developers experienced with RIA services, to figure out the intended way to do what I'm trying to do. Let me lay out what I'm trying to do: First, the data. The source of this data is a variety of sources, primarily created by a shared library which reads data from our database, and exposes it as POCOs (Plain Old CLR Objects). I'm creating my own POCOs to represent the different types of data I need to pass between server and client. DataA - This app is for viewing a certain type of data, lets call DataA, in near-realtime. Every 3 minutes, the client should pull data down from the server, of all the new DataA since the last time it requested data. DataB - Users can view the DataA objects in the app, and may select one of them from the list, which displays additional details about that DataA. I'm bringing these extra details down from the server as DataB. DataC - One of the things that DataB contains is a history of a couple important values over time. I'm calling each data point of this history a DataC object, and each DataB object contains many DataCs. The Data Model - On the server side, I have a single DomainService: [EnableClientAccess] public class MyDomainService : DomainService { public IEnumerable<DataA> GetDataA(DateTime? startDate) { /*Pieces together the DataAs that have been created since startDate, and returns them*/ } public DataB GetDataB(int dataAID) { /*Looks up the extended info for that dataAID, constructs a new DataB with that DataA's data, plus the extended info (with multiple DataCs in a List<DataC> property on the DataB), and returns it*/ } //Not exactly sure why these are here, but I think it //wouldn't compile without them for some reason? The data //is entirely read-only, so I don't need to update. public void UpdateDataA(DataA dataA) { throw new NotSupportedException(); } public void UpdateDataB(DataB dataB) { throw new NotSupportedException(); } } The classes for DataA/B/C look like this: [KnownType(typeof(DataB))] public partial class DataA { [Key] [DataMember] public int DataAID { get; set; } [DataMember] public decimal MyDecimalA { get; set; } [DataMember] public string MyStringA { get; set; } [DataMember] public DataTime MyDateTimeA { get; set; } } public partial class DataB : DataA { [Key] [DataMember] public int DataAID { get; set; } [DataMember] public decimal MyDecimalB { get; set; } [DataMember] public string MyStringB { get; set; } [Include] //I don't know which of these, if any, I need? [Composition] [Association("DataAToC","DataAID","DataAID")] public List<DataC> DataCs { get; set; } } public partial class DataC { [Key] [DataMember] public int DataAID { get; set; } [Key] [DataMember] public DateTime Timestamp { get; set; } [DataMember] public decimal MyHistoricDecimal { get; set; } } I guess a big question I have here is... Should I be using Entities instead of POCOs? Are my classes constructed correctly to be able to pass the data down correctly? Should I be using Invoke methods instead of Query (Get) methods on the DomainService? On the client side, I'm having a number of issues. Surprisingly, one of my biggest ones has been threading. I didn't expect there to be so many threading issues with MyDomainContext. What I've learned is that you only seem to be able to create MyDomainContextObjects on the UI thread, all of the queries you can make are done asynchronously only, and that if you try to fake doing it synchronously by blocking the calling thread until the LoadOperation finishes, you have to do so on a background thread, since it uses the UI thread to make the query. So here's what I've got so far. The app should display a stream of the DataA objects, spreading each 3min chunk of them over the next 3min (so they end up displayed 3min after the occurred, looking like a continuous stream, but only have to be downloaded in 3min bursts). To do this, the main form initializes, creates a private MyDomainContext, and starts up a background worker, which continuously loops in a while(true). On each loop, it checks if it has any DataAs left over to display. If so, it displays that Data, and Thread.Sleep()s until the next DataA is scheduled to be displayed. If it's out of data, it queries for more, using the following methods: public DataA[] GetDataAs(DateTime? startDate) { _loadOperationGetDataACompletion = new AutoResetEvent(false); LoadOperation<DataA> loadOperationGetDataA = null; loadOperationGetDataA = _context.Load(_context.GetDataAQuery(startDate), System.ServiceModel.DomainServices.Client.LoadBehavior.RefreshCurrent, false); loadOperationGetDataA.Completed += new EventHandler(loadOperationGetDataA_Completed); _loadOperationGetDataACompletion.WaitOne(); List<DataA> dataAs = new List<DataA>(); foreach (var dataA in loadOperationGetDataA.Entities) dataAs.Add(dataA); return dataAs.ToArray(); } private static AutoResetEvent _loadOperationGetDataACompletion; private static void loadOperationGetDataA_Completed(object sender, EventArgs e) { _loadOperationGetDataACompletion.Set(); } Seems kind of clunky trying to force it into being synchronous, but since this already is on a background thread, I think this is OK? So far, everything actually works, as much of a hack as it seems like it may be. It's important to note that if I try to run that code on the UI thread, it locks, because it waits on the WaitOne() forever, locking the thread, so it can't make the Load request to the server. So once the data is displayed, users can click on one as it goes by to fill a details pane with the full DataB data about that object. To do that, I have the the details pane user control subscribing to a selection event I have setup, which gets fired when the selection changes (on the UI thread). I use a similar technique there, to get the DataB object: void SelectionService_SelectedDataAChanged(object sender, EventArgs e) { DataA dataA = /*Get the selected DataA*/; MyDomainContext context = new MyDomainContext(); var loadOperationGetDataB = context.Load(context.GetDataBQuery(dataA.DataAID), System.ServiceModel.DomainServices.Client.LoadBehavior.RefreshCurrent, false); loadOperationGetDataB.Completed += new EventHandler(loadOperationGetDataB_Completed); } private void loadOperationGetDataB_Completed(object sender, EventArgs e) { this.DataContext = ((LoadOperation<DataB>)sender).Entities.SingleOrDefault(); } Again, it seems kinda hacky, but it works... except on the DataB that it loads, the DataCs list is empty. I've tried all kinds of things there, and I don't see what I'm doing wrong to allow the DataCs to come down with the DataB. I'm about ready to make a 3rd query for the DataCs, but that's screaming even more hackiness to me. It really feels like I'm fighting against the grain here, like I'm doing this in an entirely unintended way. If anyone could offer any assistance, and point out what I'm doing wrong here, I'd very much appreciate it! Thanks!

Read the article
mysqld crashes on any statement

- by ??iu

I restarted my slave to change configuration settings to skip reverse hostname lookup on connecting and to enable the slow query log. I edited /etc/my.cnf making only these changes, then restarted mysqld with /etc/init.d/mysql restart All appeared to be well but when I connect to msyqld remotely or locally though it connects okay a slight problem is that mysqld crashes whenever you try to issue any kind of statement. The client looks like: Reading table information for completion of table and column names You can turn off this feature to get a quicker startup with -A Welcome to the MySQL monitor. Commands end with ; or \g. Your MySQL connection id is 3 Server version: 5.1.31-1ubuntu2-log Type 'help;' or '\h' for help. Type '\c' to clear the buffer. mysql> show tables; ERROR 2006 (HY000): MySQL server has gone away No connection. Trying to reconnect... Connection id: 1 Current database: mydb ERROR 2006 (HY000): MySQL server has gone away No connection. Trying to reconnect... ERROR 2003 (HY000): Can't connect to MySQL server on 'xx.xx.xx.xx' (61) ERROR: Can't connect to the server ERROR 2006 (HY000): MySQL server has gone away No connection. Trying to reconnect... ERROR 2003 (HY000): Can't connect to MySQL server on 'xx.xx.xx.xx' (61) ERROR: Can't connect to the server ERROR 2006 (HY000): MySQL server has gone away Bus error The mysqld error log looks like: 101210 16:35:51 InnoDB: Error: (1500) Couldn't read the MAX(job_id) autoinc value from the index (PRIMARY). 101210 16:35:51 InnoDB: Assertion failure in thread 140245598570832 in file handler/ha_innodb.cc line 2595 InnoDB: Failing assertion: error == DB_SUCCESS InnoDB: We intentionally generate a memory trap. InnoDB: Submit a detailed bug report to http://bugs.mysql.com. InnoDB: If you get repeated assertion failures or crashes, even InnoDB: immediately after the mysqld startup, there may be InnoDB: corruption in the InnoDB tablespace. Please refer to InnoDB: http://dev.mysql.com/doc/refman/5.1/en/forcing-recovery.html InnoDB: about forcing recovery. 101210 16:35:51 - mysqld got signal 6 ; This could be because you hit a bug. It is also possible that this binary or one of the libraries it was linked against is corrupt, improperly built, or misconfigured. This error can also be caused by malfunctioning hardware. We will try our best to scrape up some info that will hopefully help diagnose the problem, but since we have already crashed, something is definitely wrong and this may fail. key_buffer_size=16777216 read_buffer_size=131072 max_used_connections=3 max_threads=600 threads_connected=3 It is possible that mysqld could use up to key_buffer_size + (read_buffer_size + sort_buffer_size)*max_threads = 1328077 K bytes of memory Hope that's ok; if not, decrease some variables in the equation. thd: 0x18209220 Attempting backtrace. You can use the following information to find out where mysqld died. If you see no messages after this, something went terribly wrong... stack_bottom = 0x7f8d791580d0 thread_stack 0x20000 /usr/sbin/mysqld(my_print_stacktrace+0x29) [0x8b4f89] /usr/sbin/mysqld(handle_segfault+0x383) [0x5f8f03] /lib/libpthread.so.0 [0x7f902a76a080] /lib/libc.so.6(gsignal+0x35) [0x7f90291f8fb5] /lib/libc.so.6(abort+0x183) [0x7f90291fabc3] /usr/sbin/mysqld(ha_innobase::open(char const*, int, unsigned int)+0x41b) [0x781f4b] /usr/sbin/mysqld(handler::ha_open(st_table*, char const*, int, int)+0x3f) [0x6db00f] /usr/sbin/mysqld(open_table_from_share(THD*, st_table_share*, char const*, unsigned int, unsigned int, unsigned int, st_table*, bool)+0x57a) [0x64760a] /usr/sbin/mysqld [0x63f281] /usr/sbin/mysqld(open_table(THD*, TABLE_LIST*, st_mem_root*, bool*, unsigned int)+0x626) [0x641e16] /usr/sbin/mysqld(open_tables(THD*, TABLE_LIST**, unsigned int*, unsigned int)+0x5db) [0x6429cb] /usr/sbin/mysqld(open_normal_and_derived_tables(THD*, TABLE_LIST*, unsigned int)+0x1e) [0x642b0e] /usr/sbin/mysqld(mysqld_list_fields(THD*, TABLE_LIST*, char const*)+0x22) [0x70b292] /usr/sbin/mysqld(dispatch_command(enum_server_command, THD*, char*, unsigned int)+0x146d) [0x60dc1d] /usr/sbin/mysqld(do_command(THD*)+0xe8) [0x60dda8] /usr/sbin/mysqld(handle_one_connection+0x226) [0x601426] /lib/libpthread.so.0 [0x7f902a7623ba] /lib/libc.so.6(clone+0x6d) [0x7f90292abfcd] Trying to get some variables. Some pointers may be invalid and cause the dump to abort... thd->query at 0x18213c70 = thd->thread_id=3 thd->killed=NOT_KILLED The manual page at http://dev.mysql.com/doc/mysql/en/crashing.html contains information that should help you find out what is causing the crash. 101210 16:35:51 mysqld_safe Number of processes running now: 0 101210 16:35:51 mysqld_safe mysqld restarted InnoDB: The log sequence number in ibdata files does not match InnoDB: the log sequence number in the ib_logfiles! 101210 16:35:54 InnoDB: Database was not shut down normally! InnoDB: Starting crash recovery. InnoDB: Reading tablespace information from the .ibd files... InnoDB: Restoring possible half-written data pages from the doublewrite InnoDB: buffer... 101210 16:35:56 InnoDB: Started; log sequence number 456 143528628 101210 16:35:56 [Warning] 'user' entry 'root@PSDB102' ignored in --skip-name-resolve mode. 101210 16:35:56 [Warning] Neither --relay-log nor --relay-log-index were used; so replication may break when this MySQL server acts as a slave and has his hostname changed!! Please use '--relay-log=mysqld-relay-bin' to avoid this problem. 101210 16:35:56 [Note] Event Scheduler: Loaded 0 events 101210 16:35:56 [Note] /usr/sbin/mysqld: ready for connections. Version: '5.1.31-1ubuntu2-log' socket: '/var/run/mysqld/mysqld.sock' port: 3306 (Ubuntu) 101210 16:36:11 InnoDB: Error: (1500) Couldn't read the MAX(job_id) autoinc value from the index (PRIMARY). 101210 16:36:11 InnoDB: Assertion failure in thread 139955151501648 in file handler/ha_innodb.cc line 2595 InnoDB: Failing assertion: error == DB_SUCCESS InnoDB: We intentionally generate a memory trap. InnoDB: Submit a detailed bug report to http://bugs.mysql.com. InnoDB: If you get repeated assertion failures or crashes, even InnoDB: immediately after the mysqld startup, there may be InnoDB: corruption in the InnoDB tablespace. Please refer to InnoDB: http://dev.mysql.com/doc/refman/5.1/en/forcing-recovery.html InnoDB: about forcing recovery. 101210 16:36:11 - mysqld got signal 6 ; This could be because you hit a bug. It is also possible that this binary or one of the libraries it was linked against is corrupt, improperly built, or misconfigured. This error can also be caused by malfunctioning hardware. We will try our best to scrape up some info that will hopefully help diagnose the problem, but since we have already crashed, something is definitely wrong and this may fail. key_buffer_size=16777216 read_buffer_size=131072 max_used_connections=1 max_threads=600 threads_connected=1 It is possible that mysqld could use up to key_buffer_size + (read_buffer_size + sort_buffer_size)*max_threads = 1328077 K bytes of memory Hope that's ok; if not, decrease some variables in the equation. thd: 0x18588720 Attempting backtrace. You can use the following information to find out where mysqld died. If you see no messages after this, something went terribly wrong... stack_bottom = 0x7f49d916f0d0 thread_stack 0x20000 /usr/sbin/mysqld(my_print_stacktrace+0x29) [0x8b4f89] /usr/sbin/mysqld(handle_segfault+0x383) [0x5f8f03] /lib/libpthread.so.0 [0x7f4c8a73f080] /lib/libc.so.6(gsignal+0x35) [0x7f4c891cdfb5] /lib/libc.so.6(abort+0x183) [0x7f4c891cfbc3] /usr/sbin/mysqld(ha_innobase::open(char const*, int, unsigned int)+0x41b) [0x781f4b] /usr/sbin/mysqld(handler::ha_open(st_table*, char const*, int, int)+0x3f) [0x6db00f] /usr/sbin/mysqld(open_table_from_share(THD*, st_table_share*, char const*, unsigned int, unsigned int, unsigned int, st_table*, bool)+0x57a) [0x64760a] /usr/sbin/mysqld [0x63f281] /usr/sbin/mysqld(open_table(THD*, TABLE_LIST*, st_mem_root*, bool*, unsigned int)+0x626) [0x641e16] /usr/sbin/mysqld(open_tables(THD*, TABLE_LIST**, unsigned int*, unsigned int)+0x5db) [0x6429cb] /usr/sbin/mysqld(open_normal_and_derived_tables(THD*, TABLE_LIST*, unsigned int)+0x1e) [0x642b0e] /usr/sbin/mysqld(mysqld_list_fields(THD*, TABLE_LIST*, char const*)+0x22) [0x70b292] /usr/sbin/mysqld(dispatch_command(enum_server_command, THD*, char*, unsigned int)+0x146d) [0x60dc1d] /usr/sbin/mysqld(do_command(THD*)+0xe8) [0x60dda8] /usr/sbin/mysqld(handle_one_connection+0x226) [0x601426] /lib/libpthread.so.0 [0x7f4c8a7373ba] /lib/libc.so.6(clone+0x6d) [0x7f4c89280fcd] Trying to get some variables. Some pointers may be invalid and cause the dump to abort... thd->query at 0x18599950 = thd->thread_id=1 thd->killed=NOT_KILLED The manual page at http://dev.mysql.com/doc/mysql/en/crashing.html contains information that should help you find out what is causing the crash. 101210 16:36:11 mysqld_safe Number of processes running now: 0 101210 16:36:11 mysqld_safe mysqld restarted The config is [mysqld_safe] socket = /var/run/mysqld/mysqld.sock nice = 0 [mysqld] innodb_file_per_table innodb_buffer_pool_size=10G innodb_log_buffer_size=4M innodb_flush_log_at_trx_commit=2 innodb_thread_concurrency=8 skip-slave-start server-id=3 # # * IMPORTANT # If you make changes to these settings and your system uses apparmor, you may # also need to also adjust /etc/apparmor.d/usr.sbin.mysqld. # user = mysql pid-file = /var/run/mysqld/mysqld.pid socket = /var/run/mysqld/mysqld.sock port = 3306 basedir = /usr datadir = /DB2/mysql tmpdir = /tmp skip-external-locking # # Instead of skip-networking the default is now to listen only on # localhost which is more compatible and is not less secure. #bind-address = 127.0.0.1 # # * Fine Tuning # key_buffer = 16M max_allowed_packet = 16M thread_stack = 128K thread_cache_size = 8 # This replaces the startup script and checks MyISAM tables if needed # the first time they are touched myisam-recover = BACKUP max_connections = 600 #table_cache = 64 #thread_concurrency = 10 # # * Query Cache Configuration # query_cache_limit = 1M query_cache_size = 32M # skip-federated slow-query-log skip-name-resolve Update: I followed the instructions as per http://dev.mysql.com/doc/refman/5.1/en/forcing-innodb-recovery.html and set innodb_force_recovery = 4 and the logs are showing a different error but the behavior is still the same: 101210 19:14:15 mysqld_safe mysqld restarted 101210 19:14:19 InnoDB: Started; log sequence number 456 143528628 InnoDB: !!! innodb_force_recovery is set to 4 !!! 101210 19:14:19 [Warning] 'user' entry 'root@PSDB102' ignored in --skip-name-resolve mode. 101210 19:14:19 [Warning] Neither --relay-log nor --relay-log-index were used; so replication may break when this MySQL server acts as a slave and has his hostname changed!! Please use '--relay-log=mysqld-relay-bin' to avoid this problem. 101210 19:14:19 [Note] Event Scheduler: Loaded 0 events 101210 19:14:19 [Note] /usr/sbin/mysqld: ready for connections. Version: '5.1.31-1ubuntu2-log' socket: '/var/run/mysqld/mysqld.sock' port: 3306 (Ubuntu) 101210 19:14:32 InnoDB: error: space object of table mydb/__twitter_friend, InnoDB: space id 1602 did not exist in memory. Retrying an open. 101210 19:14:32 InnoDB: error: space object of table mydb/access_request, InnoDB: space id 1318 did not exist in memory. Retrying an open. 101210 19:14:32 InnoDB: error: space object of table mydb/activity, InnoDB: space id 1595 did not exist in memory. Retrying an open. 101210 19:14:32 - mysqld got signal 11 ; This could be because you hit a bug. It is also possible that this binary or one of the libraries it was linked against is corrupt, improperly built, or misconfigured. This error can also be caused by malfunctioning hardware. We will try our best to scrape up some info that will hopefully help diagnose the problem, but since we have already crashed, something is definitely wrong and this may fail. key_buffer_size=16777216 read_buffer_size=131072 max_used_connections=1 max_threads=600 threads_connected=1 It is possible that mysqld could use up to key_buffer_size + (read_buffer_size + sort_buffer_size)*max_threads = 1328077 K bytes of memory Hope that's ok; if not, decrease some variables in the equation. thd: 0x1753c070 Attempting backtrace. You can use the following information to find out where mysqld died. If you see no messages after this, something went terribly wrong... stack_bottom = 0x7f7a0b5800d0 thread_stack 0x20000 /usr/sbin/mysqld(my_print_stacktrace+0x29) [0x8b4f89] /usr/sbin/mysqld(handle_segfault+0x383) [0x5f8f03] /lib/libpthread.so.0 [0x7f7cbc350080] /usr/sbin/mysqld(ha_innobase::innobase_get_index(unsigned int)+0x46) [0x77c516] /usr/sbin/mysqld(ha_innobase::innobase_initialize_autoinc()+0x40) [0x77c640] /usr/sbin/mysqld(ha_innobase::open(char const*, int, unsigned int)+0x3f3) [0x781f23] /usr/sbin/mysqld(handler::ha_open(st_table*, char const*, int, int)+0x3f) [0x6db00f] /usr/sbin/mysqld(open_table_from_share(THD*, st_table_share*, char const*, unsigned int, unsigned int, unsigned int, st_table*, bool)+0x57a) [0x64760a] /usr/sbin/mysqld [0x63f281] /usr/sbin/mysqld(open_table(THD*, TABLE_LIST*, st_mem_root*, bool*, unsigned int)+0x626) [0x641e16] /usr/sbin/mysqld(open_tables(THD*, TABLE_LIST**, unsigned int*, unsigned int)+0x5db) [0x6429cb] /usr/sbin/mysqld(open_normal_and_derived_tables(THD*, TABLE_LIST*, unsigned int)+0x1e) [0x642b0e] /usr/sbin/mysqld(mysqld_list_fields(THD*, TABLE_LIST*, char const*)+0x22) [0x70b292] /usr/sbin/mysqld(dispatch_command(enum_server_command, THD*, char*, unsigned int)+0x146d) [0x60dc1d] /usr/sbin/mysqld(do_command(THD*)+0xe8) [0x60dda8] /usr/sbin/mysqld(handle_one_connection+0x226) [0x601426] /lib/libpthread.so.0 [0x7f7cbc3483ba] /lib/libc.so.6(clone+0x6d) [0x7f7cbae91fcd] Trying to get some variables. Some pointers may be invalid and cause the dump to abort... thd->query at 0x1754d690 = thd->thread_id=1 thd->killed=NOT_KILLED The manual page at http://dev.mysql.com/doc/mysql/en/crashing.html contains information that should help you find out what is causing the crash.

Read the article
webserver horrible slow, sometimes incredible fast

- by dhanke

i am running a small community ( 6000+ Members ) on a non-virtual 64-bit ubuntu 11.04 system. I am not a Linux-pro, not even advanced, i just tried to setup a webserver, which does nothing special actually. Delivering some dynamic PHP and RoR websites is its task. So it might be that my configuration files do look horrible bad. Also, i might use the wrong vocabulary, so in doubt, please ask. Having a current all-time record of 520 registered users (board-accounts, no system-users) online at same time, average server-load is about 2.0 - 5.0. Meantime (~250 users) average server load value is at about 0.4 - 0.8, sometimes, on some expensive searches a bit higher. everything fine. From time to time however, the load increases up to 120 (120.0, not 12.0 ;) ). In this time, its hard to even connect via SSH, but when i reach the server, and use top/htop/iotop to see whats happening, i cannot identify any process causing high CPU load. iotop tells me about a current reading/writing speed of about approx. 70kb/s, which is quite equal to power-off i think. Memory-Usage is max. at ~ 12GB of 16GB, so swap remains empty. now the odd (at least for me:) waiting some minutes ( since i always get a bit into a panic when this happens, it feels like 5 minutes, but i suppose its more like 20-30 minutes) and the server is back to normal. everything continues as normal. another odd fact: when i run hdparm -tT /dev/sda, i get answer like: /dev/sda: Timing cached reads: 7180 MB in 2.00 seconds = 3591.13 MB/sec Timing buffered disk reads: 348 MB in 3.02 seconds = 115.41 MB/sec when i run the same command while the server is "frozen", the answer is like /dev/sda: <- takes about 5 minutes until this line appears Timing cached reads: 7180 MB in 2.00 seconds = 3591.13 MB/sec <- 5 more minutes Timing buffered disk reads: 348 MB in 3.02 seconds = 115.41 MB/sec <- another 5 minutes so the values are the same, but the quoted time is completely wrong. using time command as prefix also tells me that ~ 15 minutes were used. I searched in dmesg, /var/log/[messages|syslog] - nothing found. /var/log/errors however tells me that: Jul 4 20:28:30 localhost kernel: [19080.671415] INFO: task php5-fpm:27728 blocked for more than 120 seconds. Jul 4 20:28:30 localhost kernel: [19080.671419] "echo 0 /proc/sys/kernel/hung_task_timeout_secs" disables this message. multiple times. now that message does tell me that php5-fpm task was blocked or did block ? - but not if that is the cause or just one of the results of that "freeze". Anyone? to cut the long story short, i dont know where even to start analyzing. So if you can give me any advice by looking at following specs and configs, or ask me to provide more information, i`d be glad. Specs: 6 Core AMD Phenom(tm) II X6 1055T Processor * 16 Gigabyte Ram 2x 1.5 TB Seagate ST1500DL003-9VT16L via SATA 3 via SoftwareRaid (i suppose) Services: (due to service --status-all, those with [ + ]) nginx Webserver 1.0.14 mySQL 5.1.63 Server Ruby on Rails 2.3.11 ( passenger-nginx-module ) php5-fpm 5.3.6-13ubuntu3.7 SSH ido2db Further services: default crontab + nightly backup. syslog-ng Website consists of 2 subdomains, forum. and www. where forum is a phpBB3.x PHP-Board, and www a Ruby on Rails 2.3.11 application (portal). Mini-Note: sometimes i notice that the forum is pretty slow, in contrast to the always-fast (except for this "freeze") portal. Both share the same Database, but the portal is using it read-only. The Webserver is nginx, using phusion passenger module to communicate with the ruby-application. Also, for the forum it communicates with php5-fpm via socket: relevant nginx configuration parts ( with comments/questions starting by ; ) ; in case of freeze due to too high Filesystem activity, maybe adding a limit? #worker_rlimit_nofile 50000; user www-data; ; 6 cores, so i read 6 fits. maybe already wrong? worker_processes 6; pid /var/run/nginx.pid; events { worker_connections 1024; } http { passenger_root /var/lib/gems/1.8/gems/passenger-3.0.11; passenger_ruby /usr/bin/ruby1.8; ; the forum once featured a chat, which was working w/o websockets. ; so it was a hell of pull requests (deactivated now, freeze still happening) keepalive_timeout 65; keepalive_requests 50; gzip on; server { listen 80; server_name www.domain.tld; root /var/www/domain/rails/public; passenger_enabled on; } server { listen 80; server_name forum.domain.tld; location / { root /var/www/domain/forum; index index.php; } ; satic stuff to be handled by nginx location ~* ^/style/.+.(jpg|jpeg|gif|css|png|js|ico|xml)$ { access_log off; expires 30d; root /var/www/domain/forum/; } ; now the php magic, note the "backend"-fcgi_pass location ~ .php$ { fastcgi_split_path_info ^(.+\.php)(.*)$; fastcgi_pass backend; fastcgi_index index.php; fastcgi_param SCRIPT_FILENAME /var/www/domain/forum$fastcgi_script_name; include fastcgi_params; fastcgi_param QUERY_STRING $query_string; fastcgi_param REQUEST_METHOD $request_method; fastcgi_param CONTENT_TYPE $content_type; fastcgi_param CONTENT_LENGTH $content_length; fastcgi_intercept_errors on; fastcgi_ignore_client_abort off; fastcgi_connect_timeout 60; fastcgi_send_timeout 180; fastcgi_read_timeout 180; fastcgi_buffer_size 128k; fastcgi_buffers 256 16k; fastcgi_busy_buffers_size 256k; fastcgi_temp_file_write_size 256k; fastcgi_max_temp_file_size 0; } location ~ /\.ht { deny all; } } ;the php5-fpm socket. i read that /dev/shm/ whould be the fastes place for this. bad idea in general? upstream backend { server unix:/dev/shm/phpfpm; } ... } php5-fpm settings (i changed this values due to php5-fpm error log messages higher and higher.. (freeze-problem was there before as well)* listen = /dev/shm/phpfpm user = www-data group = www-data pm = dynamic ; holy, 4000! well, shinking this value to earth-level gave me ; 100s of 502 bad gateway commands. this values were quite stable. ; since there are only max 520 users online i dont get it, why i would need ; as many children as configured here. due to keep-alive maybe? ; asking questions is easier for me since restarting server will make ; my community-members angry ;) pm.max_children = 4000 pm.start_servers = 100 pm.min_spare_servers = 50 pm.max_spare_servers = 150 pm.max_requests = 10 pm.status_path = /status ping.path = /ping ping.response = pong slowlog = log/$pool.log.slow ;should i use rlimit? ;rlimit_files = 1024 chdir = / mysql/my.cnf [client] port = 3306 socket = /var/run/mysqld/mysqld.sock [mysqld_safe] socket = /var/run/mysqld/mysqld.sock nice = 0 [mysqld] user = mysql socket = /var/run/mysqld/mysqld.sock port = 3306 basedir = /usr datadir = /var/lib/mysql tmpdir = /tmp skip-external-locking bind-address = 127.0.0.1 key_buffer = 16M max_allowed_packet = 16M thread_stack = 192K thread_cache_size = 8 myisam-recover = BACKUP ; high number, but less gives some phpBB errors. max_connections = 450 table_cache = 512 ; i read twice the cpu cores, bad? thread_concurrency = 12 join_buffer_size = 2084K concurrent_insert = 3 query_cache_limit = 64M query_cache_size = 512M query_cache_type = 1 log_error = /var/log/mysql/error.log log_slow_queries = /var/log/mysql/mysql-slow.log long_query_time = 2 expire_logs_days = 10 max_binlog_size = 100M low_priority_updates=1 [mysqldump] quick quote-names max_allowed_packet = 16M [isamchk] key_buffer = 16M !includedir /etc/mysql/conf.d/ I used smartctl already, hdds seem to be fine. /proc/mdstatus quotes: Personalities : [raid1] [linear] [multipath] [raid0] [raid6] [raid5] [raid4] [raid10] md3 : active raid1 sda3[1] 1459264192 blocks [2/1] [_U] md1 : active raid1 sda1[0] 3911680 blocks [2/1] [U_] unused devices: ulimit -a core file size (blocks, -c) 0 data seg size (kbytes, -d) unlimited scheduling priority (-e) 0 file size (blocks, -f) unlimited pending signals (-i) 127727 max locked memory (kbytes, -l) 64 max memory size (kbytes, -m) unlimited open files (-n) 1024 pipe size (512 bytes, -p) 8 POSIX message queues (bytes, -q) 819200 real-time priority (-r) 0 stack size (kbytes, -s) 8192 cpu time (seconds, -t) unlimited max user processes (-u) 127727 virtual memory (kbytes, -v) unlimited file locks (-x) unlimited I quote some questions in my configuration files, these are not (intentional) directly problem-related, but would be nice for me to know wether they are indeed questionable or done right. One additional Fact: my MYSQL-database is at 12GB size. i dont know if that does matter, but mytop sometimes shows me 4-5 seconds long insert queries, some are 20-30 seconds long. Its just a feeling that i am unable to prove (because i dont know how), but when i disable the database, the freeze seems not to happen. Example: i created a dummy rails application to see the development log. the app made some sql-queries, reads and inserts. the log quite often was like: DbTest Load (0.3ms) SELECT * FROM `db_test` WHERE (`db_test`.`id` = 31722) LIMIT 1 SQL (0.1ms) BEGIN DbTest Update (0.3ms) UPDATE `db_test` SET `updated_at` = '2012-07-04 23:32:34' WHERE `id` = 31722 - now the log stands still for 5-60 seconds. SQL (49.1ms) COMMIT - SQL-Update time in the log does not include freeze time Rendering test/index Completed in 96ms (View: 16, DB: 59) | 200 OK [http://localhost:9000/test] Bad part is: this mini-freeze here only happens from time to time as well. note: meanwhile i cannot even upload files via scp. I currently feel like running form bad to worse and back by googling for my server-problem due to immense lack of knowledge regarding server configurations. It still makes me wonder, why those problems even appear, since 250 users a time is not such a high amount, right? So my questions: whats wrong and how to fix? ;) or: what information can i provide to make the situation more clear? can you point at some critical bad configuration-line which i should consider to catch up in the documentation? are there any tools i can run to see some possible bottlenecks? any further advice? (next to: "pay someone who knows what he does" - its a private project, server costs enough already. :)) Thanks for your time and help. Best Regards, Daniel P.S.: i renamed the configfiles to domain.tld since i dont want to have any % more load to the server until its fixed. might be a exaggeratedly thought.. P.P.S: if i asked a complete duplicate question, sorry. my search results seemed to be quite specific in their own way.

Read the article
Fedora 16 can connect to samba share using smbclient but not in nautilus 3.2.1

- by Nathan Jones

I have a machine running Ubuntu 11.10 Server acting as a Samba server to share my home directory. Everything works fine on my Windows 7 machine, but on my Fedora 16 laptop, if I use Nautilus to try to access the share using smb://192.168.0.8/nathan in the location bar, it just has the loading cursor and does nothing. It never shows any errors, nothing. Using smbclient works just fine, but I'd like to get it working in Nautilus. I know that there can be problems with SELinux and Samba, so I created a file called booleans.local that contains samba_enable_home_dirs=1. My smb.conf file looks like this: # For Unix password sync to work on a Debian GNU/Linux system, the following # parameters must be set (thanks to Ian Kahan <<[email protected]> for # sending the correct chat script for the passwd program in Debian Sarge). passwd program = /usr/bin/passwd %u passwd chat = *Enter\snew\s*\spassword:* %n\n *Retype\snew\s*\spassword:* %n\n *password\supdated\ssuccessfully* . # This boolean controls whether PAM will be used for password changes # when requested by an SMB client instead of the program listed in # 'passwd program'. The default is 'no'. pam password change = yes # This option controls how unsuccessful authentication attempts are mapped # to anonymous connections map to guest = bad user ########## Domains ########### # Is this machine able to authenticate users. Both PDC and BDC # must have this setting enabled. If you are the BDC you must # change the 'domain master' setting to no # ; domain logons = yes # # The following setting only takes effect if 'domain logons' is set # It specifies the location of the user's profile directory # from the client point of view) # The following required a [profiles] share to be setup on the # samba server (see below) ; logon path = \\%N\profiles\%U # Another common choice is storing the profile in the user's home directory # (this is Samba's default) # logon path = \\%N\%U\profile # The following setting only takes effect if 'domain logons' is set # It specifies the location of a user's home directory (from the client # point of view) ; logon drive = H: # logon home = \\%N\%U # The following setting only takes effect if 'domain logons' is set # It specifies the script to run during logon. The script must be stored # in the [netlogon] share # NOTE: Must be store in 'DOS' file format convention ; logon script = logon.cmd # This allows Unix users to be created on the domain controller via the SAMR # RPC pipe. The example command creates a user account with a disabled Unix # password; please adapt to your needs ; add user script = /usr/sbin/adduser --quiet --disabled-password --gecos "" %u # This allows machine accounts to be created on the domain controller via the # SAMR RPC pipe. # The following assumes a "machines" group exists on the system ; add machine script = /usr/sbin/useradd -g machines -c "%u machine account" -d /var/lib/samba -s /bin/false %u # This allows Unix groups to be created on the domain controller via the SAMR # RPC pipe. ; add group script = /usr/sbin/addgroup --force-badname %g ########## Printing ########## # If you want to automatically load your printer list rather # than setting them up individually then you'll need this # load printers = yes # lpr(ng) printing. You may wish to override the location of the # printcap file ; printing = bsd ; printcap name = /etc/printcap # CUPS printing. See also the cupsaddsmb(8) manpage in the # cupsys-client package. ; printing = cups ; printcap name = cups ############ Misc ############ # Using the following line enables you to customise your configuration # on a per machine basis. The %m gets replaced with the netbios name # of the machine that is connecting ; include = /home/samba/etc/smb.conf.%m # Most people will find that this option gives better performance. # See smb.conf(5) and /usr/share/doc/samba-doc/htmldocs/Samba3-HOWTO/speed.html # for details # You may want to add the following on a Linux system: # SO_RCVBUF=8192 SO_SNDBUF=8192 # socket options = TCP_NODELAY # The following parameter is useful only if you have the linpopup package # installed. The samba maintainer and the linpopup maintainer are # working to ease installation and configuration of linpopup and samba. ; message command = /bin/sh -c '/usr/bin/linpopup "%f" "%m" %s; rm %s' & # Domain Master specifies Samba to be the Domain Master Browser. If this # machine will be configured as a BDC (a secondary logon server), you # must set this to 'no'; otherwise, the default behavior is recommended. # domain master = auto # Some defaults for winbind (make sure you're not using the ranges # for something else.) ; idmap uid = 10000-20000 ; idmap gid = 10000-20000 ; template shell = /bin/bash # The following was the default behaviour in sarge, # but samba upstream reverted the default because it might induce # performance issues in large organizations. # See Debian bug #368251 for some of the consequences of *not* # having this setting and smb.conf(5) for details. ; winbind enum groups = yes ; winbind enum users = yes # Setup usershare options to enable non-root users to share folders # with the net usershare command. # Maximum number of usershare. 0 (default) means that usershare is disabled. ; usershare max shares = 100 # Allow users who've been granted usershare privileges to create # public shares, not just authenticated ones usershare allow guests = yes #======================= Share Definitions ======================= # Un-comment the following (and tweak the other settings below to suit) # to enable the default home directory shares. This will share each # user's home director as \\server\username [homes] comment = Home Directories browseable = yes # By default, the home directories are exported read-only. Change the # next parameter to 'no' if you want to be able to write to them. read only = no # File creation mask is set to 0700 for security reasons. If you want to # create files with group=rw permissions, set next parameter to 0775. ; create mask = 0775 # Directory creation mask is set to 0700 for security reasons. If you want to # create dirs. with group=rw permissions, set next parameter to 0775. ; directory mask = 0775 # By default, \\server\username shares can be connected to by anyone # with access to the samba server. Un-comment the following parameter # to make sure that only "username" can connect to \\server\username # The following parameter makes sure that only "username" can connect # # This might need tweaking when using external authentication schemes valid users = %S # Un-comment the following and create the netlogon directory for Domain Logons # (you need to configure Samba to act as a domain controller too.) ;[netlogon] ; comment = Network Logon Service ; path = /home/samba/netlogon ; guest ok = yes ; read only = yes # Un-comment the following and create the profiles directory to store # users profiles (see the "logon path" option above) # (you need to configure Samba to act as a domain controller too.) # The path below should be writable by all users so that their # profile directory may be created the first time they log on ;[profiles] ; comment = Users profiles ; path = /home/samba/profiles ; guest ok = no ; browseable = no ; create mask = 0600 ; directory mask = 0700 [printers] comment = All Printers browseable = no path = /var/spool/samba printable = yes guest ok = no read only = no create mask = 0700 # Windows clients look for this share name as a source of downloadable # printer drivers [print$] comment = Printer Drivers path = /var/lib/samba/printers browseable = yes read only = yes guest ok = no # Uncomment to allow remote administration of Windows print drivers. # You may need to replace 'lpadmin' with the name of the group your # admin users are members of. # Please note that you also need to set appropriate Unix permissions # to the drivers directory for these users to have write rights in it ; write list = root, @lpadmin # A sample share for sharing your CD-ROM with others. ;[cdrom] ; comment = Samba server's CD-ROM ; read only = yes ; locking = no ; path = /cdrom ; guest ok = yes # The next two parameters show how to auto-mount a CD-ROM when the # cdrom share is accesed. For this to work /etc/fstab must contain # an entry like this: # # /dev/scd0 /cdrom iso9660 defaults,noauto,ro,user 0 0 # # The CD-ROM gets unmounted automatically after the connection to the # # If you don't want to use auto-mounting/unmounting make sure the CD # is mounted on /cdrom # ; preexec = /bin/mount /cdrom ; postexec = /bin/umount /cdrom smbusers: <nathan> = <"nathan"> Any help would be very much appreciated! Thanks!

Read the article
mysqld crashes on any statement

- by ??iu

I restarted my slave to change configuration settings to skip reverse hostname lookup on connecting and to enable the slow query log. I edited /etc/my.cnf making only these changes, then restarted mysqld with /etc/init.d/mysql restart All appeared to be well but when I connect to msyqld remotely or locally though it connects okay a slight problem is that mysqld crashes whenever you try to issue any kind of statement. The client looks like: Reading table information for completion of table and column names You can turn off this feature to get a quicker startup with -A Welcome to the MySQL monitor. Commands end with ; or \g. Your MySQL connection id is 3 Server version: 5.1.31-1ubuntu2-log Type 'help;' or '\h' for help. Type '\c' to clear the buffer. mysql> show tables; ERROR 2006 (HY000): MySQL server has gone away No connection. Trying to reconnect... Connection id: 1 Current database: mydb ERROR 2006 (HY000): MySQL server has gone away No connection. Trying to reconnect... ERROR 2003 (HY000): Can't connect to MySQL server on 'xx.xx.xx.xx' (61) ERROR: Can't connect to the server ERROR 2006 (HY000): MySQL server has gone away No connection. Trying to reconnect... ERROR 2003 (HY000): Can't connect to MySQL server on 'xx.xx.xx.xx' (61) ERROR: Can't connect to the server ERROR 2006 (HY000): MySQL server has gone away Bus error The mysqld error log looks like: 101210 16:35:51 InnoDB: Error: (1500) Couldn't read the MAX(job_id) autoinc value from the index (PRIMARY). 101210 16:35:51 InnoDB: Assertion failure in thread 140245598570832 in file handler/ha_innodb.cc line 2595 InnoDB: Failing assertion: error == DB_SUCCESS InnoDB: We intentionally generate a memory trap. InnoDB: Submit a detailed bug report to http://bugs.mysql.com. InnoDB: If you get repeated assertion failures or crashes, even InnoDB: immediately after the mysqld startup, there may be InnoDB: corruption in the InnoDB tablespace. Please refer to InnoDB: http://dev.mysql.com/doc/refman/5.1/en/forcing-recovery.html InnoDB: about forcing recovery. 101210 16:35:51 - mysqld got signal 6 ; This could be because you hit a bug. It is also possible that this binary or one of the libraries it was linked against is corrupt, improperly built, or misconfigured. This error can also be caused by malfunctioning hardware. We will try our best to scrape up some info that will hopefully help diagnose the problem, but since we have already crashed, something is definitely wrong and this may fail. key_buffer_size=16777216 read_buffer_size=131072 max_used_connections=3 max_threads=600 threads_connected=3 It is possible that mysqld could use up to key_buffer_size + (read_buffer_size + sort_buffer_size)*max_threads = 1328077 K bytes of memory Hope that's ok; if not, decrease some variables in the equation. thd: 0x18209220 Attempting backtrace. You can use the following information to find out where mysqld died. If you see no messages after this, something went terribly wrong... stack_bottom = 0x7f8d791580d0 thread_stack 0x20000 /usr/sbin/mysqld(my_print_stacktrace+0x29) [0x8b4f89] /usr/sbin/mysqld(handle_segfault+0x383) [0x5f8f03] /lib/libpthread.so.0 [0x7f902a76a080] /lib/libc.so.6(gsignal+0x35) [0x7f90291f8fb5] /lib/libc.so.6(abort+0x183) [0x7f90291fabc3] /usr/sbin/mysqld(ha_innobase::open(char const*, int, unsigned int)+0x41b) [0x781f4b] /usr/sbin/mysqld(handler::ha_open(st_table*, char const*, int, int)+0x3f) [0x6db00f] /usr/sbin/mysqld(open_table_from_share(THD*, st_table_share*, char const*, unsigned int, unsigned int, unsigned int, st_table*, bool)+0x57a) [0x64760a] /usr/sbin/mysqld [0x63f281] /usr/sbin/mysqld(open_table(THD*, TABLE_LIST*, st_mem_root*, bool*, unsigned int)+0x626) [0x641e16] /usr/sbin/mysqld(open_tables(THD*, TABLE_LIST**, unsigned int*, unsigned int)+0x5db) [0x6429cb] /usr/sbin/mysqld(open_normal_and_derived_tables(THD*, TABLE_LIST*, unsigned int)+0x1e) [0x642b0e] /usr/sbin/mysqld(mysqld_list_fields(THD*, TABLE_LIST*, char const*)+0x22) [0x70b292] /usr/sbin/mysqld(dispatch_command(enum_server_command, THD*, char*, unsigned int)+0x146d) [0x60dc1d] /usr/sbin/mysqld(do_command(THD*)+0xe8) [0x60dda8] /usr/sbin/mysqld(handle_one_connection+0x226) [0x601426] /lib/libpthread.so.0 [0x7f902a7623ba] /lib/libc.so.6(clone+0x6d) [0x7f90292abfcd] Trying to get some variables. Some pointers may be invalid and cause the dump to abort... thd->query at 0x18213c70 = thd->thread_id=3 thd->killed=NOT_KILLED The manual page at http://dev.mysql.com/doc/mysql/en/crashing.html contains information that should help you find out what is causing the crash. 101210 16:35:51 mysqld_safe Number of processes running now: 0 101210 16:35:51 mysqld_safe mysqld restarted InnoDB: The log sequence number in ibdata files does not match InnoDB: the log sequence number in the ib_logfiles! 101210 16:35:54 InnoDB: Database was not shut down normally! InnoDB: Starting crash recovery. InnoDB: Reading tablespace information from the .ibd files... InnoDB: Restoring possible half-written data pages from the doublewrite InnoDB: buffer... 101210 16:35:56 InnoDB: Started; log sequence number 456 143528628 101210 16:35:56 [Warning] 'user' entry 'root@PSDB102' ignored in --skip-name-resolve mode. 101210 16:35:56 [Warning] Neither --relay-log nor --relay-log-index were used; so replication may break when this MySQL server acts as a slave and has his hostname changed!! Please use '--relay-log=mysqld-relay-bin' to avoid this problem. 101210 16:35:56 [Note] Event Scheduler: Loaded 0 events 101210 16:35:56 [Note] /usr/sbin/mysqld: ready for connections. Version: '5.1.31-1ubuntu2-log' socket: '/var/run/mysqld/mysqld.sock' port: 3306 (Ubuntu) 101210 16:36:11 InnoDB: Error: (1500) Couldn't read the MAX(job_id) autoinc value from the index (PRIMARY). 101210 16:36:11 InnoDB: Assertion failure in thread 139955151501648 in file handler/ha_innodb.cc line 2595 InnoDB: Failing assertion: error == DB_SUCCESS InnoDB: We intentionally generate a memory trap. InnoDB: Submit a detailed bug report to http://bugs.mysql.com. InnoDB: If you get repeated assertion failures or crashes, even InnoDB: immediately after the mysqld startup, there may be InnoDB: corruption in the InnoDB tablespace. Please refer to InnoDB: http://dev.mysql.com/doc/refman/5.1/en/forcing-recovery.html InnoDB: about forcing recovery. 101210 16:36:11 - mysqld got signal 6 ; This could be because you hit a bug. It is also possible that this binary or one of the libraries it was linked against is corrupt, improperly built, or misconfigured. This error can also be caused by malfunctioning hardware. We will try our best to scrape up some info that will hopefully help diagnose the problem, but since we have already crashed, something is definitely wrong and this may fail. key_buffer_size=16777216 read_buffer_size=131072 max_used_connections=1 max_threads=600 threads_connected=1 It is possible that mysqld could use up to key_buffer_size + (read_buffer_size + sort_buffer_size)*max_threads = 1328077 K bytes of memory Hope that's ok; if not, decrease some variables in the equation. thd: 0x18588720 Attempting backtrace. You can use the following information to find out where mysqld died. If you see no messages after this, something went terribly wrong... stack_bottom = 0x7f49d916f0d0 thread_stack 0x20000 /usr/sbin/mysqld(my_print_stacktrace+0x29) [0x8b4f89] /usr/sbin/mysqld(handle_segfault+0x383) [0x5f8f03] /lib/libpthread.so.0 [0x7f4c8a73f080] /lib/libc.so.6(gsignal+0x35) [0x7f4c891cdfb5] /lib/libc.so.6(abort+0x183) [0x7f4c891cfbc3] /usr/sbin/mysqld(ha_innobase::open(char const*, int, unsigned int)+0x41b) [0x781f4b] /usr/sbin/mysqld(handler::ha_open(st_table*, char const*, int, int)+0x3f) [0x6db00f] /usr/sbin/mysqld(open_table_from_share(THD*, st_table_share*, char const*, unsigned int, unsigned int, unsigned int, st_table*, bool)+0x57a) [0x64760a] /usr/sbin/mysqld [0x63f281] /usr/sbin/mysqld(open_table(THD*, TABLE_LIST*, st_mem_root*, bool*, unsigned int)+0x626) [0x641e16] /usr/sbin/mysqld(open_tables(THD*, TABLE_LIST**, unsigned int*, unsigned int)+0x5db) [0x6429cb] /usr/sbin/mysqld(open_normal_and_derived_tables(THD*, TABLE_LIST*, unsigned int)+0x1e) [0x642b0e] /usr/sbin/mysqld(mysqld_list_fields(THD*, TABLE_LIST*, char const*)+0x22) [0x70b292] /usr/sbin/mysqld(dispatch_command(enum_server_command, THD*, char*, unsigned int)+0x146d) [0x60dc1d] /usr/sbin/mysqld(do_command(THD*)+0xe8) [0x60dda8] /usr/sbin/mysqld(handle_one_connection+0x226) [0x601426] /lib/libpthread.so.0 [0x7f4c8a7373ba] /lib/libc.so.6(clone+0x6d) [0x7f4c89280fcd] Trying to get some variables. Some pointers may be invalid and cause the dump to abort... thd->query at 0x18599950 = thd->thread_id=1 thd->killed=NOT_KILLED The manual page at http://dev.mysql.com/doc/mysql/en/crashing.html contains information that should help you find out what is causing the crash. 101210 16:36:11 mysqld_safe Number of processes running now: 0 101210 16:36:11 mysqld_safe mysqld restarted The config is [mysqld_safe] socket = /var/run/mysqld/mysqld.sock nice = 0 [mysqld] innodb_file_per_table innodb_buffer_pool_size=10G innodb_log_buffer_size=4M innodb_flush_log_at_trx_commit=2 innodb_thread_concurrency=8 skip-slave-start server-id=3 # # * IMPORTANT # If you make changes to these settings and your system uses apparmor, you may # also need to also adjust /etc/apparmor.d/usr.sbin.mysqld. # user = mysql pid-file = /var/run/mysqld/mysqld.pid socket = /var/run/mysqld/mysqld.sock port = 3306 basedir = /usr datadir = /DB2/mysql tmpdir = /tmp skip-external-locking # # Instead of skip-networking the default is now to listen only on # localhost which is more compatible and is not less secure. #bind-address = 127.0.0.1 # # * Fine Tuning # key_buffer = 16M max_allowed_packet = 16M thread_stack = 128K thread_cache_size = 8 # This replaces the startup script and checks MyISAM tables if needed # the first time they are touched myisam-recover = BACKUP max_connections = 600 #table_cache = 64 #thread_concurrency = 10 # # * Query Cache Configuration # query_cache_limit = 1M query_cache_size = 32M # skip-federated slow-query-log skip-name-resolve Update: I followed the instructions as per http://dev.mysql.com/doc/refman/5.1/en/forcing-innodb-recovery.html and set innodb_force_recovery = 4 and the logs are showing a different error but the behavior is still the same: 101210 19:14:15 mysqld_safe mysqld restarted 101210 19:14:19 InnoDB: Started; log sequence number 456 143528628 InnoDB: !!! innodb_force_recovery is set to 4 !!! 101210 19:14:19 [Warning] 'user' entry 'root@PSDB102' ignored in --skip-name-resolve mode. 101210 19:14:19 [Warning] Neither --relay-log nor --relay-log-index were used; so replication may break when this MySQL server acts as a slave and has his hostname changed!! Please use '--relay-log=mysqld-relay-bin' to avoid this problem. 101210 19:14:19 [Note] Event Scheduler: Loaded 0 events 101210 19:14:19 [Note] /usr/sbin/mysqld: ready for connections. Version: '5.1.31-1ubuntu2-log' socket: '/var/run/mysqld/mysqld.sock' port: 3306 (Ubuntu) 101210 19:14:32 InnoDB: error: space object of table mydb/__twitter_friend, InnoDB: space id 1602 did not exist in memory. Retrying an open. 101210 19:14:32 InnoDB: error: space object of table mydb/access_request, InnoDB: space id 1318 did not exist in memory. Retrying an open. 101210 19:14:32 InnoDB: error: space object of table mydb/activity, InnoDB: space id 1595 did not exist in memory. Retrying an open. 101210 19:14:32 - mysqld got signal 11 ; This could be because you hit a bug. It is also possible that this binary or one of the libraries it was linked against is corrupt, improperly built, or misconfigured. This error can also be caused by malfunctioning hardware. We will try our best to scrape up some info that will hopefully help diagnose the problem, but since we have already crashed, something is definitely wrong and this may fail. key_buffer_size=16777216 read_buffer_size=131072 max_used_connections=1 max_threads=600 threads_connected=1 It is possible that mysqld could use up to key_buffer_size + (read_buffer_size + sort_buffer_size)*max_threads = 1328077 K bytes of memory Hope that's ok; if not, decrease some variables in the equation. thd: 0x1753c070 Attempting backtrace. You can use the following information to find out where mysqld died. If you see no messages after this, something went terribly wrong... stack_bottom = 0x7f7a0b5800d0 thread_stack 0x20000 /usr/sbin/mysqld(my_print_stacktrace+0x29) [0x8b4f89] /usr/sbin/mysqld(handle_segfault+0x383) [0x5f8f03] /lib/libpthread.so.0 [0x7f7cbc350080] /usr/sbin/mysqld(ha_innobase::innobase_get_index(unsigned int)+0x46) [0x77c516] /usr/sbin/mysqld(ha_innobase::innobase_initialize_autoinc()+0x40) [0x77c640] /usr/sbin/mysqld(ha_innobase::open(char const*, int, unsigned int)+0x3f3) [0x781f23] /usr/sbin/mysqld(handler::ha_open(st_table*, char const*, int, int)+0x3f) [0x6db00f] /usr/sbin/mysqld(open_table_from_share(THD*, st_table_share*, char const*, unsigned int, unsigned int, unsigned int, st_table*, bool)+0x57a) [0x64760a] /usr/sbin/mysqld [0x63f281] /usr/sbin/mysqld(open_table(THD*, TABLE_LIST*, st_mem_root*, bool*, unsigned int)+0x626) [0x641e16] /usr/sbin/mysqld(open_tables(THD*, TABLE_LIST**, unsigned int*, unsigned int)+0x5db) [0x6429cb] /usr/sbin/mysqld(open_normal_and_derived_tables(THD*, TABLE_LIST*, unsigned int)+0x1e) [0x642b0e] /usr/sbin/mysqld(mysqld_list_fields(THD*, TABLE_LIST*, char const*)+0x22) [0x70b292] /usr/sbin/mysqld(dispatch_command(enum_server_command, THD*, char*, unsigned int)+0x146d) [0x60dc1d] /usr/sbin/mysqld(do_command(THD*)+0xe8) [0x60dda8] /usr/sbin/mysqld(handle_one_connection+0x226) [0x601426] /lib/libpthread.so.0 [0x7f7cbc3483ba] /lib/libc.so.6(clone+0x6d) [0x7f7cbae91fcd] Trying to get some variables. Some pointers may be invalid and cause the dump to abort... thd->query at 0x1754d690 = thd->thread_id=1 thd->killed=NOT_KILLED The manual page at http://dev.mysql.com/doc/mysql/en/crashing.html contains information that should help you find out what is causing the crash.

Read the article
XNA Xbox 360 Content Manager Thread freezing Draw Thread

- by Alikar

I currently have a game that takes in large images, easily bigger than 1MB, to serve as backgrounds. I know exactly when this transition is supposed to take place, so I made a loader class to handle loading these large images in the background, but when I load the images it still freezes the main thread where the drawing takes place. Since this code runs on the 360 I move the thread to the 4th hardware thread, but that doesn't seem to help. Below is the class I am using. Any thoughts as to why my new content manager which should be in its own thread is interrupting the draw in my main thread would be appreciated. namespace FileSystem { /// <summary> /// This is used to reference how many objects reference this texture. /// Everytime someone references a texture we increase the iNumberOfReferences. /// When a class calls remove on a specific texture we check to see if anything /// else is referencing the class, if it is we don't remove it. If there isn't /// anything referencing the texture its safe to dispose of. /// </summary> class TextureContainer { public uint uiNumberOfReferences = 0; public Texture2D texture; } /// <summary> /// This class loads all the files from the Content. /// </summary> static class FileManager { static Microsoft.Xna.Framework.Content.ContentManager Content; static EventWaitHandle wh = new AutoResetEvent(false); static Dictionary<string, TextureContainer> Texture2DResourceDictionary; static List<Texture2D> TexturesToDispose; static List<String> TexturesToLoad; static int iProcessor = 4; private static object threadMutex = new object(); private static object Texture2DMutex = new object(); private static object loadingMutex = new object(); private static bool bLoadingTextures = false; /// <summary> /// Returns if we are loading textures or not. /// </summary> public static bool LoadingTexture { get { lock (loadingMutex) { return bLoadingTextures; } } } /// <summary> /// Since this is an static class. This is the constructor for the file loadeder. This is the version /// for the Xbox 360. /// </summary> /// <param name="_Content"></param> public static void Initalize(IServiceProvider serviceProvider, string rootDirectory, int _iProcessor ) { Content = new Microsoft.Xna.Framework.Content.ContentManager(serviceProvider, rootDirectory); Texture2DResourceDictionary = new Dictionary<string, TextureContainer>(); TexturesToDispose = new List<Texture2D>(); iProcessor = _iProcessor; CreateThread(); } /// <summary> /// Since this is an static class. This is the constructor for the file loadeder. /// </summary> /// <param name="_Content"></param> public static void Initalize(IServiceProvider serviceProvider, string rootDirectory) { Content = new Microsoft.Xna.Framework.Content.ContentManager(serviceProvider, rootDirectory); Texture2DResourceDictionary = new Dictionary<string, TextureContainer>(); TexturesToDispose = new List<Texture2D>(); CreateThread(); } /// <summary> /// Creates the thread incase we wanted to set up some parameters /// Outside of the constructor. /// </summary> static public void CreateThread() { Thread t = new Thread(new ThreadStart(StartThread)); t.Start(); } // This is the function that we thread. static public void StartThread() { //BBSThreadClass BBSTC = (BBSThreadClass)_oData; FileManager.Execute(); } /// <summary> /// This thread shouldn't be called by the outside world. /// It allows the File Manager to loop. /// </summary> static private void Execute() { // Make sure our thread is on the correct processor on the XBox 360. #if WINDOWS #else Thread.CurrentThread.SetProcessorAffinity(new int[] { iProcessor }); Thread.CurrentThread.IsBackground = true; #endif // This loop will load textures into ram for us away from the main thread. while (true) { wh.WaitOne(); // Locking down our data while we process it. lock (threadMutex) { lock (loadingMutex) { bLoadingTextures = true; } bool bContainsKey = false; for (int con = 0; con < TexturesToLoad.Count; con++) { // If we have already loaded the texture into memory reference // the one in the dictionary. lock (Texture2DMutex) { bContainsKey = Texture2DResourceDictionary.ContainsKey(TexturesToLoad[con]); } if (bContainsKey) { // Do nothing } // Otherwise load it into the dictionary and then reference the // copy in the dictionary else { TextureContainer TC = new TextureContainer(); TC.uiNumberOfReferences = 1; // We start out with 1 referece. // Loading the texture into memory. try { TC.texture = Content.Load<Texture2D>(TexturesToLoad[con]); // This is passed into the dictionary, thus there is only one copy of // the texture in memory. // There is an issue with Sprite Batch and disposing textures. // This will have to wait until its figured out. lock (Texture2DMutex) { bContainsKey = Texture2DResourceDictionary.ContainsKey(TexturesToLoad[con]); Texture2DResourceDictionary.Add(TexturesToLoad[con], TC); } // We don't have the find the reference to the container since we // already have it. } // Occasionally our texture will already by loaded by another thread while // this thread is operating. This mainly happens on the first level. catch (Exception e) { // If this happens we don't worry about it since this thread only loads // texture data and if its already there we don't need to load it. } } Thread.Sleep(100); } } lock (loadingMutex) { bLoadingTextures = false; } } } static public void LoadTextureList(List<string> _textureList) { // Ensuring that we can't creating threading problems. lock (threadMutex) { TexturesToLoad = _textureList; } wh.Set(); } /// <summary> /// This loads a 2D texture which represents a 2D grid of Texels. /// </summary> /// <param name="_textureName">The name of the picture you wish to load.</param> /// <returns>Holds the image data.</returns> public static Texture2D LoadTexture2D( string _textureName ) { TextureContainer temp; lock (Texture2DMutex) { bool bContainsKey = false; // If we have already loaded the texture into memory reference // the one in the dictionary. lock (Texture2DMutex) { bContainsKey = Texture2DResourceDictionary.ContainsKey(_textureName); if (bContainsKey) { temp = Texture2DResourceDictionary[_textureName]; temp.uiNumberOfReferences++; // Incrementing the number of references } // Otherwise load it into the dictionary and then reference the // copy in the dictionary else { TextureContainer TC = new TextureContainer(); TC.uiNumberOfReferences = 1; // We start out with 1 referece. // Loading the texture into memory. try { TC.texture = Content.Load<Texture2D>(_textureName); // This is passed into the dictionary, thus there is only one copy of // the texture in memory. } // Occasionally our texture will already by loaded by another thread while // this thread is operating. This mainly happens on the first level. catch(Exception e) { temp = Texture2DResourceDictionary[_textureName]; temp.uiNumberOfReferences++; // Incrementing the number of references } // There is an issue with Sprite Batch and disposing textures. // This will have to wait until its figured out. Texture2DResourceDictionary.Add(_textureName, TC); // We don't have the find the reference to the container since we // already have it. temp = TC; } } } // Return a reference to the texture return temp.texture; } /// <summary> /// Go through our dictionary and remove any references to the /// texture passed in. /// </summary> /// <param name="texture">Texture to remove from texture dictionary.</param> public static void RemoveTexture2D(Texture2D texture) { foreach (KeyValuePair<string, TextureContainer> pair in Texture2DResourceDictionary) { // Do our references match? if (pair.Value.texture == texture) { // Only one object or less holds a reference to the // texture. Logically it should be safe to remove. if (pair.Value.uiNumberOfReferences <= 1) { // Grabing referenc to texture TexturesToDispose.Add(pair.Value.texture); // We are about to release the memory of the texture, // thus we make sure no one else can call this member // in the dictionary. Texture2DResourceDictionary.Remove(pair.Key); // Once we have removed the texture we don't want to create an exception. // So we will stop looking in the list since it has changed. break; } // More than one Object has a reference to this texture. // So we will not be removing it from memory and instead // simply marking down the number of references by 1. else { pair.Value.uiNumberOfReferences--; } } } } /*public static void DisposeTextures() { int Count = TexturesToDispose.Count; // If there are any textures to dispose of. if (Count > 0) { for (int con = 0; con < TexturesToDispose.Count; con++) { // =!THIS REMOVES THE TEXTURE FROM MEMORY!= // This is not like a normal dispose. This will actually // remove the object from memory. Texture2D is inherited // from GraphicsResource which removes it self from // memory on dispose. Very nice for game efficency, // but "dangerous" in managed land. Texture2D Temp = TexturesToDispose[con]; Temp.Dispose(); } // Remove textures we've already disposed of. TexturesToDispose.Clear(); } }*/ /// <summary> /// This loads a 2D texture which represnets a font. /// </summary> /// <param name="_textureName">The name of the font you wish to load.</param> /// <returns>Holds the font data.</returns> public static SpriteFont LoadFont( string _fontName ) { SpriteFont temp = Content.Load<SpriteFont>( _fontName ); return temp; } /// <summary> /// This loads an XML document. /// </summary> /// <param name="_textureName">The name of the XML document you wish to load.</param> /// <returns>Holds the XML data.</returns> public static XmlDocument LoadXML( string _fileName ) { XmlDocument temp = Content.Load<XmlDocument>( _fileName ); return temp; } /// <summary> /// This loads a sound file. /// </summary> /// <param name="_fileName"></param> /// <returns></returns> public static SoundEffect LoadSound( string _fileName ) { SoundEffect temp = Content.Load<SoundEffect>(_fileName); return temp; } } }

Read the article

Search Results

Search found 2207 results on 89 pages for 'nick locking'.

Page 88/89 | < Previous Page | 84 85 86 87 88 89 | Next Page >

- by James Michael Hare

- by sqlworkshops

- by wcoekaer

- by Simon Cooper

- by Alois Kraus

- by Alex

- by Kyle Noland

- by clueless-anon

- by Kyle Noland

- by Aaronaught

- by Aaronaught

- by Roland Bengtsson

- by Vokinneberg

- by Tim Cooper

- by skimania

- by sqlworkshops

- by Dave

- by Dean Richardson

- by Chadd Nervig

- by ??iu

- by dhanke

- by Nathan Jones

- by ??iu

- by Alikar

< Previous Page | 84 85 86 87 88 89 | Next Page >