Search Results

Search found 271 results on 11 pages for 'benchmarks'.

Page 10/11 | < Previous Page | 6 7 8 9 10 11 | Next Page >

fastest way to crawl recursive ntfs directories in C++

- by Peter Parker

I have written a small crawler to scan and resort directory structures. It based on dirent(which is a small wrapper around FindNextFileA) In my first benchmarks it is surprisingy slow: around 123473ms for 4500 files(thinkpad t60p local samsung 320 GB 2.5" HD). 121481 files found in 123473 milliseconds Is this speed normal? This is my code: int testPrintDir(std::string strDir, std::string strPattern="*", bool recurse=true){ struct dirent *ent; DIR *dir; dir = opendir (strDir.c_str()); int retVal = 0; if (dir != NULL) { while ((ent = readdir (dir)) != NULL) { if (strcmp(ent->d_name, ".") !=0 && strcmp(ent->d_name, "..") !=0){ std::string strFullName = strDir +"\\"+std::string(ent->d_name); std::string strType = "N/A"; bool isDir = (ent->data.dwFileAttributes & FILE_ATTRIBUTE_DIRECTORY) !=0; strType = (isDir)?"DIR":"FILE"; if ((!isDir)){ //printf ("%s <%s>\n", strFullName.c_str(),strType.c_str());//ent->d_name); retVal++; } if (isDir && recurse){ retVal += testPrintDir(strFullName, strPattern, recurse); } } } closedir (dir); return retVal; } else { /* could not open directory */ perror ("DIR NOT FOUND!"); return -1; } }

Read the article
Quantifying the Performance of Garbage Collection vs. Explicit Memory Management

- by EmbeddedProg

I found this article here: Quantifying the Performance of Garbage Collection vs. Explicit Memory Management http://www.cs.umass.edu/~emery/pubs/gcvsmalloc.pdf In the conclusion section, it reads: Comparing runtime, space consumption, and virtual memory footprints over a range of benchmarks, we show that the runtime performance of the best-performing garbage collector is competitive with explicit memory management when given enough memory. In particular, when garbage collection has five times as much memory as required, its runtime performance matches or slightly exceeds that of explicit memory management. However, garbage collection’s performance degrades substantially when it must use smaller heaps. With three times as much memory, it runs 17% slower on average, and with twice as much memory, it runs 70% slower. Garbage collection also is more susceptible to paging when physical memory is scarce. In such conditions, all of the garbage collectors we examine here suffer order-of-magnitude performance penalties relative to explicit memory management. So, if my understanding is correct: if I have an app written in native C++ requiring 100 MB of memory, to achieve the same performance with a "managed" (i.e. garbage collector based) language (e.g. Java, C#), the app should require 5*100 MB = 500 MB? (And with 2*100 MB = 200 MB, the managed app would run 70% slower than the native app?) Do you know if current (i.e. latest Java VM's and .NET 4.0's) garbage collectors suffer the same problems described in the aforementioned article? Has the performance of modern garbage collectors improved? Thanks.

Read the article
Image resizing efficiency in C# and .NET 3.5

- by Matthew Nichols

I have written a web service to resize user uploaded images and all works correctly from a functional point of view, but it causes CPU usage to spike every time it is used. It is running on Windows Server 2008 64 bit. I have tried compiling to 32 and 64 bit and get about the same results. The heart of the service is this function: private Image CreateReducedImage(Image imgOrig, Size NewSize) { var newBM = new Bitmap(NewSize.Width, NewSize.Height); using (var newGrapics = Graphics.FromImage(newBM)) { newGrapics.CompositingQuality = CompositingQuality.HighSpeed; newGrapics.SmoothingMode = SmoothingMode.HighSpeed; newGrapics.InterpolationMode = InterpolationMode.HighQualityBicubic; newGrapics.DrawImage(imgOrig, new Rectangle(0, 0, NewSize.Width, NewSize.Height)); } return newBM; } I put a profiler on the service and it seemed to indicate the vast majority of the time is spent in the GDI+ library itself and there is not much to be gained in my code. Questions: Am I doing something glaringly inefficient in my code here? It seems to conform to the example I have seen. Are there gains to be had in using libraries other than GDI+? The benchmarks I have seen seem to indicate that GDI+ does well compare to other libraries but I didn't find enough of these to be confident. Are there gains to be had by using "unsafe code" blocks? Please let me know if I have not included enough of the code...I am happy to put as much up as requested but don't want to be obnoxious in the post.

Read the article
How to correctly partition usb flash drive and which filesystem to choose considering wear leveling?

- by random1

Two problems. First one: how to partition the flash drive? I shouldn't need to do this, but I'm no longer sure if my partition is properly aligned since I was forced to delete and create a new partition table after gparted complained when I tried to format the drive from FAT to ext4. The naive answer would be to say "just use default and everything is going to be alright". However if you read the following links you'll know things are not that simple: https://lwn.net/Articles/428584/ and http://linux-howto-guide.blogspot.com/2009/10/increase-usb-flash-drive-write-speed.html Then there is also the issue of cylinders, heads and sectors. Currently I get this: $sfdisk -l -uM /dev/sdd Disk /dev/sdd: 30147 cylinders, 64 heads, 32 sectors/track Warning: The partition table looks like it was made for C/H/S=*/255/63 (instead of 30147/64/32). For this listing I'll assume that geometry. Units = mebibytes of 1048576 bytes, blocks of 1024 bytes, counting from 0 Device Boot Start End MiB #blocks Id System /dev/sdd1 1 30146 30146 30869504 83 Linux $fdisk -l /dev/sdd Disk /dev/sdd: 31.6 GB, 31611420672 bytes 255 heads, 63 sectors/track, 3843 cylinders Units = cylinders of 16065 * 512 = 8225280 bytes Sector size (logical/physical): 512 bytes / 512 bytes I/O size (minimum/optimal): 512 bytes / 512 bytes Disk identifier: 0x00010c28 So from my current understanding I should align partitions at 4 MiB (currently it's at 1 MiB). But I still don't know how to set the heads and sectors properly for my device. Second problem: file system. From the benchmarks I saw ext4 provides the best performance, however there is the issue of wear leveling. How can I know that my Transcend JetFlash 700's microcontroller provides for wear leveling? Or will I just be killing my drive faster? I've seen a lot of posts on the web saying don't worry the newer drives already take care of that. But I've never seen a single piece of backed evidence of that and at some point people start mixing SSD with USB flash drives technology. The safe option would be to go for ext2, however a serious of tests that I performed showed horrible performance!!! These values are from a real scenario and not some synthetic test: 42 files: 3,429,415,284 bytes copied to flash drive original fat32: 15.1 MiB/s ext4 after new partition table: 10.2 MiB/s ext2 after new partition table: 1.9 MiB/s Please read the links that I posted above before answering. I would also be interested in answers backed up with some references because a lot is said and re-said but then it lacks facts. Thank you for the help.

Read the article
Linux RAID-0 performance doesn't scale up over 1 GB/s

- by wazoox

I have trouble getting the max throughput out of my setup. The hardware is as follow : dual Quad-Core AMD Opteron(tm) Processor 2376 16 GB DDR2 ECC RAM dual Adaptec 52245 RAID controllers 48 1 TB SATA drives set up as 2 RAID-6 arrays (256KB stripe) + spares. Software : Plain vanilla 2.6.32.25 kernel, compiled for AMD-64, optimized for NUMA; Debian Lenny userland. benchmarks run : disktest, bonnie++, dd, etc. All give the same results. No discrepancy here. io scheduler used : noop. Yeah, no trick here. Up until now I basically assumed that striping (RAID 0) several physical devices should augment performance roughly linearly. However this is not the case here : each RAID array achieves about 780 MB/s write, sustained, and 1 GB/s read, sustained. writing to both RAID arrays simultaneously with two different processes gives 750 + 750 MB/s, and reading from both gives 1 + 1 GB/s. however when I stripe both arrays together, using either mdadm or lvm, the performance is about 850 MB/s writing and 1.4 GB/s reading. at least 30% less than expected! running two parallel writer or reader processes against the striped arrays doesn't enhance the figures, in fact it degrades performance even further. So what's happening here? Basically I ruled out bus or memory contention, because when I run dd on both drives simultaneously, aggregate write speed actually reach 1.5 GB/s and reading speed tops 2 GB/s. So it's not the PCIe bus. I suppose it's not the RAM. It's not the filesystem, because I get exactly the same numbers benchmarking against the raw device or using XFS. And I also get exactly the same performance using either LVM striping and md striping. What's wrong? What's preventing a process from going up to the max possible throughput? Is Linux striping defective? What other tests could I run?

Read the article
Any reason not to disable the Windows pagefile given enough physical RAM?

- by Evgeny

The question of disabling the Windows pagefile has already been discussed quite a bit, for example here and here and here. People continue to upvote answers that say "you should not disable your pagefile even if you have plenty of RAM", but I have yet to see any concrete, verifiable reasons being given for this advice. As far as I can see, if you never need to read from the pagefile (because you have enough RAM) then performance could only be worse with it enabled due to Windows pre-emptively writing to it. At best, performance would be the same. I can't see how it could possibly be improved by writing data you never need to read. So my question is: Assuming that I have enough physical RAM for everything I do, is there any reason I should not disable the pagefile? Let's say the version of Windows is Windows XP x64 SP2 or Windows Server 2003 x64 SP2 (same thing). If it's different for Windows Server 2008 x64 I'd be interested to hear an answer for that as well. I'm looking for specific, objective reasons from good sources, not just opinions. Something like "here are the benchmarks done with and without a pagefile and the results were better with a pagefile, even with enough RAM" or "according to this MS KB article problem X occurs if you disable the pagefile". So far the only reasons I've seen mentioned are: Even if you think you have enough RAM you might run out. OK, but for the purposes of this question, let's just take it as a given that I have enough. Maybe I only ever read my email and I have 16GB RAM. Or 128GB. Or 1TB. Or whatever - but it's enough for 100% of what I do, 100% of the time. Another way to think of it is: if I have x MB physical RAM and y MB pagefile and I never run out of RAM in that configuration, would I not be better off, performance-wise, with x+y MB physical RAM and no pagefile? Windows is "used to" having a paging file and it might not function as reliably (from Understanding the Impact of RAM on Overall System Performance That's rather vague and I find it hard to believe, given that MS has provided the option to disable the pagefile. Windows knows what it's doing better than you. No - it doesn't know that I won't run more programs or load more data, but I do.

Read the article
Lag spikes at full CPU usage, maybe video card

- by Roberts

I am posting this thread in hurry so few things may be missed (I will update tomorrow). My PC specs: Motherboard Name - Gigabyte GA-945PL-S3 CPU Type - DualCore Intel Core 2 Duo E4300, 1800 MHz (9 x 200) OS - Microsoft Windows 7 Ultimate OS Kernel Type - 32-bit OS Version - 6.1.7601 I bougth a new video card one month ago. GeForce 210. I didn't have any problems. I wanted to overclock it, in other words: "Play with it". So I installed Gigabyte EasyBoost from CD and overclocked the GPU 590 + 110 mhz, memory to max to 960mhz from 800mhz. Benchmarks showed a little bit bigger score. Then I overclocked shader clock from 1405 to [..] (don't remeber really). So I was playing Modern Warfare 2 when off sudden computer froze when I wanted to select team, I was afk before that. I had to reset CMOS. After that I had problems with Skype: unread messages and no sound. Then I figured it out that when ever I open EasyBoost - Skype starts to glitch again. Now I use EVGA Precission X. Now after a month, I cleaned computer and closed the case, it was open all the time. I started to overclock GPU clock only (just a bit) because there was no problems that would stop me. So sometimes on heavy CPU load graphics starts to lag. Dragging a window is painful to watch too. Sometimes the screen freezes for 5 to 10 seconds (I can see that hard disk activity is maximal). You may say that CPU fault it is, isn't it? But sometimes lag spikes starts randomly when CPU load is at maximum. All 3 benchmark softwares (PerformanceTest, NovaBench and MSI Kombustor) shows that performance of my video card has dropped about 25%. BUT! CPU score is lower too. I ignored these problems but when I refreshed Windows Experience Index I was shocked. Month before (in latvian language but not so hard to understand): Now (upgraded RAM): This happened when I tried to capture Minecraft with Fraps on underclocked GPU to 580mhz (def: 590mhz): All drivers are up to date. Average CPU temperature from 55°C to 75°C (at 70°C sometimes starts these lag spikes). Video card's tempratures are from 45°C to 60°C (very hard to reach 60°C). So my hope is that the video card is fine, cause this card is very new and I want to upgrade CPU anyways. Aplogies for my mistakes in vocabulary (I am trying to type this as fast I can).

Read the article
Impact of the L3 cache on performance - worth a dual-processor system?

- by Dan Nissenbaum

I will be purchasing a new high-end system, and I would like to have a better sense of whether a dual-processor Xeon system (I am looking at the new, high-end Xeon E5-2687W) might, realistically, provide a noticeable performance improvement due to the doubling of the L3 cache (20 MB per CPU). (This is in addition to the occasional added advantage due to the doubling of cores and RAM.) My usage scenario is, roughly, that I have many background applications running at any time - 3 or 4 data compression/backup applications, a low-impact web server, one or two virtual machines at any given time (usually fairly idle), and perhaps 20 utility programs that utilize a noticeable (but small) portion of the CPU cores. In total, when I am not actively using the computer, about 25% of the total CPU power is utilized in my current i7-970 6-core (12 thread) system. When I am doing routine work, the CPU utilization often exceeds 50%, and occasionally hits 75%-80%. The Xeon E5-2687W is not only a second-generation i7 (so should improve performance for that reason), but also has 8 cores (16 threads), rather than 6 cores. For this reason, I expect to run into the 75% CPU range even less frequently. Nonetheless, the ability to double the cores and the RAM is a consideration. However, in the end, I believe this decision comes down to whether the doubling of the L3 cache will provide a noticeable improvement. There are many benchmarks, and a lot of discussion, regarding CPU power. However, I find very little discussion of L3 cache utilization, and how increases in the L3 cache (such as doubling it with dual processors) affect performance. For example: If there are only two processes running, but each benefits from a large L3 cache (such as might be the case for background processes that frequently scan the file system), perhaps the overall system performance might noticeably improve with dual CPU's - even if only a single core is active on each CPU - due to each process having double the effective L3 cache. I am hoping that someone has a sense of the benefits of increasing (or doubling) the L3 cache size. Note: the CPU I am considering (the Xeon E5-2687W) has 20 MB L3 cache, so a system with dual CPU's would have 40 MB L3 cache.

Read the article
Tips for maximizing Nginx requests/sec?

- by linkedlinked

I'm building an analytics package, and project requirements state that I need to support 1 billion hits per day. Yep, "billion". In other words, no less than 12,000 hits per second sustained, and preferably some room to burst. I know I'll need multiple servers for this, but I'm trying to get maximum performance out of each node before "throwing more hardware at it". Right now, I have the hits-tracking portion completed, and well optimized. I pretty much just save the requests straight into Redis (for later processing with Hadoop). The application is Python/Django with a gunicorn for the gateway. My 2GB Ubuntu 10.04 Rackspace server (not a production machine) can serve about 1200 static files per second (benchmarked using Apache AB against a single static asset). To compare, if I swap out the static file link with my tracking link, I still get about 600 requests per second -- I think this means my tracker is well optimized, because it's only a factor of 2 slower than serving static assets. However, when I benchmark with millions of hits, I notice a few things -- No disk usage -- this is expected, because I've turned off all Nginx logs, and my custom code doesn't do anything but save the request details into Redis. Non-constant memory usage -- Presumably due to Redis' memory managing, my memory usage will gradually climb up and then drop back down, but it's never once been my bottleneck. System load hovers around 2-4, the system is still responsive during even my heaviest benchmarks, and I can still manually view http://mysite.com/tracking/pixel with little visible delay while my (other) server performs 600 requests per second. If I run a short test, say 50,000 hits (takes about 2m), I get a steady, reliable 600 requests per second. If I run a longer test (tried up to 3.5m so far), my r/s degrades to about 250. My questions -- a. Does it look like I'm maxing out this server yet? Is 1,200/s static files nginx performance comparable to what others have experienced? b. Are there common nginx tunings for such high-volume applications? I have worker threads set to 64, and gunicorn worker threads set to 8, but tweaking these values doesn't seem to help or harm me much. c. Are there any linux-level settings that could be limiting my incoming connections? d. What could cause my performance to degrade to 250r/s on long-running tests? Again, the memory is not maxing out during these tests, and HDD use is nil. Thanks in advance, all :)

Read the article
Choosing parts for a high-spec custom PC - feedback required [closed]

- by James

I'm looking to build a high-spec PC costing under ~£800 (bearing in mind I can get the CPU half price). This is my first time doing this so I have plenty of questions! I have been doing lots of research and this is what I have come up with: http://pcpartpicker.com/uk/p/j4lE Usage: I will be using it for Adobe CS6, rendering in 3DS Max, particle simulations in Realflow and for playing games like GTA IV (and V when it comes out), Crysis 1/2, Saints Row The Third, Deus Ex HR, etc. Questions: Can you see any obvious problem areas with the current setup? Will it be sufficient for the above usage? I won't be doing any overclocking initially. Is it worth buying the H60 liquid cooler, or will the fan that comes with the CPU be sufficient? Is water cooling generally quieter? Is the chosen motherboard good for the current components? And is it future-proof? I read that the HDD is often the bottleneck when it comes to gaming. I presume this is true to other high-end applications? If so, is my selection good? I keep changing my mind about the GPU; first the 560, now the 660. Can anyone shed some light on how to choose? I read mixed opinions about matching the GPU to the CPU. Will the 560 or the 660 be sufficient for my required usage? Atm I'm basing my choice on the PassMark benchmarks and how much they cost. The specs on the GeForce website state that the 560 and the 660 both require 450W. Is this a good figure to base the wattage of my PSU on? If so, how do you decide? Do I really need 750W? The latest GTX 690 requires 650W. Is it a good idea to buy a 750W PSU now to future-proof myself?

Read the article
C# HashSet<T>

- by Ben Griswold

I hadn’t done much (read: anything) with the C# generic HashSet until I recently needed to produce a distinct collection. As it turns out, HashSet<T> was the perfect tool. As the following snippet demonstrates, this collection type offers a lot: // Using HashSet<T>: // http://www.albahari.com/nutshell/ch07.aspx var letters = new HashSet<char>("the quick brown fox"); Console.WriteLine(letters.Contains('t')); // true Console.WriteLine(letters.Contains('j')); // false foreach (char c in letters) Console.Write(c); // the quickbrownfx Console.WriteLine(); letters = new HashSet<char>("the quick brown fox"); letters.IntersectWith("aeiou"); foreach (char c in letters) Console.Write(c); // euio Console.WriteLine(); letters = new HashSet<char>("the quick brown fox"); letters.ExceptWith("aeiou"); foreach (char c in letters) Console.Write(c); // th qckbrwnfx Console.WriteLine(); letters = new HashSet<char>("the quick brown fox"); letters.SymmetricExceptWith("the lazy brown fox"); foreach (char c in letters) Console.Write(c); // quicklazy Console.WriteLine(); The MSDN documentation is a bit light on HashSet<T> documentation but if you search hard enough you can find some interesting information and benchmarks. But back to that distinct list I needed… // MSDN Add // http://msdn.microsoft.com/en-us/library/bb353005.aspx var employeeA = new Employee {Id = 1, Name = "Employee A"}; var employeeB = new Employee {Id = 2, Name = "Employee B"}; var employeeC = new Employee {Id = 3, Name = "Employee C"}; var employeeD = new Employee {Id = 4, Name = "Employee D"}; var naughty = new List<Employee> {employeeA}; var nice = new List<Employee> {employeeB, employeeC}; var employees = new HashSet<Employee>(); naughty.ForEach(x => employees.Add(x)); nice.ForEach(x => employees.Add(x)); foreach (Employee e in employees) Console.WriteLine(e); // Returns Employee A Employee B Employee C The Add Method returns true on success and, you guessed it, false if the item couldn’t be added to the collection. I’m using the Linq ForEach syntax to add all valid items to the employees HashSet. It works really great. This is just a rough sample, but you may have noticed I’m using Employee, a reference type. Most samples demonstrate the power of the HashSet with a collection of integers which is kind of cheating. With value types you don’t have to worry about defining your own equality members. With reference types, you do. internal class Employee { public int Id { get; set; } public string Name { get; set; } public override string ToString() { return Name; } public bool Equals(Employee other) { if (ReferenceEquals(null, other)) return false; if (ReferenceEquals(this, other)) return true; return other.Id == Id; } public override bool Equals(object obj) { if (ReferenceEquals(null, obj)) return false; if (ReferenceEquals(this, obj)) return true; if (obj.GetType() != typeof (Employee)) return false; return Equals((Employee) obj); } public override int GetHashCode() { return Id; } public static bool operator ==(Employee left, Employee right) { return Equals(left, right); } public static bool operator !=(Employee left, Employee right) { return !Equals(left, right); } } Fortunately, with Resharper, it’s a snap. Click on the class name, ALT+INS and then follow with the handy dialogues. That’s it. Try out the HashSet<T>. It’s good stuff.

Read the article
Whitepaper list for the application framework

- by Rick Finley

We're reposting the list of technical whitepapers for the Oracle ETPM framework (called OUAF, Oracle Utilities Application Framework). These are are available from My Oracle Support at the Doc Id's mentioned below. Some have been updated in the last few months to reflect new advice and new features. This is reposted from the OUAF blog: http://blogs.oracle.com/theshortenspot/entry/whitepaper_list_as_at_november Doc Id Document Title Contents 559880.1 ConfigLab Design Guidelines This whitepaper outlines how to design and implement a data management solution using the ConfigLab facility. This whitepaper currently only applies to the following products: Oracle Utilities Customer Care And Billing Oracle Enterprise Taxation Management Oracle Enterprise Taxation and Policy Management 560367.1 Technical Best Practices for Oracle Utilities Application Framework Based Products Whitepaper summarizing common technical best practices used by partners, implementation teams and customers. 560382.1 Performance Troubleshooting Guideline Series A set of whitepapers on tracking performance at each tier in the framework. The individual whitepapers are as follows: Concepts - General Concepts and Performance Troublehooting processes Client Troubleshooting - General troubleshooting of the browser client with common issues and resolutions. Network Troubleshooting - General troubleshooting of the network with common issues and resolutions. Web Application Server Troubleshooting - General troubleshooting of the Web Application Server with common issues and resolutions. Server Troubleshooting - General troubleshooting of the Operating system with common issues and resolutions. Database Troubleshooting - General troubleshooting of the database with common issues and resolutions. Batch Troubleshooting - General troubleshooting of the background processing component of the product with common issues and resolutions. 560401.1 Software Configuration Management Series A set of whitepapers on how to manage customization (code and data) using the tools provided with the framework. The individual whitepapers are as follows: Concepts - General concepts and introduction. Environment Management - Principles and techniques for creating and managing environments. Version Management - Integration of Version control and version management of configuration items. Release Management - Packaging configuration items into a release. Distribution - Distribution and installation of releases across environments Change Management - Generic change management processes for product implementations. Status Accounting - Status reporting techniques using product facilities. Defect Management - Generic defect management processes for product implementations. Implementing Single Fixes - Discussion on the single fix architecture and how to use it in an implementation. Implementing Service Packs - Discussion on the service packs and how to use them in an implementation. Implementing Upgrades - Discussion on the the upgrade process and common techniques for minimizing the impact of upgrades. 773473.1 Oracle Utilities Application Framework Security Overview A whitepaper summarizing the security facilities in the framework. Now includes references to other Oracle security products supported. 774783.1 LDAP Integration for Oracle Utilities Application Framework based products Updated! A generic whitepaper summarizing how to integrate an external LDAP based security repository with the framework. 789060.1 Oracle Utilities Application Framework Integration Overview A whitepaper summarizing all the various common integration techniques used with the product (with case studies). 799912.1 Single Sign On Integration for Oracle Utilities Application Framework based products A whitepaper outlining a generic process for integrating an SSO product with the framework. 807068.1 Oracle Utilities Application Framework Architecture Guidelines This whitepaper outlines the different variations of architecture that can be considered. Each variation will include advice on configuration and other considerations. 836362.1 Batch Best Practices for Oracle Utilities Application Framework based products This whitepaper outlines the common and best practices implemented by sites all over the world. 856854.1 Technical Best Practices V1 Addendum Addendum to Technical Best Practices for Oracle Utilities Customer Care And Billing V1.x only. 942074.1 XAI Best Practices This whitepaper outlines the common integration tasks and best practices for the Web Services Integration provided by the Oracle Utilities Application Framework. 970785.1 Oracle Identity Manager Integration Overview This whitepaper outlines the principals of the prebuilt intergration between Oracle Utilities Application Framework Based Products and Oracle Identity Manager used to provision user and user group security information. For Fw4.x customers use whitepaper 1375600.1 instead. 1068958.1 Production Environment Configuration Guidelines A whitepaper outlining common production level settings for the products based upon benchmarks and customer feedback. 1177265.1 What's New In Oracle Utilities Application Framework V4? Whitepaper outlining the major changes to the framework since Oracle Utilities Application Framework V2.2. 1290700.1 Database Vault Integration Whitepaper outlining the Database Vault Integration solution provided with Oracle Utilities Application Framework V4.1.0 and above. 1299732.1 BI Publisher Guidelines for Oracle Utilities Application Framework Whitepaper outlining the interface between BI Publisher and the Oracle Utilities Application Framework 1308161.1 Oracle SOA Suite Integration with Oracle Utilities Application Framework based products This whitepaper outlines common design patterns and guidelines for using Oracle SOA Suite with Oracle Utilities Application Framework based products. 1308165.1 MPL Best Practices Oracle Utilities Application Framework This is a guidelines whitepaper for products shipping with the Multi-Purpose Listener. This whitepaper currently only applies to the following products: Oracle Utilities Customer Care And Billing Oracle Enterprise Taxation Management Oracle Enterprise Taxation and Policy Management 1308181.1 Oracle WebLogic JMS Integration with the Oracle Utilities Application Framework This whitepaper covers the native integration between Oracle WebLogic JMS with Oracle Utilities Application Framework using the new Message Driven Bean functionality and real time JMS adapters. 1334558.1 Oracle WebLogic Clustering for Oracle Utilities Application Framework New! This whitepaper covers process for implementing clustering using Oracle WebLogic for Oracle Utilities Application Framework based products. 1359369.1 IBM WebSphere Clustering for Oracle Utilities Application Framework New! This whitepaper covers process for implementing clustering using IBM WebSphere for Oracle Utilities Application Framework based products 1375600.1 Oracle Identity Management Suite Integration with the Oracle Utilities Application Framework New! This whitepaper covers the integration between Oracle Utilities Application Framework and Oracle Identity Management Suite components such as Oracle Identity Manager, Oracle Access Manager, Oracle Adaptive Access Manager, Oracle Internet Directory and Oracle Virtual Directory. 1375615.1 Advanced Security for the Oracle Utilities Application Framework New! This whitepaper covers common security requirements and how to meet those requirements using Oracle Utilities Application Framework native security facilities, security provided with the J2EE Web Application and/or facilities available in Oracle Identity Management Suite.

Read the article
Measuring Usability with Common Industry Format (CIF) Usability Tests

- by Applications User Experience

Sean Rice, Manager, Applications User Experience A User-centered Research and Design Process The Oracle Fusion Applications user experience was five years in the making. The development of this suite included an extensive and comprehensive user experience design process: ethnographic research, low-fidelity workflow prototyping, high fidelity user interface (UI) prototyping, iterative formative usability testing, development feedback and iteration, and sales and customer evaluation throughout the design cycle. However, this process does not stop when our products are released. We conduct summative usability testing using the ISO 25062 Common Industry Format (CIF) for usability test reports as an organizational framework. CIF tests allow us to measure the overall usability of our released products. These studies provide benchmarks that allow for comparisons of a specific product release against previous versions of our product and against other products in the marketplace. What Is a CIF Usability Test? CIF refers to the internationally standardized method for reporting usability test findings used by the software industry. The CIF is based on a formal, lab-based test that is used to benchmark the usability of a product in terms of human performance and subjective data. The CIF was developed and is endorsed by more than 375 software customer and vendor organizations led by the National Institute for Standards and Technology (NIST), a US government entity. NIST sponsored the CIF through the American National Standards Institute (ANSI) and International Organization for Standardization (ISO) standards-making processes. Oracle played a key role in developing the CIF. The CIF report format and metrics are consistent with the ISO 9241-11 definition of usability: “The extent to which a product can be used by specified users to achieve specified goals with effectiveness, efficiency and satisfaction in a specified context of use.” Our goal in conducting CIF tests is to measure performance and satisfaction of a representative sample of users on a set of core tasks and to help predict how usable a product will be with the larger population of customers. Why Do We Perform CIF Testing? The overarching purpose of the CIF for usability test reports is to promote incorporation of usability as part of the procurement decision-making process for interactive products. CIF provides a common format for vendors to report the methods and results of usability tests to customer organizations, and enables customers to compare the usability of our software to that of other suppliers. CIF also enables us to compare our current software with previous versions of our software. CIF Testing for Fusion Applications Oracle Fusion Applications comprises more than 100 modules in seven different product families. These modules encompass more than 400 task flows and 400 user roles. Due to resource constraints, we cannot perform comprehensive CIF testing across the entire product suite. Therefore, we had to develop meaningful inclusion criteria and work with other stakeholders across the applications development organization to prioritize product areas for testing. Ultimately, we want to test the product areas for which customers might be most interested in seeing CIF data. We also want to build credibility with customers; we need to be able to make the case to current and prospective customers that the product areas tested are representative of the product suite as a whole. Our goal is to test the top use cases for each product. The primary activity in the scoping process was to work with the individual product teams to identify the key products and business process task flows in each product to test. We prioritized these products and flows through a series of negotiations among the user experience managers, product strategy, and product management directors for each of the primary product families within the Oracle Fusion Applications suite (Human Capital Management, Supply Chain Management, Customer Relationship Management, Financials, Projects, and Procurement). The end result of the scoping exercise was a list of 47 proposed CIF tests for the Fusion Applications product suite. Figure 1. A participant completes tasks during a usability test in Oracle’s Usability Labs Fusion Supplier Portal CIF Test The first Fusion CIF test was completed on the Supplier Portal application in July of 2011. Fusion Supplier Portal is part of an integrated suite of Procurement applications that helps supplier companies manage orders, schedules, shipments, invoices, negotiations and payments. The user roles targeted for the usability study were Supplier Account Receivables Specialists and Supplier Sales Representatives, including both experienced and inexperienced users across a wide demographic range. The test specifically focused on the following functionality and features: Manage payments – view payments Manage invoices – view invoice status and create invoices Manage account information – create new contact, review bank account information Manage agreements – find and view agreement, upload agreement lines, confirm status of agreement lines upload Manage purchase orders (PO) – view history of PO, request change to PO, find orders Manage negotiations – respond to request for a quote, check the status of a negotiation response These product areas were selected to represent the most important subset of features and functionality of the flow, in terms of frequency and criticality of use by customers. A total of 20 users participated in the usability study. The results of the Supplier Portal evaluation were favorable and exceeded our expectations. Figure 2. Fusion Supplier Portal Next Studies We plan to conduct two Fusion CIF usability studies per product family over the next nine months. The next product to be tested will be Self-service Procurement. End users are currently being recruited to participate in this usability study, and the test sessions are scheduled to begin during the last week of November.

Read the article
New Replication, Optimizer and High Availability features in MySQL 5.6.5!

- by Rob Young

As the Product Manager for the MySQL database it is always great to announce when the MySQL Engineering team delivers another great product release. As a field DBA and developer it is even better when that release contains improvements and innovation that I know will help those currently using MySQL for apps that range from modest intranet sites to the most highly trafficked web sites on the web. That said, it is my pleasure to take my hat off to MySQL Engineering for today's release of the MySQL 5.6.5 Development Milestone Release ("DMR"). The new highlighted features in MySQL 5.6.5 are discussed here: New Self-Healing Replication ClustersThe 5.6.5 DMR improves MySQL Replication by adding Global Transaction Ids and automated utilities for self-healing Replication clusters. Prior to 5.6.5 this has been somewhat of a pain point for MySQL users with most developing custom solutions or looking to costly, complex third-party solutions for these capabilities. With 5.6.5 these shackles are all but removed by a solution that is included with the GPL version of the database and supporting GPL tools. You can learn all about the details of the great, problem solving Replication features in MySQL 5.6 in Mat Keep's Developer Zone article. New Replication Administration and Failover UtilitiesAs mentioned above, the new Replication features, Global Transaction Ids specifically, are now supported by a set of automated GPL utilities that leverage the new GTIDs to provide administration and manual or auto failover to the most up to date slave (that is the default, but user configurable if needed) in the event of a master failure. The new utilities, along with links to Engineering related blogs, are discussed in detail in the DevZone Article noted above. Better Query Optimization and ThroughputThe MySQL Optimizer team continues to amaze with the latest round of improvements in 5.6.5. Along with much refactoring of the legacy code base, the Optimizer team has improved complex query optimization and throughput by adding these functional improvements: Subquery Optimizations - Subqueries are now included in the Optimizer path for runtime optimization. Better throughput of nested queries enables application developers to simplify and consolidate multiple queries and result sets into a single unit or work. Optimizer now uses CURRENT_TIMESTAMP as default for DATETIME columns - For simplification, this eliminates the need for application developers to assign this value when a column of this type is blank by default. Optimizations for Range based queries - Optimizer now uses ready statistics vs Index based scans for queries with multiple range values. Optimizations for queries using filesort and ORDER BY. Optimization criteria/decision on execution method is done now at optimization vs parsing stage. Print EXPLAIN in JSON format for hierarchical readability and Enterprise tool consumption. You can learn the details about these new features as well all of the Optimizer based improvements in MySQL 5.6 by following the Optimizer team blog. You can download and try the MySQL 5.6.5 DMR here. (look under "Development Releases") Please let us know what you think! The new HA utilities for Replication Administration and Failover are available as part of the MySQL Workbench Community Edition, which you can download here .Also New in MySQL LabsAs has become our tradition when announcing DMRs we also like to provide "Early Access" development features to the MySQL Community via the MySQL Labs. Today is no exception as we are also releasing the following to Labs for you to download, try and let us know your thoughts on where we need to improve:InnoDB Online OperationsMySQL 5.6 now provides Online ADD Index, FK Drop and Online Column RENAME. These operations are non-blocking and will continue to evolve in future DMRs. You can learn the grainy details by following John Russell's blog.InnoDB data access via Memcached API ("NotOnlySQL") - Improved refresh of an earlier feature releaseSimilar to Cluster 7.2, MySQL 5.6 provides direct NotOnlySQL access to InnoDB data via the familiar Memcached API. This provides the ultimate in flexibility for developers who need fast, simple key/value access and complex query support commingled within their applications.Improved Transactional Performance, ScaleThe InnoDB Engineering team has once again under promised and over delivered in the area of improved performance and scale. These improvements are also included in the aggregated Spring 2012 labs release:InnoDB CPU cache performance improvements for modern, multi-core/CPU systems show great promise with internal tests showing: 2x throughput improvement for read only activity 6x throughput improvement for SELECT range Read/Write benchmarks are in progress More details on the above are available here. You can download all of the above in an aggregated "InnoDB 2012 Spring Labs Release" binary from the MySQL Labs. You can also learn more about these improvements and about related fixes to mysys mutex and hash sort by checking out the InnoDB team blog.MySQL 5.6.5 is another installment in what we believe will be the best release of the MySQL database ever. It also serves as a shining example of how the MySQL Engineering team at Oracle leads in MySQL innovation.You can get the overall Oracle message on the MySQL 5.6.5 DMR and Early Access labs features here. As always, thanks for your continued support of MySQL, the #1 open source database on the planet!

Read the article
IBM "per core" comparisons for SPECjEnterprise2010

- by jhenning

I recently stumbled upon a blog entry from Roman Kharkovski (an IBM employee) comparing some SPECjEnterprise2010 results for IBM vs. Oracle. Mr. Kharkovski's blog claims that SPARC delivers half the transactions per core vs. POWER7. Prior to any argument, I should say that my predisposition is to like Mr. Kharkovski, because he says that his blog is intended to be factual; that the intent is to try to avoid marketing hype and FUD tactic; and mostly because he features a picture of himself wearing a bike helmet (me too). Therefore, in a spirit of technical argument, rather than FUD fight, there are a few areas in his comparison that should be discussed. Scaling is not free For any benchmark, if a small system scores 13k using quantity R1 of some resource, and a big system scores 57k using quantity R2 of that resource, then, sure, it's tempting to divide: is 13k/R1 > 57k/R2 ? It is tempting, but not necessarily educational. The problem is that scaling is not free. Building big systems is harder than building small systems. Scoring 13k/R1 on a little system provides no guarantee whatsoever that one can sustain that ratio when attempting to handle more than 4 times as many users. Choosing the denominator radically changes the picture When ratios are used, one can vastly manipulate appearances by the choice of denominator. In this case, lots of choices are available for the resource to be compared (R1 and R2 above). IBM chooses to put cores in the denominator. Mr. Kharkovski provides some reasons for that choice in his blog entry. And yet, it should be noted that the very concept of a core is: arbitrary: not necessarily comparable across vendors; fluid: modern chips shift chip resources in response to load; and invisible: unless you have a microscope, you can't see it. By contrast, one can actually see processor chips with the naked eye, and they are a bit easier to count. If we put chips in the denominator instead of cores, we get: 13161.07 EjOPS / 4 chips = 3290 EjOPS per chip for IBM vs 57422.17 EjOPS / 16 chips = 3588 EjOPS per chip for Oracle The choice of denominator makes all the difference in the appearance. Speaking for myself, dividing by chips just seems to make more sense, because: I can see chips and count them; and I can accurately compare the number of chips in my system to the count in some other vendor's system; and Tthe probability of being able to continue to accurately count them over the next 10 years of microprocessor development seems higher than the probability of being able to accurately and comparably count "cores". SPEC Fair use requirements Speaking as an individual, not speaking for SPEC and not speaking for my employer, I wonder whether Mr. Kharkovski's blog article, taken as a whole, meets the requirements of the SPEC Fair Use rule www.spec.org/fairuse.html section I.D.2. For example, Mr. Kharkovski's footnote (1) begins Results from http://www.spec.org as of 04/04/2013 Oracle SUN SPARC T5-8 449 EjOPS/core SPECjEnterprise2010 (Oracle's WLS best SPECjEnterprise2010 EjOPS/core result on SPARC). IBM Power730 823 EjOPS/core (World Record SPECjEnterprise2010 EJOPS/core result) The questionable tactic, from a Fair Use point of view, is that there is no such metric at the designated location. At www.spec.org, You can find the SPEC metric 57422.17 SPECjEnterprise2010 EjOPS for Oracle and You can also find the SPEC metric 13161.07 SPECjEnterprise2010 EjOPS for IBM. Despite the implication of the footnote, you will not find any mention of 449 nor anything that says 823. SPEC says that you can, under its fair use rule, derive your own values; but it emphasizes: "The context must not give the appearance that SPEC has created or endorsed the derived value." Substantiation and transparency Although SPEC disclaims responsibility for non-SPEC information (section I.E), it says that non-SPEC data and methods should be accurate, should be explained, should be substantiated. Unfortunately, it is difficult or impossible for the reader to independently verify the pricing: Were like units compared to like (e.g. list price to list price)? Were all components (hw, sw, support) included? Were all fees included? Note that when tpc.org shows IBM pricing, there are often items such as "PROCESSOR ACTIVATION" and "MEMORY ACTIVATION". Without the transparency of a detailed breakdown, the pricing claims are questionable. T5 claim for "Fastest Processor" Mr. Kharkovski several times questions Oracle's claim for fastest processor, writing You see, when you publish industry benchmarks, people may actually compare your results to other vendor's results. Well, as we performance people always say, "it depends". If you believe in performance-per-core as the primary way of looking at the world, then yes, the POWER7+ is impressive, spending its chip resources to support up to 32 threads (8 cores x 4 threads). Or, it just might be useful to consider performance-per-chip. Each SPARC T5 chip allows 128 hardware threads to be simultaneously executing (16 cores x 8 threads). The Industry Standard Benchmark that focuses specifically on processor chip performance is SPEC CPU2006. For this very well known and popular benchmark, SPARC T5: provides better performance than both POWER7 and POWER7+, for 1 chip vs. 1 chip, for 8 chip vs. 8 chip, for integer (SPECint_rate2006) and floating point (SPECfp_rate2006), for Peak tuning and for Base tuning. For example, at the 8-chip level, integer throughput (SPECint_rate2006) is: 3750 for SPARC 2170 for POWER7+. You can find the details at the March 2013 BestPerf CPU2006 page SPEC is a trademark of the Standard Performance Evaluation Corporation, www.spec.org. The two specific results quoted for SPECjEnterprise2010 are posted at the URLs linked from the discussion. Results for SPEC CPU2006 were verified at spec.org 1 July 2013, and can be rechecked here.

Read the article
SharePoint 2010 Diagnostic Studio Remote Diag

- by juanlarios

I have had some time this week to try out some tools that I have been meaning to try out. This week I am trying out the SP 2010 Diagnostic Studio. I installed it successfully and tried it on my development evironment. I was able to build a report and a snapshot of the environment. I decided to turn my attention to my Employer's intranet environment. This would allow me to analyze it and measure it against benchmarks. I didn't want to install the Diagnostic studio on the Production Envorinment, lucky for me, the Diagnostic studio can be run remotely, well...kind of. Issue My development environment is a stand alone, full installation of SharePoint 2010 Server. It has Office 2010, SQL 2008 Enterprise, a DC...well you get the point, it's jammed packed! But more importantly it's a stand alone, self contained VM environment. Well Microsoft has instructions as to how to connect remotely with Diagnostic Studio here. The deciving part of this is that the SP2010DS prompts you for credentails. So I thought I was getting the right account to run the reports. I tried all the Power Shell commands in the link above but I still ended up getting the following errors: 06/28/2011 12:50:18 Connecting to remote server failed with the following error message : The WinRM client cannot process the request...If the SPN exists, but CredSSP cannot use Kerberos to validate the identity of the target computer and you still want to allow the delegation of the user credentials to the target computer, use gpedit.msc and look at the following policy: Computer Configuration -> Administrative Templates -> System -> Credentials Delegation -> Allow Fresh Credentials with NTLM-only Server Authentication. Verify that it is enabled and configured with an SPN appropriate for the target computer. For example, for a target computer name "myserver.domain.com", the SPN can be one of the following: WSMAN/myserver.domain.com or WSMAN/*.domain.com. Try the request again after these changes. For more information, see the about_Remote_Troubleshooting Help topic. 06/28/2011 12:54:47 Access to the path '\\<targetserver>\C$\Users\<account logging in>\AppData\Local\Temp' is denied. You might also get an error message like this: The WinRM client cannot process the request. A computer policy does not allow the delegation of the user credentials to the target computer. Explanation After looking at the event logs on the target environment, I noticed that there were a several Security Exceptions. After looking at the specifics around who was denied access, I was able to see the account that was being denied access, it was the client machine administrator account. Well of course that was never going to work!!! After some quick Googling, the last error message above will lead you to edit the Local Group Policy on the client server. And although there are instructions from microsoft around doing this, it really will not work in this scenario. Notice the Description and how it only applices to authentication mentioned? Resolution I can tell you what I did, but I wish there was a better way but I simply don't know if it's duable any other way. Because my development environment had it's own DC, I didn't really want to mess with Kerberos authentication. I would also not be smart to connect that server to the domain, considering it has it's own DC. I ended up installing SharePoint 2010 Diagnostic Studio on another Windows 7 Dev environment I have, and connected the machien to the domain. I ran all the necesary remote credentials commands mentioned here. Those commands add the group policy for you! Once I did this I was able to authenticate properly and I was able to get the reports. Conclusion You can run SharePoint 2010 Diagnostic Studio Remotely but it will require some specific scenarions. A couple of things I should mention is that as far as I understand, SP2010 DS, will install agents on your target environment to run tests and retrieve the data. I was a Farm Administrator, and also a Server Admin on SharePoint Server. I am not 100% sure if you need all those permissions but I that's just what I have to my internal intranet. I deally I would like to have a machine that I can have SharePoint 2010 DIagnostic Studio installed and I can run that against client environments. It appears that I will not be able to do that, unless I enable Kerberos on my Windows 7 Machine now. If you have it installed in the same way I would like to have it, please let me know, I'll keep trying to get what I'm after. Hope this helps someone out there doing the same.

Read the article
Performance triage

- by Dave

Folks often ask me how to approach a suspected performance issue. My personal strategy is informed by the fact that I work on concurrency issues. (When you have a hammer everything looks like a nail, but I'll try to keep this general). A good starting point is to ask yourself if the observed performance matches your expectations. Expectations might be derived from known system performance limits, prototypes, and other software or environments that are comparable to your particular system-under-test. Some simple comparisons and microbenchmarks can be useful at this stage. It's also useful to write some very simple programs to validate some of the reported or expected system limits. Can that disk controller really tolerate and sustain 500 reads per second? To reduce the number of confounding factors it's better to try to answer that question with a very simple targeted program. And finally, nothing beats having familiarity with the technologies that underlying your particular layer. On the topic of confounding factors, as our technology stacks become deeper and less transparent, we often find our own technology working against us in some unexpected way to choke performance rather than simply running into some fundamental system limit. A good example is the warm-up time needed by just-in-time compilers in Java Virtual Machines. I won't delve too far into that particular hole except to say that it's rare to find good benchmarks and methodology for java code. Another example is power management on x86. Power management is great, but it can take a while for the CPUs to throttle up from low(er) frequencies to full throttle. And while I love "turbo" mode, it makes benchmarking applications with multiple threads a chore as you have to remember to turn it off and then back on otherwise short single-threaded runs may look abnormally fast compared to runs with higher thread counts. In general for performance characterization I disable turbo mode and fix the power governor at "performance" state. Another source of complexity is the scheduler, which I've discussed in prior blog entries. Lets say I have a running application and I want to better understand its behavior and performance. We'll presume it's warmed up, is under load, and is an execution mode representative of what we think the norm would be. It should be in steady-state, if a steady-state mode even exists. On Solaris the very first thing I'll do is take a set of "pstack" samples. Pstack briefly stops the process and walks each of the stacks, reporting symbolic information (if available) for each frame. For Java, pstack has been augmented to understand java frames, and even report inlining. A few pstack samples can provide powerful insight into what's actually going on inside the program. You'll be able to see calling patterns, which threads are blocked on what system calls or synchronization constructs, memory allocation, etc. If your code is CPU-bound then you'll get a good sense where the cycles are being spent. (I should caution that normal C/C++ inlining can diffuse an otherwise "hot" method into other methods. This is a rare instance where pstack sampling might not immediately point to the key problem). At this point you'll need to reconcile what you're seeing with pstack and your mental model of what you think the program should be doing. They're often rather different. And generally if there's a key performance issue, you'll spot it with a moderate number of samples. I'll also use OS-level observability tools to lock for the existence of bottlenecks where threads contend for locks; other situations where threads are blocked; and the distribution of threads over the system. On Solaris some good tools are mpstat and too a lesser degree, vmstat. Try running "mpstat -a 5" in one window while the application program runs concurrently. One key measure is the voluntary context switch rate "vctx" or "csw" which reflects threads descheduling themselves. It's also good to look at the user; system; and idle CPU percentages. This can give a broad but useful understanding if your threads are mostly parked or mostly running. For instance if your program makes heavy use of malloc/free, then it might be the case you're contending on the central malloc lock in the default allocator. In that case you'd see malloc calling lock in the stack traces, observe a high csw/vctx rate as threads block for the malloc lock, and your "usr" time would be less than expected. Solaris dtrace is a wonderful and invaluable performance tool as well, but in a sense you have to frame and articulate a meaningful and specific question to get a useful answer, so I tend not to use it for first-order screening of problems. It's also most effective for OS and software-level performance issues as opposed to HW-level issues. For that reason I recommend mpstat & pstack as my the 1st step in performance triage. If some other OS-level issue is evident then it's good to switch to dtrace to drill more deeply into the problem. Only after I've ruled out OS-level issues do I switch to using hardware performance counters to look for architectural impediments.

Read the article
Why won't USB 3.0 external hard drive run at USB 3.0 speeds?

- by jgottula

I recently purchased a PCI Express x1 USB 3.0 controller card (containing the NEC USB 3.0 controller) with the intent of using a USB 3.0 external hard drive with my Linux box. I installed the card in an empty PCIe slot on my motherboard, connected the card to a power cable, strung a USB 3.0 cable between one of the new ports and my external HDD, and connected the HDD to a wall socket for power. Booting the system, the drive works 100% as intended, with the one exception of throughput: rather than using SuperSpeed 4.8 Gbps connectivity, it seems to be falling back to High Speed 480 Mbps USB 2.0-style throughput. Disk Utility shows it as a 480 Mbps device, and running a couple Disk Utility and dd benchmarks confirms that the drive fails to exceed ~40 MB/s (the approximate limit of USB 2.0), despite it being an SSD capable of far more than that. When I connect my USB 3.0 HDD, dmesg shows this: [ 3923.280018] usb 3-2: new high speed USB device using ehci_hcd and address 6 where I would expect to find this: [ 3923.280018] usb 3-2: new SuperSpeed USB device using xhci_hcd and address 6 My system was running on kernel 2.6.35-25-generic at the time. Then, I stumbled upon this forum thread by an individual who found that a bug, which was present in kernels prior to 2.6.37-rc5, could be the culprit for this type of problem. Consequently, I installed the 2.6.37-generic mainline Ubuntu kernel to determine if the problem would go away. It didn't, so I tried 2.6.38-rc3-generic, and even the 2.6.38 nightly from 2010.02.01, to no avail. In short, I'm trying to determine why, with USB 3.0 support in the kernel, my USB 3.0 drive fails to run at full SuperSpeed throughput. See the comments under this question for additional details. Output that might be relevant to the problem (when booting from 2.6.38-rc3): Relevant lines from dmesg: [ 19.589491] xhci_hcd 0000:03:00.0: PCI INT A -> GSI 17 (level, low) -> IRQ 17 [ 19.589512] xhci_hcd 0000:03:00.0: setting latency timer to 64 [ 19.589516] xhci_hcd 0000:03:00.0: xHCI Host Controller [ 19.589623] xhci_hcd 0000:03:00.0: new USB bus registered, assigned bus number 12 [ 19.650492] xhci_hcd 0000:03:00.0: irq 17, io mem 0xf8100000 [ 19.650556] xhci_hcd 0000:03:00.0: irq 47 for MSI/MSI-X [ 19.650560] xhci_hcd 0000:03:00.0: irq 48 for MSI/MSI-X [ 19.650563] xhci_hcd 0000:03:00.0: irq 49 for MSI/MSI-X [ 19.653946] xHCI xhci_add_endpoint called for root hub [ 19.653948] xHCI xhci_check_bandwidth called for root hub Relevant section of sudo lspci -v: 03:00.0 USB Controller: NEC Corporation uPD720200 USB 3.0 Host Controller (rev 03) (prog-if 30) Flags: bus master, fast devsel, latency 0, IRQ 17 Memory at f8100000 (64-bit, non-prefetchable) [size=8K] Capabilities: [50] Power Management version 3 Capabilities: [70] MSI: Enable- Count=1/8 Maskable- 64bit+ Capabilities: [90] MSI-X: Enable+ Count=8 Masked- Capabilities: [a0] Express Endpoint, MSI 00 Capabilities: [100] Advanced Error Reporting Capabilities: [140] Device Serial Number ff-ff-ff-ff-ff-ff-ff-ff Capabilities: [150] #18 Kernel driver in use: xhci_hcd Kernel modules: xhci-hcd Relevant section of sudo lsusb -v: Bus 012 Device 001: ID 1d6b:0003 Linux Foundation 3.0 root hub Device Descriptor: bLength 18 bDescriptorType 1 bcdUSB 3.00 bDeviceClass 9 Hub bDeviceSubClass 0 Unused bDeviceProtocol 3 bMaxPacketSize0 9 idVendor 0x1d6b Linux Foundation idProduct 0x0003 3.0 root hub bcdDevice 2.06 iManufacturer 3 Linux 2.6.38-020638rc3-generic xhci_hcd iProduct 2 xHCI Host Controller iSerial 1 0000:03:00.0 bNumConfigurations 1 Configuration Descriptor: bLength 9 bDescriptorType 2 wTotalLength 25 bNumInterfaces 1 bConfigurationValue 1 iConfiguration 0 bmAttributes 0xe0 Self Powered Remote Wakeup MaxPower 0mA Interface Descriptor: bLength 9 bDescriptorType 4 bInterfaceNumber 0 bAlternateSetting 0 bNumEndpoints 1 bInterfaceClass 9 Hub bInterfaceSubClass 0 Unused bInterfaceProtocol 0 Full speed (or root) hub iInterface 0 Endpoint Descriptor: bLength 7 bDescriptorType 5 bEndpointAddress 0x81 EP 1 IN bmAttributes 3 Transfer Type Interrupt Synch Type None Usage Type Data wMaxPacketSize 0x0004 1x 4 bytes bInterval 12 Hub Descriptor: bLength 9 bDescriptorType 41 nNbrPorts 4 wHubCharacteristic 0x0009 Per-port power switching Per-port overcurrent protection TT think time 8 FS bits bPwrOn2PwrGood 10 * 2 milli seconds bHubContrCurrent 0 milli Ampere DeviceRemovable 0x00 PortPwrCtrlMask 0xff Hub Port Status: Port 1: 0000.0100 power Port 2: 0000.0100 power Port 3: 0000.0100 power Port 4: 0000.0100 power Device Status: 0x0003 Self Powered Remote Wakeup Enabled Full, non-verbose lsusb: Bus 012 Device 001: ID 1d6b:0003 Linux Foundation 3.0 root hub Bus 011 Device 001: ID 1d6b:0001 Linux Foundation 1.1 root hub Bus 010 Device 001: ID 1d6b:0001 Linux Foundation 1.1 root hub Bus 009 Device 003: ID 04d9:0702 Holtek Semiconductor, Inc. Bus 009 Device 002: ID 046d:c068 Logitech, Inc. G500 Laser Mouse Bus 009 Device 001: ID 1d6b:0001 Linux Foundation 1.1 root hub Bus 008 Device 001: ID 1d6b:0001 Linux Foundation 1.1 root hub Bus 007 Device 001: ID 1d6b:0001 Linux Foundation 1.1 root hub Bus 006 Device 001: ID 1d6b:0001 Linux Foundation 1.1 root hub Bus 005 Device 001: ID 1d6b:0001 Linux Foundation 1.1 root hub Bus 004 Device 001: ID 1d6b:0001 Linux Foundation 1.1 root hub Bus 003 Device 006: ID 174c:5106 ASMedia Technology Inc. Bus 003 Device 004: ID 0bda:0151 Realtek Semiconductor Corp. Mass Storage Device (Multicard Reader) Bus 003 Device 002: ID 058f:6366 Alcor Micro Corp. Multi Flash Reader Bus 003 Device 001: ID 1d6b:0002 Linux Foundation 2.0 root hub Bus 002 Device 006: ID 1687:0163 Kingmax Digital Inc. Bus 002 Device 001: ID 1d6b:0002 Linux Foundation 2.0 root hub Bus 001 Device 002: ID 046d:081b Logitech, Inc. Bus 001 Device 001: ID 1d6b:0002 Linux Foundation 2.0 root hub Full output: full dmesg full lspci full lsusb

Read the article
Fulltext search for django : Mysql not so bad ? (vs sphinx, xapian)

- by Eric

I am studying fulltext search engines for django. It must be simple to install, fast indexing, fast index update, not blocking while indexing, fast search. After reading many web pages, I put in short list : Mysql MYISAM fulltext, djapian/python-xapian, and django-sphinx I did not choose lucene because it seems complex, nor haystack as it has less features than djapian/django-sphinx (like fields weighting). Then I made some benchmarks, to do so, I collected many free books on the net to generate a database table with 1 485 000 records (id,title,body), each record is about 600 bytes long. From the database, I also generated a list of 100 000 existing words and shuffled them to create a search list. For the tests, I made 2 runs on my laptop (4Go RAM, Dual core 2.0Ghz): the first one, just after a server reboot to clear all caches, the second is done juste after in order to test how good are cached results. Here are the "home made" benchmark results : 1485000 records with Title (150 bytes) and body (450 bytes) Mysql 5.0.75/Ubuntu 9.04 Fulltext : ========================================================================== Full indexing : 7m14.146s 1 thread, 1000 searchs with single word randomly taken from database : First run : 0:01:11.553524 next run : 0:00:00.168508 Mysql 5.5.4 m3/Ubuntu 9.04 Fulltext : ========================================================================== Full indexing : 6m08.154s 1 thread, 1000 searchs with single word randomly taken from database : First run : 0:01:11.553524 next run : 0:00:00.168508 1 thread, 100000 searchs with single word randomly taken from database : First run : 9m09s next run : 5m38s 1 thread, 10000 random strings (random strings should not be found in database) : just after the 100000 search test : 0:00:15.007353 1 thread, boolean search : 1000 x (+word1 +word2) First run : 0:00:21.205404 next run : 0:00:00.145098 Djapian Fulltext : ========================================================================== Full indexing : 84m7.601s 1 thread, 1000 searchs with single word randomly taken from database with prefetch : First run : 0:02:28.085680 next run : 0:00:14.300236 python-xapian Fulltext : ========================================================================== 1 thread, 1000 searchs with single word randomly taken from database : First run : 0:01:26.402084 next run : 0:00:00.695092 django-sphinx Fulltext : ========================================================================== Full indexing : 1m25.957s 1 thread, 1000 searchs with single word randomly taken from database : First run : 0:01:30.073001 next run : 0:00:05.203294 1 thread, 100000 searchs with single word randomly taken from database : First run : 12m48s next run : 9m45s 1 thread, 10000 random strings (random strings should not be found in database) : just after the 100000 search test : 0:00:23.535319 1 thread, boolean search : 1000 x (word1 word2) First run : 0:00:20.856486 next run : 0:00:03.005416 As you can see, Mysql is not so bad at all for fulltext search. In addition, its query cache is very efficient. Mysql seems to me a good choice as there is nothing to install (I need just to write a small script to synchronize an Innodb production table to a MyISAM search table) and as I do not really need advanced search feature like stemming etc... Here is the question : What do you think about Mysql fulltext search engine vs sphinx and xapian ?

Read the article
Floating point vs integer calculations on modern hardware

- by maxpenguin

I am doing some performance critical work in C++, and we are currently using integer calculations for problems that are inherently floating point because "its faster". This causes a whole lot of annoying problems and adds a lot of annoying code. Now, I remember reading about how floating point calculations were so slow approximately circa the 386 days, where I believe (IIRC) that there was an optional co-proccessor. But surely nowadays with exponentially more complex and powerful CPUs it makes no difference in "speed" if doing floating point or integer calculation? Especially since the actual calculation time is tiny compared to something like causing a pipeline stall or fetching something from main memory? I know the correct answer is to benchmark on the target hardware, what would be a good way to test this? I wrote two tiny C++ programs and compared their run time with "time" on Linux, but the actual run time is too variable (doesn't help I am running on a virtual server). Short of spending my entire day running hundreds of benchmarks, making graphs etc. is there something I can do to get a reasonable test of the relative speed? Any ideas or thoughts? Am I completely wrong? The programs I used as follows, they are not identical by any means: #include <iostream> #include <cmath> #include <cstdlib> #include <time.h> int main( int argc, char** argv ) { int accum = 0; srand( time( NULL ) ); for( unsigned int i = 0; i < 100000000; ++i ) { accum += rand( ) % 365; } std::cout << accum << std::endl; return 0; } Program 2: #include <iostream> #include <cmath> #include <cstdlib> #include <time.h> int main( int argc, char** argv ) { float accum = 0; srand( time( NULL ) ); for( unsigned int i = 0; i < 100000000; ++i ) { accum += (float)( rand( ) % 365 ); } std::cout << accum << std::endl; return 0; } Thanks in advance!

Read the article
What's the best Scala build system?

- by gatoatigrado

I've seen questions about IDE's here -- Which is the best IDE for Scala development? and What is the current state of tooling for Scala?, but I've had mixed experiences with IDEs. Right now, I'm using the Eclipse IDE with the automatic workspace refresh option, and KDE 4's Kate as my text editor. Here are some of the problems I'd like to solve: use my own editor IDEs are really geared at everyone using their components. I like Kate better, but the refresh system is very annoying (it doesn't use inotify, rather, maybe a 10s polling interval). The reason I don't use the built-in text editor is because broken auto-complete functionalities cause the IDE to hang for maybe 10s. rebuild only modified files The Eclipse build system is broken. It doesn't know when to rebuild classes. I find myself almost half of the time going to project-clean. Worse, it seems even after it has finished building my project, a few minutes later it will pop up with some bizarre error (edit - these errors appear to be things that were previously solved with a project clean, but then come back up...). Finally, setting "Preferences / Continue launch if project contains errors" to "prompt" seems to have no effect for Scala projects (i.e. it always launches even if there are errors). build customization I can use the "nightly" release, but I'll want to modify and use my own Scala builds, not the compiler that's built into the IDE's plugin. It would also be nice to pass [e.g.] -Xprint:jvm to the compiler (to print out lowered code). fast compiling Though Eclipse doesn't always build right, it does seem snappy -- even more so than fsc. I looked at Ant and Maven, though haven't employed either yet (I'll also need to spend time solving #3 and #4). I wanted to see if anyone has other suggestions before I spend time getting a suboptimal build system working. Thanks in advance! UPDATE - I'm now using Maven, passing a project as a compiler plugin to it. It seems fast enough; I'm not sure what kind of jar caching Maven does. A current repository for Scala 2.8.0 is available [link]. The archetypes are very cool, and cross-platform support seems very good. However, about compile issues, I'm not sure if fsc is actually fixed, or my project is stable enough (e.g. class names aren't changing) -- running it manually doesn't bother me as much. If you'd like to see an example, feel free to browse the pom.xml files I'm using [github]. UPDATE 2 - from benchmarks I've seen, Daniel Spiewak is right that buildr's faster than Maven (and, if one is doing incremental changes, Maven's 10 second latency gets annoying), so if one can craft a compatible build file, then it's probably worth it...

Read the article
Can GPU capabilities impact virtual machine performance?

- by Dave White

While this many not seem like a programming question directly, it impacts my development activities and so it seems like it belongs here. It seems that more and more developers are turning to virtual environments for development activities on their computers, SharePoint development being a prime example. Also, as a trainer, I have virtual training environments for all of the classes that I teach. I recently purchased a new Dell E6510 to travel around with. It has the i7 620M (Dual core, HyperThreaded cpu running at 2.66GHz) and 8 GB of memory. Reading the spec sheet, it sounded like it would be a great laptop to carry around and run virtual machines on. Getting the laptop though, I've been pretty disappointed with the user experience of developing in a virtual machine. Giving the Virtual Machine 4 GB of memory, it was slow and I could type complete sentences and watch the VM "catchup". My company has training laptops that we provide for our classes. They are Dell Precision M6400 Intel Core 2 Duo P8700 running at 2.54Ghz with 8 GB of memory and the experience on this laptops is night and day compared to the E6510. They are crisp and you barely aware that you are running in a virtual environment. Since the E6510 should be faster in all categories than the M6400, I couldn't understand why the new laptop was slower, so I did a component by component comparison and the only place where the E6510 is less performant than the M6400 is the graphics department. The M6400 is running a nVidia FX 2700m GPU and the E6510 is running a nVidia 3100M GPU. Looking at benchmarks of the two GPUs suggest that the FX 2700M is twice as fast as the 3100M. http://www.notebookcheck.net/Mobile-Graphics-Cards-Benchmark-List.844.0.html 3100M = 111th (E6510) FX 2700m = 47th (Precision M6400) Radeon HD 5870 = 8th (Alienware) The host OS is Windows 7 64bit as is the guest OS, running in Virtual Box 3.1.8 with Guest Additions installed on the guest. The IDE being used in the virtual environment is VS 2010 Premium. So after that long setup, my question is: Is the GPU significantly impacting the virtual machine's performance or are there other factors that I'm not looking at that I can use to boost the vm's performance? Do we now have to consider GPU performance when purchasing laptops where we expect to use virtualized development environments? Thanks in advance. Cheers, Dave

Read the article
Performance of looping over an Unboxed array in Haskell

- by Joey Adams

First of all, it's great. However, I came across a situation where my benchmarks turned up weird results. I am new to Haskell, and this is first time I've gotten my hands dirty with mutable arrays and Monads. The code below is based on this example. I wrote a generic monadic for function that takes numbers and a step function rather than a range (like forM_ does). I compared using my generic for function (Loop A) against embedding an equivalent recursive function (Loop B). Having Loop A is noticeably faster than having Loop B. Weirder, having both Loop A and B together is faster than having Loop B by itself (but slightly slower than Loop A by itself). Some possible explanations I can think of for the discrepancies. Note that these are just guesses: Something I haven't learned yet about how Haskell extracts results from monadic functions. Loop B faults the array in a less cache efficient manner than Loop A. Why? I made a dumb mistake; Loop A and Loop B are actually different. Note that in all 3 cases of having either or both Loop A and Loop B, the program produces the same output. Here is the code. I tested it with ghc -O2 for.hs using GHC version 6.10.4 . import Control.Monad import Control.Monad.ST import Data.Array.IArray import Data.Array.MArray import Data.Array.ST import Data.Array.Unboxed for :: (Num a, Ord a, Monad m) => a -> a -> (a -> a) -> (a -> m b) -> m () for start end step f = loop start where loop i | i <= end = do f i loop (step i) | otherwise = return () primesToNA :: Int -> UArray Int Bool primesToNA n = runSTUArray $ do a <- newArray (2,n) True :: ST s (STUArray s Int Bool) let sr = floor . (sqrt::Double->Double) . fromIntegral $ n+1 -- Loop A for 4 n (+ 2) $ \j -> writeArray a j False -- Loop B let f i | i <= n = do writeArray a i False f (i+2) | otherwise = return () in f 4 forM_ [3,5..sr] $ \i -> do si <- readArray a i when si $ forM_ [i*i,i*i+i+i..n] $ \j -> writeArray a j False return a primesTo :: Int -> [Int] primesTo n = [i | (i,p) <- assocs . primesToNA $ n, p] main = print $ primesTo 30000000

Read the article
Local Variables take 7x longer to access than global variables?

- by ItzWarty

I was trying to benchmark the gain/loss of "caching" math.floor, in hopes that I could make calls faster. Here was the test: <html> <head> <script> window.onload = function() { var startTime = new Date().getTime(); var k = 0; for(var i = 0; i < 1000000; i++) k += Math.floor(9.99); var mathFloorTime = new Date().getTime() - startTime; startTime = new Date().getTime(); window.mfloor = Math.floor; k = 0; for(var i = 0; i < 1000000; i++) k += window.mfloor(9.99); var globalFloorTime = new Date().getTime() - startTime; startTime = new Date().getTime(); var mfloor = Math.floor; k = 0; for(var i = 0; i < 1000000; i++) k += mfloor(9.99); var localFloorTime = new Date().getTime() - startTime; document.getElementById("MathResult").innerHTML = mathFloorTime; document.getElementById("globalResult").innerHTML = globalFloorTime; document.getElementById("localResult").innerHTML = localFloorTime; }; </script> </head> <body> Math.floor: <span id="MathResult"></span>ms <br /> var mathfloor: <span id="globalResult"></span>ms <br /> window.mathfloor: <span id="localResult"></span>ms <br /> </body> </html> My results from the test: [Chromium 5.0.308.0]: Math.floor: 49ms var mathfloor: 271ms window.mathfloor: 40ms [IE 8.0.6001.18702] Math.floor: 703ms var mathfloor: 9890ms [LOL!] window.mathfloor: 375ms [Firefox [Minefield] 3.7a4pre] Math.floor: 42ms var mathfloor: 2257ms window.mathfloor: 60ms [Safari 4.0.4[531.21.10] ] Math.floor: 92ms var mathfloor: 289ms window.mathfloor: 90ms [Opera 10.10 build 1893] Math.floor: 500ms var mathfloor: 843ms window.mathfloor: 360ms [Konqueror 4.3.90 [KDE 4.3.90 [KDE 4.4 RC1]]] Math.floor: 453ms var mathfloor: 563ms window.mathfloor: 312ms The variance is random, of course, but for the most part In all cases [this shows time taken]: [takes longer] mathfloor Math.floor window.mathfloor [is faster] Why is this? In my projects i've been using var mfloor = Math.floor, and according to my not-so-amazing benchmarks, my efforts to "optimize" actually slowed down the script by ALOT... Is there any other way to make my code more "efficient"...? I'm at the stage where i basically need to optimize, so no, this isn't "premature optimization"...

Read the article
Lag spikes at full CPU usage, lagy mouse, maybe video card

- by Roberts

My PC specs: Motherboard Name - Gigabyte GA-945PL-S3 CPU Type - DualCore Intel Core 2 Duo E4300, 1800 MHz (9 x 200) OS - Microsoft Windows 7 Ultimate OS Kernel Type - 32-bit OS Version - 6.1.7601 I bougth a new video card one month ago. GeForce 210. I didn't have any problems. I wanted to overclock it, in other words: "Play with it". So I installed Gigabyte EasyBoost from CD and overclocked the GPU 590 + 110 mhz, memory to max to 960mhz from 800mhz. Benchmarks showed a little bit bigger score. Then I overclocked shader clock from 1405 to [..] (don't remeber really). So I was playing Modern Warfare 2 when off sudden computer froze when I wanted to select team, I was afk before that. I had to reset CMOS. After that I had problems with Skype: unread messages and no sound. Then I figured it out that when ever I open EasyBoost - Skype starts to glitch again. Now I use EVGA Precission X. Now after a month, I cleaned computer and closed the case, it was open all the time. I started to overclock GPU clock only (just a bit) because there was no problems that would stop me. So sometimes on heavy CPU load graphics starts to lag. Dragging a window is painful to watch too. Sometimes the screen freezes for 5 to 10 seconds (I can see that hard disk activity is maximal). You may say that CPU fault it is, isn't it? But sometimes lag spikes starts randomly when CPU load is at maximum. All 3 benchmark softwares (PerformanceTest, NovaBench and MSI Kombustor) shows that performance of my video card has dropped about 25%. BUT! CPU score is lower too. I ignored these problems but when I refreshed Windows Experience Index I was shocked. Month before (in latvian language but not so hard to understand): Now 01.04.2012 (upgraded RAM): This happened when I tried to capture Minecraft with Fraps on underclocked GPU to 580mhz (def: 590mhz): All drivers are up to date. Average CPU temperature from 55°C to 75°C (at 70°C sometimes starts these lag spikes). Video card's tempratures are from 45°C to 60°C (very hard to reach 60°C). So my hope is that the video card is fine, cause this card is very new and I want to upgrade CPU anyways. Aplogies for my mistakes in vocabulary (I am trying to type this as fast I can). Update 02.04.2012 - 7:21 Forgot one thing, my hard disk is extrimly slow and I will upgrade it this week or next week so I will be installing same OS again. I am multi-tasker but I can't do much because of 1.8 GHz CPU and slow hard drive (Model ID - WDC WD800JD-60JRC0). The Windows Experience Index is back to normal. Actually "Spelu grafika" (Gaming graphics) are higher than month ago. During this test mouse was very lagy, but month ago there weren't any problems. WHY!?

Read the article

Search Results

Search found 271 results on 11 pages for 'benchmarks'.

Page 10/11 | < Previous Page | 6 7 8 9 10 11 | Next Page >

- by Peter Parker

- by EmbeddedProg

- by Matthew Nichols

- by random1

- by wazoox

- by Evgeny

- by Roberts

- by Dan Nissenbaum

- by linkedlinked

- by James

- by Ben Griswold

- by Rick Finley

- by Applications User Experience

- by Rob Young

- by jhenning

- by juanlarios

- by Dave

- by jgottula

- by Eric

- by maxpenguin

- by gatoatigrado

- by Dave White

- by Joey Adams

- by ItzWarty

- by Roberts

< Previous Page | 6 7 8 9 10 11 | Next Page >