Search Results

Search found 2156 results on 87 pages for 'weighted average'.

Page 39/87 | < Previous Page | 35 36 37 38 39 40 41 42 43 44 45 46  | Next Page >

  • VPS 512 MB RAM with WordPressMU comes to consumes lots of memory

    - by CAPitalZ
    I have googled for days and gathered all optimization suggestions and tried. My sites are not getting any high hits. May be like 100 hits per day [all my sites combined]. Here are my specs I have 512 MB RAM VPS with burstable 1024 MB. Centos 5 32-bit & cPanel/WHM Apache 2.2 MySQL 5.0 PHP 5.3.2 Here is my Configs I have 2 WordPressMU production sites, and 1 test site my.cnf # The following options will be passed to all MySQL clients [client] #password = your_password port = 3306 socket = /var/lib/mysql/mysql.sock # Here follows entries for some specific programs # The MySQL server [mysqld] port = 3306 socket = /var/lib/mysql/mysql.sock skip-locking skip-bdb skip-innodb key_buffer = 16M max_allowed_packet = 1M table_cache = 64 sort_buffer_size = 512K net_buffer_length = 8K read_buffer_size = 256K read_rnd_buffer_size = 512K myisam_sort_buffer_size = 8M #CAPitalZ thread_cache_size=8 thread_concurrency=4 #query_cache_type=1 #query_cache_limit=1M query_cache_size=16M concurrent_insert=2 low_priority_updates=1 max_connections=50 tmp_table_size=16M max_heap_table_size=16M join_buffer_size=1M interactive_timeout=25 wait_timeout=1000 #connect_timout=10 not able to restart mysql max_connect_errors=10 # Don't listen on a TCP/IP port at all. This can be a security enhancement, # if all processes that need to connect to mysqld run on the same host. # All interaction with mysqld must be made via Unix sockets or named pipes. # Note that using this option without enabling named pipes on Windows # (via the "enable-named-pipe" option) will render mysqld useless! # skip-networking # Disable Federated by default skip-federated # Replication Master Server (default) # binary logging is required for replication log-bin=mysql-bin # required unique id between 1 and 2^32 - 1 # defaults to 1 if master-host is not set # but will not function as a master if omitted server-id = 1 [mysqld_safe] open_files_limit=8192 [mysqldump] quick max_allowed_packet = 16M [mysql] no-auto-rehash # Remove the next comment character if you are not familiar with SQL #safe-updates [isamchk] key_buffer = 20M sort_buffer_size = 20M read_buffer = 2M write_buffer = 2M [myisamchk] key_buffer = 20M sort_buffer_size = 20M read_buffer = 2M write_buffer = 2M [mysqlhotcopy] interactive-timeout httpd.conf I have unselected many modules and recompiled using EasyApache in WHM. Only have the following modules built Deflate Expires Fileprotect Imagemap MPM Prefork Version [default] EAccelerator for PHP Bcmath Calendar CurlSSL [I'm using Curl. But I don't have any https sites] Expat GD [for image cropping] Gettext Imap Mbregex [default] Mbstring [need both Mbregex and Mbstring for utf-8] Mysql of the system MySQL "Improved" extension. Sockets TTF (FreeType) [I'm using custom font] Zlib Under Global Configuration I only have FollowSymLinks enabled I Have TraceEnable, ServerSignature, FileETag OFF ServerTokens ProductOnly DirectoryIndex Priority has index.php as the first one I have removed Clamd [Clam Anti-virus] SpamAssasin is Off Under Tweak Settings Default catch-all/default address behavior for new accounts. This is set to "fail" All stats programs turned off I have eAccelerator installed and checked in phpinfo and its working [Pre VirtualHost Include under WHM] Timeout 20 KeepAlive On MaxKeepAliveRequests 200 KeepAliveTimeout 3 MinSpareServers 1 MaxSpareServers 3 StartServers 1 ServerLimit 50 MaxClients 50 MaxRequestsPerChild 4000 ExtendedStatus Off #ServerType standalone this throws error HostnameLookups Off <Directory "/"> AllowOverride None </Directory> My sites will take ages to load and WHM/CPanel will not even load. adadaa.com/ http://adadaa.net/ kadais.ca/ My average memory consumption is like 1000 MB! [yes always bursting] The process that consumes most CPU and also most memory is mysql But I also get like 15 httpd processes [when its bursting] I already got warning from cpuwatchcheck saying "While processing, the cpu has been maxed out for more than a 6 hour period. The current load/uptime line on the server at the time of this email is 07:00:37 up 11:30, 0 users, load average: 14.64, 16.79, 20.07" I don't know, I have tried switching these config values many different times, but nothing seems to work. Please show some light... Thanks

    Read the article

  • Windows 8.1 Will Start Encrypting Hard Drives By Default: Everything You Need to Know

    - by Chris Hoffman
    Windows 8.1 will automatically encrypt the storage on modern Windows PCs. This will help protect your files in case someone steals your laptop and tries to get at them, but it has important ramifications for data recovery. Previously, “BitLocker” was available on Professional and Enterprise editions of Windows, while “Device Encryption” was available on Windows RT and Windows Phone. Device encryption is included with all editions of Windows 8.1 — and it’s on by default. When Your Hard Drive Will Be Encrypted Windows 8.1 includes “Pervasive Device Encryption.” This works a bit differently from the standard BitLocker feature that has been included in Professional, Enterprise, and Ultimate editions of Windows for the past few versions. Before Windows 8.1 automatically enables Device Encryption, the following must be true: The Windows device “must support connected standby and meet the Windows Hardware Certification Kit (HCK) requirements for TPM and SecureBoot on ConnectedStandby systems.”  (Source) Older Windows PCs won’t support this feature, while new Windows 8.1 devices you pick up will have this feature enabled by default. When Windows 8.1 installs cleanly and the computer is prepared, device encryption is “initialized” on the system drive and other internal drives. Windows uses a clear key at this point, which is removed later when the recovery key is successfully backed up. The PC’s user must log in with a Microsoft account with administrator privileges or join the PC to a domain. If a Microsoft account is used, a recovery key will be backed up to Microsoft’s servers and encryption will be enabled. If a domain account is used, a recovery key will be backed up to Active Directory Domain Services and encryption will be enabled. If you have an older Windows computer that you’ve upgraded to Windows 8.1, it may not support Device Encryption. If you log in with a local user account, Device Encryption won’t be enabled. If you upgrade your Windows 8 device to Windows 8.1, you’ll need to enable device encryption, as it’s off by default when upgrading. Recovering An Encrypted Hard Drive Device encryption means that a thief can’t just pick up your laptop, insert a Linux live CD or Windows installer disc, and boot the alternate operating system to view your files without knowing your Windows password. It means that no one can just pull the hard drive from your device, connect the hard drive to another computer, and view the files. We’ve previously explained that your Windows password doesn’t actually secure your files. With Windows 8.1, average Windows users will finally be protected with encryption by default. However, there’s a problem — if you forget your password and are unable to log in, you’d also be unable to recover your files. This is likely why encryption is only enabled when a user logs in with a Microsoft account (or connects to a domain). Microsoft holds a recovery key, so you can gain access to your files by going through a recovery process. As long as you’re able to authenticate using your Microsoft account credentials — for example, by receiving an SMS message on the cell phone number connected to your Microsoft account — you’ll be able to recover your encrypted data. With Windows 8.1, it’s more important than ever to configure your Microsoft account’s security settings and recovery methods so you’ll be able to recover your files if you ever get locked out of your Microsoft account. Microsoft does hold the recovery key and would be capable of providing it to law enforcement if it was requested, which is certainly a legitimate concern in the age of PRISM. However, this encryption still provides protection from thieves picking up your hard drive and digging through your personal or business files. If you’re worried about a government or a determined thief who’s capable of gaining access to your Microsoft account, you’ll want to encrypt your hard drive with software that doesn’t upload a copy of your recovery key to the Internet, such as TrueCrypt. How to Disable Device Encryption There should be no real reason to disable device encryption. If nothing else, it’s a useful feature that will hopefully protect sensitive data in the real world where people — and even businesses — don’t enable encryption on their own. As encryption is only enabled on devices with the appropriate hardware and will be enabled by default, Microsoft has hopefully ensured that users won’t see noticeable slow-downs in performance. Encryption adds some overhead, but the overhead can hopefully be handled by dedicated hardware. If you’d like to enable a different encryption solution or just disable encryption entirely, you can control this yourself. To do so, open the PC settings app — swipe in from the right edge of the screen or press Windows Key + C, click the Settings icon, and select Change PC settings. Navigate to PC and devices -> PC info. At the bottom of the PC info pane, you’ll see a Device Encryption section. Select Turn Off if you want to disable device encryption, or select Turn On if you want to enable it — users upgrading from Windows 8 will have to enable it manually in this way. Note that Device Encryption can’t be disabled on Windows RT devices, such as Microsoft’s Surface RT and Surface 2. If you don’t see the Device Encryption section in this window, you’re likely using an older device that doesn’t meet the requirements and thus doesn’t support Device Encryption. For example, our Windows 8.1 virtual machine doesn’t offer Device Encryption configuration options. This is the new normal for Windows PCs, tablets, and devices in general. Where files on typical PCs were once ripe for easy access by thieves, Windows PCs are now encrypted by default and recovery keys are sent to Microsoft’s servers for safe keeping. This last part may be a bit creepy, but it’s easy to imagine average users forgetting their passwords — they’d be very upset if they lost all their files because they had to reset their passwords. It’s also an improvement over Windows PCs being completely unprotected by default.     

    Read the article

  • SPARC T3-1 Record Results Running JD Edwards EnterpriseOne Day in the Life Benchmark with Added Batch Component

    - by Brian
    Using Oracle's SPARC T3-1 server for the application tier and Oracle's SPARC Enterprise M3000 server for the database tier, a world record result was produced running the Oracle's JD Edwards EnterpriseOne applications Day in the Life benchmark run concurrently with a batch workload. The SPARC T3-1 server based result has 25% better performance than the IBM Power 750 POWER7 server even though the IBM result did not include running a batch component. The SPARC T3-1 server based result has 25% better space/performance than the IBM Power 750 POWER7 server as measured by the online component. The SPARC T3-1 server based result is 5x faster than the x86-based IBM x3650 M2 server system when executing the online component of the JD Edwards EnterpriseOne 9.0.1 Day in the Life benchmark. The IBM result did not include a batch component. The SPARC T3-1 server based result has 2.5x better space/performance than the x86-based IBM x3650 M2 server as measured by the online component. The combination of SPARC T3-1 and SPARC Enterprise M3000 servers delivered a Day in the Life benchmark result of 5000 online users with 0.875 seconds of average transaction response time running concurrently with 19 Universal Batch Engine (UBE) processes at 10 UBEs/minute. The solution exercises various JD Edwards EnterpriseOne applications while running Oracle WebLogic Server 11g Release 1 and Oracle Web Tier Utilities 11g HTTP server in Oracle Solaris Containers, together with the Oracle Database 11g Release 2. The SPARC T3-1 server showed that it could handle the additional workload of batch processing while maintaining the same number of online users for the JD Edwards EnterpriseOne Day in the Life benchmark. This was accomplished with minimal loss in response time. JD Edwards EnterpriseOne 9.0.1 takes advantage of the large number of compute threads available in the SPARC T3-1 server at the application tier and achieves excellent response times. The SPARC T3-1 server consolidates the application/web tier of the JD Edwards EnterpriseOne 9.0.1 application using Oracle Solaris Containers. Containers provide flexibility, easier maintenance and better CPU utilization of the server leaving processing capacity for additional growth. A number of Oracle advanced technology and features were used to obtain this result: Oracle Solaris 10, Oracle Solaris Containers, Oracle Java Hotspot Server VM, Oracle WebLogic Server 11g Release 1, Oracle Web Tier Utilities 11g, Oracle Database 11g Release 2, the SPARC T3 and SPARC64 VII+ based servers. This is the first published result running both online and batch workload concurrently on the JD Enterprise Application server. No published results are available from IBM running the online component together with a batch workload. The 9.0.1 version of the benchmark saw some minor performance improvements relative to 9.0. When comparing between 9.0.1 and 9.0 results, the reader should take this into account when the difference between results is small. Performance Landscape JD Edwards EnterpriseOne Day in the Life Benchmark Online with Batch Workload This is the first publication on the Day in the Life benchmark run concurrently with batch jobs. The batch workload was provided by Oracle's Universal Batch Engine. System RackUnits Online Users Resp Time (sec) BatchConcur(# of UBEs) BatchRate(UBEs/m) Version SPARC T3-1, 1xSPARC T3 (1.65 GHz), Solaris 10 M3000, 1xSPARC64 VII+ (2.86 GHz), Solaris 10 4 5000 0.88 19 10 9.0.1 Resp Time (sec) — Response time of online jobs reported in seconds Batch Concur (# of UBEs) — Batch concurrency presented in the number of UBEs Batch Rate (UBEs/m) — Batch transaction rate in UBEs/minute. JD Edwards EnterpriseOne Day in the Life Benchmark Online Workload Only These results are for the Day in the Life benchmark. They are run without any batch workload. System RackUnits Online Users ResponseTime (sec) Version SPARC T3-1, 1xSPARC T3 (1.65 GHz), Solaris 10 M3000, 1xSPARC64 VII (2.75 GHz), Solaris 10 4 5000 0.52 9.0.1 IBM Power 750, 1xPOWER7 (3.55 GHz), IBM i7.1 4 4000 0.61 9.0 IBM x3650M2, 2xIntel X5570 (2.93 GHz), OVM 2 1000 0.29 9.0 IBM result from http://www-03.ibm.com/systems/i/advantages/oracle/, IBM used WebSphere Configuration Summary Hardware Configuration: 1 x SPARC T3-1 server 1 x 1.65 GHz SPARC T3 128 GB memory 16 x 300 GB 10000 RPM SAS 1 x Sun Flash Accelerator F20 PCIe Card, 92 GB 1 x 10 GbE NIC 1 x SPARC Enterprise M3000 server 1 x 2.86 SPARC64 VII+ 64 GB memory 1 x 10 GbE NIC 2 x StorageTek 2540 + 2501 Software Configuration: JD Edwards EnterpriseOne 9.0.1 with Tools 8.98.3.3 Oracle Database 11g Release 2 Oracle 11g WebLogic server 11g Release 1 version 10.3.2 Oracle Web Tier Utilities 11g Oracle Solaris 10 9/10 Mercury LoadRunner 9.10 with Oracle Day in the Life kit for JD Edwards EnterpriseOne 9.0.1 Oracle’s Universal Batch Engine - Short UBEs and Long UBEs Benchmark Description JD Edwards EnterpriseOne is an integrated applications suite of Enterprise Resource Planning (ERP) software. Oracle offers 70 JD Edwards EnterpriseOne application modules to support a diverse set of business operations. Oracle's Day in the Life (DIL) kit is a suite of scripts that exercises most common transactions of JD Edwards EnterpriseOne applications, including business processes such as payroll, sales order, purchase order, work order, and other manufacturing processes, such as ship confirmation. These are labeled by industry acronyms such as SCM, CRM, HCM, SRM and FMS. The kit's scripts execute transactions typical of a mid-sized manufacturing company. The workload consists of online transactions and the UBE workload of 15 short and 4 long UBEs. LoadRunner runs the DIL workload, collects the user’s transactions response times and reports the key metric of Combined Weighted Average Transaction Response time. The UBE processes workload runs from the JD Enterprise Application server. Oracle's UBE processes come as three flavors: Short UBEs < 1 minute engage in Business Report and Summary Analysis, Mid UBEs > 1 minute create a large report of Account, Balance, and Full Address, Long UBEs > 2 minutes simulate Payroll, Sales Order, night only jobs. The UBE workload generates large numbers of PDF files reports and log files. The UBE Queues are categorized as the QBATCHD, a single threaded queue for large UBEs, and the QPROCESS queue for short UBEs run concurrently. One of the Oracle Solaris Containers ran 4 Long UBEs, while another Container ran 15 short UBEs concurrently. The mixed size UBEs ran concurrently from the SPARC T3-1 server with the 5000 online users driven by the LoadRunner. Oracle’s UBE process performance metric is Number of Maximum Concurrent UBE processes at transaction rate, UBEs/minute. Key Points and Best Practices Two JD Edwards EnterpriseOne Application Servers and two Oracle Fusion Middleware WebLogic Servers 11g R1 coupled with two Oracle Fusion Middleware 11g Web Tier HTTP Server instances on the SPARC T3-1 server were hosted in four separate Oracle Solaris Containers to demonstrate consolidation of multiple application and web servers. See Also SPARC T3-1 oracle.com SPARC Enterprise M3000 oracle.com Oracle Solaris oracle.com JD Edwards EnterpriseOne oracle.com Oracle Database 11g Release 2 Enterprise Edition oracle.com Disclosure Statement Copyright 2011, Oracle and/or its affiliates. All rights reserved. Oracle and Java are registered trademarks of Oracle and/or its affiliates. Other names may be trademarks of their respective owners. Results as of 6/27/2011.

    Read the article

  • Of C# Iterators and Performance

    - by James Michael Hare
    Some of you reading this will be wondering, "what is an iterator" and think I'm locked in the world of C++.  Nope, I'm talking C# iterators.  No, not enumerators, iterators.   So, for those of you who do not know what iterators are in C#, I will explain it in summary, and for those of you who know what iterators are but are curious of the performance impacts, I will explore that as well.   Iterators have been around for a bit now, and there are still a bunch of people who don't know what they are or what they do.  I don't know how many times at work I've had a code review on my code and have someone ask me, "what's that yield word do?"   Basically, this post came to me as I was writing some extension methods to extend IEnumerable<T> -- I'll post some of the fun ones in a later post.  Since I was filtering the resulting list down, I was using the standard C# iterator concept; but that got me wondering: what are the performance implications of using an iterator versus returning a new enumeration?   So, to begin, let's look at a couple of methods.  This is a new (albeit contrived) method called Every(...).  The goal of this method is to access and enumeration and return every nth item in the enumeration (including the first).  So Every(2) would return items 0, 2, 4, 6, etc.   Now, if you wanted to write this in the traditional way, you may come up with something like this:       public static IEnumerable<T> Every<T>(this IEnumerable<T> list, int interval)     {         List<T> newList = new List<T>();         int count = 0;           foreach (var i in list)         {             if ((count++ % interval) == 0)             {                 newList.Add(i);             }         }           return newList;     }     So basically this method takes any IEnumerable<T> and returns a new IEnumerable<T> that contains every nth item.  Pretty straight forward.   The problem?  Well, Every<T>(...) will construct a list containing every nth item whether or not you care.  What happens if you were searching this result for a certain item and find that item after five tries?  You would have generated the rest of the list for nothing.   Enter iterators.  This C# construct uses the yield keyword to effectively defer evaluation of the next item until it is asked for.  This can be very handy if the evaluation itself is expensive or if there's a fair chance you'll never want to fully evaluate a list.   We see this all the time in Linq, where many expressions are chained together to do complex processing on a list.  This would be very expensive if each of these expressions evaluated their entire possible result set on call.    Let's look at the same example function, this time using an iterator:       public static IEnumerable<T> Every<T>(this IEnumerable<T> list, int interval)     {         int count = 0;         foreach (var i in list)         {             if ((count++ % interval) == 0)             {                 yield return i;             }         }     }   Notice it does not create a new return value explicitly, the only evidence of a return is the "yield return" statement.  What this means is that when an item is requested from the enumeration, it will enter this method and evaluate until it either hits a yield return (in which case that item is returned) or until it exits the method or hits a yield break (in which case the iteration ends.   Behind the scenes, this is all done with a class that the CLR creates behind the scenes that keeps track of the state of the iteration, so that every time the next item is asked for, it finds that item and then updates the current position so it knows where to start at next time.   It doesn't seem like a big deal, does it?  But keep in mind the key point here: it only returns items as they are requested. Thus if there's a good chance you will only process a portion of the return list and/or if the evaluation of each item is expensive, an iterator may be of benefit.   This is especially true if you intend your methods to be chainable similar to the way Linq methods can be chained.    For example, perhaps you have a List<int> and you want to take every tenth one until you find one greater than 10.  We could write that as:       List<int> someList = new List<int>();         // fill list here         someList.Every(10).TakeWhile(i => i <= 10);     Now is the difference more apparent?  If we use the first form of Every that makes a copy of the list.  It's going to copy the entire list whether we will need those items or not, that can be costly!    With the iterator version, however, it will only take items from the list until it finds one that is > 10, at which point no further items in the list are evaluated.   So, sounds neat eh?  But what's the cost is what you're probably wondering.  So I ran some tests using the two forms of Every above on lists varying from 5 to 500,000 integers and tried various things.    Now, iteration isn't free.  If you are more likely than not to iterate the entire collection every time, iterator has some very slight overhead:   Copy vs Iterator on 100% of Collection (10,000 iterations) Collection Size Num Iterated Type Total ms 5 5 Copy 5 5 5 Iterator 5 50 50 Copy 28 50 50 Iterator 27 500 500 Copy 227 500 500 Iterator 247 5000 5000 Copy 2266 5000 5000 Iterator 2444 50,000 50,000 Copy 24,443 50,000 50,000 Iterator 24,719 500,000 500,000 Copy 250,024 500,000 500,000 Iterator 251,521   Notice that when iterating over the entire produced list, the times for the iterator are a little better for smaller lists, then getting just a slight bit worse for larger lists.  In reality, given the number of items and iterations, the result is near negligible, but just to show that iterators come at a price.  However, it should also be noted that the form of Every that returns a copy will have a left-over collection to garbage collect.   However, if we only partially evaluate less and less through the list, the savings start to show and make it well worth the overhead.  Let's look at what happens if you stop looking after 80% of the list:   Copy vs Iterator on 80% of Collection (10,000 iterations) Collection Size Num Iterated Type Total ms 5 4 Copy 5 5 4 Iterator 5 50 40 Copy 27 50 40 Iterator 23 500 400 Copy 215 500 400 Iterator 200 5000 4000 Copy 2099 5000 4000 Iterator 1962 50,000 40,000 Copy 22,385 50,000 40,000 Iterator 19,599 500,000 400,000 Copy 236,427 500,000 400,000 Iterator 196,010       Notice that the iterator form is now operating quite a bit faster.  But the savings really add up if you stop on average at 50% (which most searches would typically do):     Copy vs Iterator on 50% of Collection (10,000 iterations) Collection Size Num Iterated Type Total ms 5 2 Copy 5 5 2 Iterator 4 50 25 Copy 25 50 25 Iterator 16 500 250 Copy 188 500 250 Iterator 126 5000 2500 Copy 1854 5000 2500 Iterator 1226 50,000 25,000 Copy 19,839 50,000 25,000 Iterator 12,233 500,000 250,000 Copy 208,667 500,000 250,000 Iterator 122,336   Now we see that if we only expect to go on average 50% into the results, we tend to shave off around 40% of the time.  And this is only for one level deep.  If we are using this in a chain of query expressions it only adds to the savings.   So my recommendation?  If you have a resonable expectation that someone may only want to partially consume your enumerable result, I would always tend to favor an iterator.  The cost if they iterate the whole thing does not add much at all -- and if they consume only partially, you reap some really good performance gains.   Next time I'll discuss some of my favorite extensions I've created to make development life a little easier and maintainability a little better.

    Read the article

  • Thread placement policies on NUMA systems - update

    - by Dave
    In a prior blog entry I noted that Solaris used a "maximum dispersal" placement policy to assign nascent threads to their initial processors. The general idea is that threads should be placed as far away from each other as possible in the resource topology in order to reduce resource contention between concurrently running threads. This policy assumes that resource contention -- pipelines, memory channel contention, destructive interference in the shared caches, etc -- will likely outweigh (a) any potential communication benefits we might achieve by packing our threads more densely onto a subset of the NUMA nodes, and (b) benefits of NUMA affinity between memory allocated by one thread and accessed by other threads. We want our threads spread widely over the system and not packed together. Conceptually, when placing a new thread, the kernel picks the least loaded node NUMA node (the node with lowest aggregate load average), and then the least loaded core on that node, etc. Furthermore, the kernel places threads onto resources -- sockets, cores, pipelines, etc -- without regard to the thread's process membership. That is, initial placement is process-agnostic. Keep reading, though. This description is incorrect. On Solaris 10 on a SPARC T5440 with 4 x T2+ NUMA nodes, if the system is otherwise unloaded and we launch a process that creates 20 compute-bound concurrent threads, then typically we'll see a perfect balance with 5 threads on each node. We see similar behavior on an 8-node x86 x4800 system, where each node has 8 cores and each core is 2-way hyperthreaded. So far so good; this behavior seems in agreement with the policy I described in the 1st paragraph. I recently tried the same experiment on a 4-node T4-4 running Solaris 11. Both the T5440 and T4-4 are 4-node systems that expose 256 logical thread contexts. To my surprise, all 20 threads were placed onto just one NUMA node while the other 3 nodes remained completely idle. I checked the usual suspects such as processor sets inadvertently left around by colleagues, processors left offline, and power management policies, but the system was configured normally. I then launched multiple concurrent instances of the process, and, interestingly, all the threads from the 1st process landed on one node, all the threads from the 2nd process landed on another node, and so on. This happened even if I interleaved thread creating between the processes, so I was relatively sure the effect didn't related to thread creation time, but rather that placement was a function of process membership. I this point I consulted the Solaris sources and talked with folks in the Solaris group. The new Solaris 11 behavior is intentional. The kernel is no longer using a simple maximum dispersal policy, and thread placement is process membership-aware. Now, even if other nodes are completely unloaded, the kernel will still try to pack new threads onto the home lgroup (socket) of the primordial thread until the load average of that node reaches 50%, after which it will pick the next least loaded node as the process's new favorite node for placement. On the T4-4 we have 64 logical thread contexts (strands) per socket (lgroup), so if we launch 48 concurrent threads we will find 32 placed on one node and 16 on some other node. If we launch 64 threads we'll find 32 and 32. That means we can end up with our threads clustered on a small subset of the nodes in a way that's quite different that what we've seen on Solaris 10. So we have a policy that allows process-aware packing but reverts to spreading threads onto other nodes if a node becomes too saturated. It turns out this policy was enabled in Solaris 10, but certain bugs suppressed the mixed packing/spreading behavior. There are configuration variables in /etc/system that allow us to dial the affinity between nascent threads and their primordial thread up and down: see lgrp_expand_proc_thresh, specifically. In the OpenSolaris source code the key routine is mpo_update_tunables(). This method reads the /etc/system variables and sets up some global variables that will subsequently be used by the dispatcher, which calls lgrp_choose() in lgrp.c to place nascent threads. Lgrp_expand_proc_thresh controls how loaded an lgroup must be before we'll consider homing a process's threads to another lgroup. Tune this value lower to have it spread your process's threads out more. To recap, the 'new' policy is as follows. Threads from the same process are packed onto a subset of the strands of a socket (50% for T-series). Once that socket reaches the 50% threshold the kernel then picks another preferred socket for that process. Threads from unrelated processes are spread across sockets. More precisely, different processes may have different preferred sockets (lgroups). Beware that I've simplified and elided details for the purposes of explication. The truth is in the code. Remarks: It's worth noting that initial thread placement is just that. If there's a gross imbalance between the load on different nodes then the kernel will migrate threads to achieve a better and more even distribution over the set of available nodes. Once a thread runs and gains some affinity for a node, however, it becomes "stickier" under the assumption that the thread has residual cache residency on that node, and that memory allocated by that thread resides on that node given the default "first-touch" page-level NUMA allocation policy. Exactly how the various policies interact and which have precedence under what circumstances could the topic of a future blog entry. The scheduler is work-conserving. The x4800 mentioned above is an interesting system. Each of the 8 sockets houses an Intel 7500-series processor. Each processor has 3 coherent QPI links and the system is arranged as a glueless 8-socket twisted ladder "mobius" topology. Nodes are either 1 or 2 hops distant over the QPI links. As an aside the mapping of logical CPUIDs to physical resources is rather interesting on Solaris/x4800. On SPARC/Solaris the CPUID layout is strictly geographic, with the highest order bits identifying the socket, the next lower bits identifying the core within that socket, following by the pipeline (if present) and finally the logical thread context ("strand") on the core. But on Solaris on the x4800 the CPUID layout is as follows. [6:6] identifies the hyperthread on a core; bits [5:3] identify the socket, or package in Intel terminology; bits [2:0] identify the core within a socket. Such low-level details should be of interest only if you're binding threads -- a bad idea, the kernel typically handles placement best -- or if you're writing NUMA-aware code that's aware of the ambient placement and makes decisions accordingly. Solaris introduced the so-called critical-threads mechanism, which is expressed by putting a thread into the FX scheduling class at priority 60. The critical-threads mechanism applies to placement on cores, not on sockets, however. That is, it's an intra-socket policy, not an inter-socket policy. Solaris 11 introduces the Power Aware Dispatcher (PAD) which packs threads instead of spreading them out in an attempt to be able to keep sockets or cores at lower power levels. Maximum dispersal may be good for performance but is anathema to power management. PAD is off by default, but power management polices constitute yet another confounding factor with respect to scheduling and dispatching. If your threads communicate heavily -- one thread reads cache lines last written by some other thread -- then the new dense packing policy may improve performance by reducing traffic on the coherent interconnect. On the other hand if your threads in your process communicate rarely, then it's possible the new packing policy might result on contention on shared computing resources. Unfortunately there's no simple litmus test that says whether packing or spreading is optimal in a given situation. The answer varies by system load, application, number of threads, and platform hardware characteristics. Currently we don't have the necessary tools and sensoria to decide at runtime, so we're reduced to an empirical approach where we run trials and try to decide on a placement policy. The situation is quite frustrating. Relatedly, it's often hard to determine just the right level of concurrency to optimize throughput. (Understanding constructive vs destructive interference in the shared caches would be a good start. We could augment the lines with a small tag field indicating which strand last installed or accessed a line. Given that, we could augment the CPU with performance counters for misses where a thread evicts a line it installed vs misses where a thread displaces a line installed by some other thread.)

    Read the article

  • Improved Performance on PeopleSoft Combined Benchmark using SPARC T4-4

    - by Brian
    Oracle's SPARC T4-4 server running Oracle's PeopleSoft HCM 9.1 combined online and batch benchmark achieved a world record 18,000 concurrent users experiencing subsecond response time while executing a PeopleSoft Payroll batch job of 500,000 employees in 32.4 minutes. This result was obtained with a SPARC T4-4 server running Oracle Database 11g Release 2, a SPARC T4-4 server running PeopleSoft HCM 9.1 application server and a SPARC T4-2 server running Oracle WebLogic Server in the web tier. The SPARC T4-4 server running the application tier used Oracle Solaris Zones which provide a flexible, scalable and manageable virtualization environment. The average CPU utilization on the SPARC T4-2 server in the web tier was 17%, on the SPARC T4-4 server in the application tier it was 59%, and on the SPARC T4-4 server in the database tier was 47% (online and batch) leaving significant headroom for additional processing across the three tiers. The SPARC T4-4 server used for the database tier hosted Oracle Database 11g Release 2 using Oracle Automatic Storage Management (ASM) for database files management with I/O performance equivalent to raw devices. Performance Landscape Results are presented for the PeopleSoft HRMS Self-Service and Payroll combined benchmark. The new result with 128 streams shows significant improvement in the payroll batch processing time with little impact on the self-service component response time. PeopleSoft HRMS Self-Service and Payroll Benchmark Systems Users Ave Response Search (sec) Ave Response Save (sec) Batch Time (min) Streams SPARC T4-2 (web) SPARC T4-4 (app) SPARC T4-4 (db) 18,000 0.988 0.539 32.4 128 SPARC T4-2 (web) SPARC T4-4 (app) SPARC T4-4 (db) 18,000 0.944 0.503 43.3 64 The following results are for the PeopleSoft HRMS Self-Service benchmark that was previous run. The results are not directly comparable with the combined results because they do not include the payroll component. PeopleSoft HRMS Self-Service 9.1 Benchmark Systems Users Ave Response Search (sec) Ave Response Save (sec) Batch Time (min) Streams SPARC T4-2 (web) SPARC T4-4 (app) 2x SPARC T4-2 (db) 18,000 1.048 0.742 N/A N/A The following results are for the PeopleSoft Payroll benchmark that was previous run. The results are not directly comparable with the combined results because they do not include the self-service component. PeopleSoft Payroll (N.A.) 9.1 - 500K Employees (7 Million SQL PayCalc, Unicode) Systems Users Ave Response Search (sec) Ave Response Save (sec) Batch Time (min) Streams SPARC T4-4 (db) N/A N/A N/A 30.84 96 Configuration Summary Application Configuration: 1 x SPARC T4-4 server with 4 x SPARC T4 processors, 3.0 GHz 512 GB memory Oracle Solaris 11 11/11 PeopleTools 8.52 PeopleSoft HCM 9.1 Oracle Tuxedo, Version 10.3.0.0, 64-bit, Patch Level 031 Java Platform, Standard Edition Development Kit 6 Update 32 Database Configuration: 1 x SPARC T4-4 server with 4 x SPARC T4 processors, 3.0 GHz 256 GB memory Oracle Solaris 11 11/11 Oracle Database 11g Release 2 PeopleTools 8.52 Oracle Tuxedo, Version 10.3.0.0, 64-bit, Patch Level 031 Micro Focus Server Express (COBOL v 5.1.00) Web Tier Configuration: 1 x SPARC T4-2 server with 2 x SPARC T4 processors, 2.85 GHz 256 GB memory Oracle Solaris 11 11/11 PeopleTools 8.52 Oracle WebLogic Server 10.3.4 Java Platform, Standard Edition Development Kit 6 Update 32 Storage Configuration: 1 x Sun Server X2-4 as a COMSTAR head for data 4 x Intel Xeon X7550, 2.0 GHz 128 GB memory 1 x Sun Storage F5100 Flash Array (80 flash modules) 1 x Sun Storage F5100 Flash Array (40 flash modules) 1 x Sun Fire X4275 as a COMSTAR head for redo logs 12 x 2 TB SAS disks with Niwot Raid controller Benchmark Description This benchmark combines PeopleSoft HCM 9.1 HR Self Service online and PeopleSoft Payroll batch workloads to run on a unified database deployed on Oracle Database 11g Release 2. The PeopleSoft HRSS benchmark kit is a Oracle standard benchmark kit run by all platform vendors to measure the performance. It's an OLTP benchmark where DB SQLs are moderately complex. The results are certified by Oracle and a white paper is published. PeopleSoft HR SS defines a business transaction as a series of HTML pages that guide a user through a particular scenario. Users are defined as corporate Employees, Managers and HR administrators. The benchmark consist of 14 scenarios which emulate users performing typical HCM transactions such as viewing paycheck, promoting and hiring employees, updating employee profile and other typical HCM application transactions. All these transactions are well-defined in the PeopleSoft HR Self-Service 9.1 benchmark kit. This benchmark metric is the weighted average response search/save time for all the transactions. The PeopleSoft 9.1 Payroll (North America) benchmark demonstrates system performance for a range of processing volumes in a specific configuration. This workload represents large batch runs typical of a ERP environment during a mass update. The benchmark measures five application business process run times for a database representing large organization. They are Paysheet Creation, Payroll Calculation, Payroll Confirmation, Print Advice forms, and Create Direct Deposit File. The benchmark metric is the cumulative elapsed time taken to complete the Paysheet Creation, Payroll Calculation and Payroll Confirmation business application processes. The benchmark metrics are taken for each respective benchmark while running simultaneously on the same database back-end. Specifically, the payroll batch processes are started when the online workload reaches steady state (the maximum number of online users) and overlap with online transactions for the duration of the steady state. Key Points and Best Practices Two PeopleSoft Domain sets with 200 application servers each on a SPARC T4-4 server were hosted in 2 separate Oracle Solaris Zones to demonstrate consolidation of multiple application servers, ease of administration and performance tuning. Each Oracle Solaris Zone was bound to a separate processor set, each containing 15 cores (total 120 threads). The default set (1 core from first and third processor socket, total 16 threads) was used for network and disk interrupt handling. This was done to improve performance by reducing memory access latency by using the physical memory closest to the processors and offload I/O interrupt handling to default set threads, freeing up cpu resources for Application Servers threads and balancing application workload across 240 threads. A total of 128 PeopleSoft streams server processes where used on the database node to complete payroll batch job of 500,000 employees in 32.4 minutes. See Also Oracle PeopleSoft Benchmark White Papers oracle.com SPARC T4-2 Server oracle.com OTN SPARC T4-4 Server oracle.com OTN PeopleSoft Enterprise Human Capital Managementoracle.com OTN PeopleSoft Enterprise Human Capital Management (Payroll) oracle.com OTN Oracle Solaris oracle.com OTN Oracle Database 11g Release 2 oracle.com OTN Disclosure Statement Copyright 2012, Oracle and/or its affiliates. All rights reserved. Oracle and Java are registered trademarks of Oracle and/or its affiliates. Other names may be trademarks of their respective owners. Results as of 8 November 2012.

    Read the article

  • SQL University: What and why of database testing

    - by Mladen Prajdic
    This is a post for a great idea called SQL University started by Jorge Segarra also famously known as SqlChicken on Twitter. It’s a collection of blog posts on different database related topics contributed by several smart people all over the world. So this week is mine and we’ll be talking about database testing and refactoring. In 3 posts we’ll cover: SQLU part 1 - What and why of database testing SQLU part 2 - What and why of database refactoring SQLU part 2 – Tools of the trade With that out of the way let us sharpen our pencils and get going. Why test a database The sad state of the industry today is that there is very little emphasis on testing in general. Test driven development is still a small niche of the programming world while refactoring is even smaller. The cause of this is the inability of developers to convince themselves and their managers that writing tests is beneficial. At the moment they are mostly viewed as waste of time. This is because the average person (let’s not fool ourselves, we’re all average) is unable to think about lower future costs in relation to little more current work. It’s orders of magnitude easier to know about the current costs in relation to current amount of work. That’s why programmers convince themselves testing is a waste of time. However we have to ask ourselves what tests are really about? Maybe finding bugs? No, not really. If we introduce bugs, we’re likely to write test around those bugs too. But yes we can find some bugs with tests. The main point of tests is to have reproducible repeatability in our systems. By having a code base largely covered by tests we can know with better certainty what a small code change can break in other parts of the system. By having repeatability we can make code changes with confidence, since we know we’ll see what breaks in other tests. And here comes the inability to estimate future costs. By spending just a few more hours writing those tests we’d know instantly what broke where. Imagine we fix a reported bug. We check-in the code, deploy it and the users are happy. Until we get a call 2 weeks later about a certain monthly process has stopped working. What we don’t know is that this process was developed by a long gone coworker and for some reason it relied on that same bug we’ve happily fixed. There’s no way we could’ve known that. We say OK and go in and fix the monthly process. But what we have no clue about is that there’s this ETL job that relied on data from that monthly process. Now that we’ve fixed the process it’s giving unexpected (yet correct since we fixed it) data to the ETL job. So we have to fix that too. But there’s this part of the app we coded that relies on data from that exact ETL job. And just like that we enter the “Loop of maintenance horror”. With the loop eventually comes blame. Here’s a nice tip for all developers and DBAs out there: If you make a mistake man up and admit to it. All of the above is valid for any kind of software development. Keeping this in mind the database is nothing other than just a part of the application. But a big part! One reason why testing a database is even more important than testing an application is that one database is usually accessed from multiple applications and processes. This makes it the central and vital part of the enterprise software infrastructure. Knowing all this can we really afford not to have tests? What to test in a database Now that we’ve decided we’ll dive into this testing thing we have to ask ourselves what needs to be tested? The short answer is: everything. The long answer is: read on! There are 2 main ways of doing tests: Black box and White box testing. Black box testing means we have no idea how the system internals are built and we only have access to it’s inputs and outputs. With it we test that the internal changes to the system haven’t caused the input/output behavior of the system to change. The most important thing to test here are the edge conditions. It’s where most programs break. Having good edge condition tests we can be more confident that the systems changes won’t break. White box testing has the full knowledge of the system internals. With it we test the internal system changes, different states of the application, etc… White and Black box tests should be complementary to each other as they are very much interconnected. Testing database routines includes testing stored procedures, views, user defined functions and anything you use to access the data with. Database routines are your input/output interface to the database system. They count as black box testing. We test then for 2 things: Data and schema. When testing schema we only care about the columns and the data types they’re returning. After all the schema is the contract to the out side systems. If it changes we usually have to change the applications accessing it. One helpful T-SQL command when doing schema tests is SET FMTONLY ON. It tells the SQL Server to return only empty results sets. This speeds up tests because it doesn’t return any data to the client. After we’ve validated the schema we have to test the returned data. There no other way to do this but to have expected data known before the tests executes and comparing that data to the database routine output. Testing Authentication and Authorization helps us validate who has access to the SQL Server box (Authentication) and who has access to certain database objects (Authorization). For desktop applications and windows authentication this works well. But the biggest problem here are web apps. They usually connect to the database as a single user. Please ensure that that user is not SA or an account with admin privileges. That is just bad. Load testing ensures us that our database can handle peak loads. One often overlooked tool for load testing is Microsoft’s OSTRESS tool. It’s part of RML utilities (x86, x64) for SQL Server and can help determine if our database server can handle loads like 100 simultaneous users each doing 10 requests per second. SQL Profiler can also help us here by looking at why certain queries are slow and what to do to fix them.   One particular problem to think about is how to begin testing existing databases. First thing we have to do is to get to know those databases. We can’t test something when we don’t know how it works. To do this we have to talk to the users of the applications accessing the database, run SQL Profiler to see what queries are being run, use existing documentation to decipher all the object relationships, etc… The way to approach this is to choose one part of the database (say a logical grouping of tables that go together) and filter our traces accordingly. Once we’ve done that we move on to the next grouping and so on until we’ve covered the whole database. Then we move on to the next one. Database Testing is a topic that we can spent many hours discussing but let this be a nice intro to the world of database testing. See you in the next post.

    Read the article

  • HTG Explains: Why Does Rebooting a Computer Fix So Many Problems?

    - by Chris Hoffman
    Ask a geek how to fix a problem you’ve having with your Windows computer and they’ll likely ask “Have you tried rebooting it?” This seems like a flippant response, but rebooting a computer can actually solve many problems. So what’s going on here? Why does resetting a device or restarting a program fix so many problems? And why don’t geeks try to identify and fix problems rather than use the blunt hammer of “reset it”? This Isn’t Just About Windows Bear in mind that this soltion isn’t just limited to Windows computers, but applies to all types of computing devices. You’ll find the advice “try resetting it” applied to wireless routers, iPads, Android phones, and more. This same advice even applies to software — is Firefox acting slow and consuming a lot of memory? Try closing it and reopening it! Some Problems Require a Restart To illustrate why rebooting can fix so many problems, let’s take a look at the ultimate software problem a Windows computer can face: Windows halts, showing a blue screen of death. The blue screen was caused by a low-level error, likely a problem with a hardware driver or a hardware malfunction. Windows reaches a state where it doesn’t know how to recover, so it halts, shows a blue-screen of death, gathers information about the problem, and automatically restarts the computer for you . This restart fixes the blue screen of death. Windows has gotten better at dealing with errors — for example, if your graphics driver crashes, Windows XP would have frozen. In Windows Vista and newer versions of Windows, the Windows desktop will lose its fancy graphical effects for a few moments before regaining them. Behind the scenes, Windows is restarting the malfunctioning graphics driver. But why doesn’t Windows simply fix the problem rather than restarting the driver or the computer itself?  Well, because it can’t — the code has encountered a problem and stopped working completely, so there’s no way for it to continue. By restarting, the code can start from square one and hopefully it won’t encounter the same problem again. Examples of Restarting Fixing Problems While certain problems require a complete restart because the operating system or a hardware driver has stopped working, not every problem does. Some problems may be fixable without a restart, though a restart may be the easiest option. Windows is Slow: Let’s say Windows is running very slowly. It’s possible that a misbehaving program is using 99% CPU and draining the computer’s resources. A geek could head to the task manager and look around, hoping to locate the misbehaving process an end it. If an average user encountered this same problem, they could simply reboot their computer to fix it rather than dig through their running processes. Firefox or Another Program is Using Too Much Memory: In the past, Firefox has been the poster child for memory leaks on average PCs. Over time, Firefox would often consume more and more memory, getting larger and larger and slowing down. Closing Firefox will cause it to relinquish all of its memory. When it starts again, it will start from a clean state without any leaked memory. This doesn’t just apply to Firefox, but applies to any software with memory leaks. Internet or Wi-Fi Network Problems: If you have a problem with your Wi-Fi or Internet connection, the software on your router or modem may have encountered a problem. Resetting the router — just by unplugging it from its power socket and then plugging it back in — is a common solution for connection problems. In all cases, a restart wipes away the current state of the software . Any code that’s stuck in a misbehaving state will be swept away, too. When you restart, the computer or device will bring the system up from scratch, restarting all the software from square one so it will work just as well as it was working before. “Soft Resets” vs. “Hard Resets” In the mobile device world, there are two types of “resets” you can perform. A “soft reset” is simply restarting a device normally — turning it off and then on again. A “hard reset” is resetting its software state back to its factory default state. When you think about it, both types of resets fix problems for a similar reason. For example, let’s say your Windows computer refuses to boot or becomes completely infected with malware. Simply restarting the computer won’t fix the problem, as the problem is with the files on the computer’s hard drive — it has corrupted files or malware that loads at startup on its hard drive. However, reinstalling Windows (performing a “Refresh or Reset your PC” operation in Windows 8 terms) will wipe away everything on the computer’s hard drive, restoring it to its formerly clean state. This is simpler than looking through the computer’s hard drive, trying to identify the exact reason for the problems or trying to ensure you’ve obliterated every last trace of malware. It’s much faster to simply start over from a known-good, clean state instead of trying to locate every possible problem and fix it. Ultimately, the answer is that “resetting a computer wipes away the current state of the software, including any problems that have developed, and allows it to start over from square one.” It’s easier and faster to start from a clean state than identify and fix any problems that may be occurring — in fact, in some cases, it may be impossible to fix problems without beginning from that clean state. Image Credit: Arria Belli on Flickr, DeclanTM on Flickr     

    Read the article

  • Metrics - A little knowledge can be a dangerous thing (or 'Why you're not clever enough to interpret metrics data')

    - by Jason Crease
    At RedGate Software, I work on a .NET obfuscator  called SmartAssembly.  Various features of it use a database to store various things (exception reports, name-mappings, etc.) The user is given the option of using either a SQL-Server database (which requires them to have Microsoft SQL Server), or a Microsoft Access MDB file (which requires nothing). MDB is the default option, but power-users soon switch to using a SQL Server database because it offers better performance and data-sharing. In the fashionable spirit of optimization and metrics, an obvious product-management question is 'Which is the most popular? SQL Server or MDB?' We've collected data about this fact, using our 'Feature-Usage-Reporting' technology (available as part of SmartAssembly) and more recently our 'Application Metrics' technology: Parameter Number of users % of total users Number of sessions Number of usages SQL Server 28 19.0 8115 8115 MDB 114 77.6 1449 1449 (As a disclaimer, please note than SmartAssembly has far more than 132 users . This data is just a selection of one build) So, it would appear that SQL-Server is used by fewer users, but more often. Great. But here's why these numbers are useless to me: Only the original developers understand the data What does a single 'usage' of 'MDB' mean? Does this happen once per run? Once per option change? On clicking the 'Obfuscate Now' button? When running the command-line version or just from the UI version? Each question could skew the data 10-fold either way, and the answers only known by the developer that instrumented the application in the first place. In other words, only the original developer can interpret the data - product-managers cannot interpret the data unaided. Most of the data is from uninterested users About half of people who download and run a free-trial from the internet quit it almost immediately. Only a small fraction use it sufficiently to make informed choices. Since the MDB option is the default one, we don't know how many of those 114 were people CHOOSING to use the MDB, or how many were JUST HAPPENING to use this MDB default for their 20-second trial. This is a problem we see across all our metrics: Are people are using X because it's the default or are they using X because they want to use X? We need to segment the data further - asking what percentage of each percentage meet our criteria for an 'established user' or 'informed user'. You end up spending hours writing sophisticated and dubious SQL queries to segment the data further. Not fun. You can't find out why they used this feature Metrics can answer the when and what, but not the why. Why did people use feature X? If you're anything like me, you often click on random buttons in unfamiliar applications just to explore the feature-set. If we listened uncritically to metrics at RedGate, we would eliminate the most-important and more-complex features which people actually buy the software for, leaving just big buttons on the main page and the About-Box. "Ah, that's interesting!" rather than "Ah, that's actionable!" People do love data. Did you know you eat 1201 chickens in a lifetime? But just 4 cows? Interesting, but useless. Often metrics give you a nice number: '5.8% of users have 3 or more monitors' . But unless the statistic is both SUPRISING and ACTIONABLE, it's useless. Most metrics are collected, reviewed with lots of cooing. and then forgotten. Unless a piece-of-data could change things, it's useless collecting it. People get obsessed with significance levels The first things that lots of people do with this data is do a t-test to get a significance level ("Hey! We know with 99.64% confidence that people prefer SQL Server to MDBs!") Believe me: other causes of error/misinterpretation in your data are FAR more significant than your t-test could ever comprehend. Confirmation bias prevents objectivity If the data appears to match our instinct, we feel satisfied and move on. If it doesn't, we suspect the data and dig deeper, plummeting down a rabbit-hole of segmentation and filtering until we give-up and move-on. Data is only useful if it can change our preconceptions. Do you trust this dodgy data more than your own understanding, knowledge and intelligence?  I don't. There's always multiple plausible ways to interpret/action any data Let's say we segment the above data, and get this data: Post-trial users (i.e. those using a paid version after the 14-day free-trial is over): Parameter Number of users % of total users Number of sessions Number of usages SQL Server 13 9.0 1115 1115 MDB 5 4.2 449 449 Trial users: Parameter Number of users % of total users Number of sessions Number of usages SQL Server 15 10.0 7000 7000 MDB 114 77.6 1000 1000 How do you interpret this data? It's one of: Mostly SQL Server users buy our software. People who can't afford SQL Server tend to be unable to afford or unwilling to buy our software. Therefore, ditch MDB-support. Our MDB support is so poor and buggy that our massive MDB user-base doesn't buy it.  Therefore, spend loads of money improving it, and think about ditching SQL-Server support. People 'graduate' naturally from MDB to SQL Server as they use the software more. Things are fine the way they are. We're marketing the tool wrong. The large number of MDB users represent uninformed downloaders. Tell marketing to aggressively target SQL Server users. To choose an interpretation you need to segment again. And again. And again, and again. Opting-out is correlated with feature-usage Metrics tends to be opt-in. This skews the data even further. Between 5% and 30% of people choose to opt-in to metrics (often called 'customer improvement program' or something like that). Casual trial-users who are uninterested in your product or company are less likely to opt-in. This group is probably also likely to be MDB users. How much does this skew your data by? Who knows? It's not all doom and gloom. There are some things metrics can answer well. Environment facts. How many people have 3 monitors? Have Windows 7? Have .NET 4 installed? Have Japanese Windows? Minor optimizations.  Is the text-box big enough for average user-input? Performance data. How long does our app take to start? How many databases does the average user have on their server? As you can see, questions about who-the-user-is rather than what-the-user-does are easier to answer and action. Conclusion Use SmartAssembly. If not for the metrics (called 'Feature-Usage-Reporting'), then at least for the obfuscation/error-reporting. Data raises more questions than it answers. Questions about environment are the easiest to answer.

    Read the article

  • “Query cost (relative to the batch)” <> Query cost relative to batch

    - by Dave Ballantyne
    OK, so that is quite a contradictory title, but unfortunately it is true that a common misconception is that the query with the highest percentage relative to batch is the worst performing.  Simply put, it is a lie, or more accurately we dont understand what these figures mean. Consider the two below simple queries: SELECT * FROM Person.BusinessEntity JOIN Person.BusinessEntityAddress ON Person.BusinessEntity.BusinessEntityID = Person.BusinessEntityAddress.BusinessEntityID go SELECT * FROM Sales.SalesOrderDetail JOIN Sales.SalesOrderHeader ON Sales.SalesOrderDetail.SalesOrderID = Sales.SalesOrderHeader.SalesOrderID After executing these and looking at the plans, I see this : So, a 13% / 87% split ,  but 13% / 87% of WHAT ? CPU ? Duration ? Reads ? Writes ? or some magical weighted algorithm ?  In a Profiler trace of the two we can find the metrics we are interested in. CPU and duration are well out but what about reads (210 and 1935)? To save you doing the maths, though you are more than welcome to, that’s a 90.2% / 9.8% split.  Close, but no cigar. Lets try a different tact.  Looking at the execution plan the “Estimated Subtree cost” of query 1 is 0.29449 and query 2 its 1.96596.  Again to save you the maths that works out to 13.03% and 86.97%, round those and thats the figures we are after.  But, what is the worrying word there ? “Estimated”.  So these are not “actual”  execution costs,  but what’s the problem in comparing the estimated costs to derive a meaning of “Most Costly”.  Well, in the case of simple queries such as the above , probably not a lot.  In more complicated queries , a fair bit. By modifying the second query to also show the total number of lines on each order SELECT *,COUNT(*) OVER (PARTITION BY Sales.SalesOrderDetail.SalesOrderID) FROM Sales.SalesOrderDetail JOIN Sales.SalesOrderHeader ON Sales.SalesOrderDetail.SalesOrderID = Sales.SalesOrderHeader.SalesOrderID The split in percentages is now 6% / 94% and the profiler metrics are : Even more of a discrepancy. Estimates can be out with actuals for a whole host of reasons,  scalar UDF’s are a particular bug bear of mine and in-fact the cost of a udf call is entirely hidden inside the execution plan.  It always estimates to 0 (well, a very small number). Take for instance the following udf Create Function dbo.udfSumSalesForCustomer(@CustomerId integer) returns money as begin Declare @Sum money Select @Sum= SUM(SalesOrderHeader.TotalDue) from Sales.SalesOrderHeader where CustomerID = @CustomerId return @Sum end If we have two statements , one that fires the udf and another that doesn't: Select CustomerID from Sales.Customer order by CustomerID go Select CustomerID,dbo.udfSumSalesForCustomer(Customer.CustomerID) from Sales.Customer order by CustomerID The costs relative to batch is a 50/50 split, but the has to be an actual cost of firing the udf. Indeed profiler shows us : No where even remotely near 50/50!!!! Moving forward to window framing functionality in SQL Server 2012 the optimizer sees ROWS and RANGE ( see here for their functional differences) as the same ‘cost’ too SELECT SalesOrderDetailID,SalesOrderId, SUM(LineTotal) OVER(PARTITION BY salesorderid ORDER BY Salesorderdetailid RANGE unbounded preceding) from Sales.SalesOrderdetail go SELECT SalesOrderDetailID,SalesOrderId, SUM(LineTotal) OVER(PARTITION BY salesorderid ORDER BY Salesorderdetailid Rows unbounded preceding) from Sales.SalesOrderdetail By now it wont be a great display to show you the Profiler trace reads a *tiny* bit different. So moral of the story, Percentage relative to batch can give a rough ‘finger in the air’ measurement, but dont rely on it as fact.

    Read the article

  • A proposal for #DAX Code Formatting #ssas #powerpivot #tabular

    - by Marco Russo (SQLBI)
    I recently published a set of rules for DAX code formatting. The following is an example of what I obtain: CALCULATE (     SUMX (         Orders,         Orders[Amount]     ),     FILTER (         ALL ( Customers ),         CALCULATE (             COUNTROWS ( Sales ),             ALL ( Calendar[Date] )         ) > 42 + 8 – 25 * ( 3 - 1 )             + 2 – 1 + 2 – 1             + CALCULATE (                   2 + 2 – 2                   + 2 - 2               )             – CALCULATE ( 4 )     ) ) The goal is to improve code readability and I look forward to implement a code formatting feature in DAX Studio. The DAX Editor already supports the rules described in the article. I am also considering whether to add a rule specific for ADDCOLUMNS / SUMMARIZE because I would like to see the “pairs” of arguments to define a column in the same row or with a special indentation rule (DAX expression for a column is indented in the line following the column name). EVALUATE CALCULATETABLE (        CALCULATETABLE (         SUMMARIZE (             Audience,             'Date'[Year],             Individuals[Gender],             Individuals[AgeRange],             "Num of Rows", FORMAT (COUNTROWS (Audience), "#,#"),             "Weighted Mean Age",                 SUMX (Audience, Audience[Weight] * Audience[Age]) / SUM (Audience[Weight])         ),         SUMMARIZE (             BridgeIndividualsTargets,             Individuals[ID_Individual]         ),         Audience[Weight] > 0        ),        Targets[Target] = "Maschi",     'Date'[Year] = 2010,     'Date'[MonthName] = "January" ) I would like to get feedback for that – you can use comments here or comments in original article. Thanks!

    Read the article

  • Fast Data - Big Data's achilles heel

    - by thegreeneman
    At OOW 2013 in Mark Hurd and Thomas Kurian's keynote, they discussed Oracle's Fast Data software solution stack and discussed a number of customers deploying Oracle's Big Data / Fast Data solutions and in particular Oracle's NoSQL Database.  Since that time, there have been a large number of request seeking clarification on how the Fast Data software stack works together to deliver on the promise of real-time Big Data solutions.   Fast Data is a software solution stack that deals with one aspect of Big Data, high velocity.   The software in the Fast Data solution stack involves 3 key pieces and their integration:  Oracle Event Processing, Oracle Coherence, Oracle NoSQL Database.   All three of these technologies address a high throughput, low latency data management requirement.   Oracle Event Processing enables continuous query to filter the Big Data fire hose, enable intelligent chained events to real-time service invocation and augments the data stream to provide Big Data enrichment. Extended SQL syntax allows the definition of sliding windows of time to allow SQL statements to look for triggers on events like breach of weighted moving average on a real-time data stream.    Oracle Coherence is a distributed, grid caching solution which is used to provide very low latency access to cached data when the data is too big to fit into a single process, so it is spread around in a grid architecture to provide memory latency speed access.  It also has some special capabilities to deploy remote behavioral execution for "near data" processing.   The Oracle NoSQL Database is designed to ingest simple key-value data at a controlled throughput rate while providing data redundancy in a cluster to facilitate highly concurrent low latency reads.  For example, when large sensor networks are generating data that need to be captured while analysts are simultaneously extracting the data using range based queries for upstream analytics.  Another example might be storing cookies from user web sessions for ultra low latency user profile management, also leveraging that data using holistic MapReduce operations with your Hadoop cluster to do segmented site analysis.  Understand how NoSQL plays a critical role in Big Data capture and enrichment while simultaneously providing a low latency and scalable data management infrastructure thru clustered, always on, parallel processing in a shared nothing architecture. Learn how easily a NoSQL cluster can be deployed to provide essential services in industry specific Fast Data solutions. See these technologies work together in a demonstration highlighting the salient features of these Fast Data enabling technologies in a location based personalization service. The question then becomes how do these things work together to deliver an end to end Fast Data solution.  The answer is that while different applications will exhibit unique requirements that may drive the need for one or the other of these technologies, often when it comes to Big Data you may need to use them together.   You may have the need for the memory latencies of the Coherence cache, but just have too much data to cache, so you use a combination of Coherence and Oracle NoSQL to handle extreme speed cache overflow and retrieval.   Here is a great reference to how these two technologies are integrated and work together.  Coherence & Oracle NoSQL Database.   On the stream processing side, it is similar as with the Coherence case.  As your sliding windows get larger, holding all the data in the stream can become difficult and out of band data may need to be offloaded into persistent storage.  OEP needs an extreme speed database like Oracle NoSQL Database to help it continue to perform for the real time loop while dealing with persistent spill in the data stream.  Here is a great resource to learn more about how OEP and Oracle NoSQL Database are integrated and work together.  OEP & Oracle NoSQL Database.

    Read the article

  • Programming R/Sweave for proper \Sexpr output

    - by deoksu
    Hi I'm having a bit of a problem programming R for Sweave, and the #rstats twitter group often points here, so I thought I'd put this question to the SO crowd. I'm an analyst- not a programmer- so go easy on me my first post. Here's the problem: I am drafting a survey report in Sweave with R and would like to report the marginal returns in line using \Sexpr{}. For example, rather than saying: Only 14% of respondents said 'X'. I want to write the report like this: Only \Sexpr{p.mean(variable)}$\%$ of respondents said 'X'. The problem is that Sweave() converts the results of the expression in \Sexpr{} to a character string, which means that the output from expression in R and the output that appears in my document are different. For example, above I use the function 'p.mean': p.mean<- function (x) {options(digits=1) mmm<-weighted.mean(x, weight=weight, na.rm=T) print(100*mmm) } In R, the output looks like this: p.mean(variable) >14 but when I use \Sexpr{p.mean(variable)}, I get an unrounded character string (in this case: 13.5857142857143) in my document. I have tried to limit the output of my function to 'digits=1' in the global environment, in the function itself, and and in various commands. It only seems to contain what R prints, not the character transformation that is the result of the expression and which eventually prints in the LaTeX file. as.character(p.mean(variable)) >[1] 14 >[1] "13.5857142857143" Does anyone know what I can do to limit the digits printed in the LaTeX file, either by reprogramming the R function or with a setting in Sweave or \Sexpr{}? I'd greatly appreciate any help you can give. Thanks, David

    Read the article

  • Sorting Arrays by More the One Value, and Prioritizing the Sort based on Column data.

    - by Mark Tomlin
    I'm looking for a way to sort an array (we call this a row), with an array of values (that I'll call columns). Each row has columns that must be sorted based on the priority of: timetime, lapcount & timestamp. Each column cotains this information: split1, split2, split3, laptime, lapcount, timestamp. laptime if in hundredths of a second. (1:23.45 or 1 Minute, 23 Seconds & 45 Hundredths is 8345.) Lapcount is a simple unsigned tiny int, or unsigned char. timestamp is unix epoch. The lowest laptime should be at the get a better standing in this sort. Should two peoples laptimes equal, then timestamp will be used to give the better standing in this sort. Should two peoples timestamp equal, then the person with less of a lapcount get's the better standing in this sort. By better standing, I mean closer to the top of the array, closer to the index of zero where it a numerical array. I think the array sorting functions built into php can do this with a callback, I was wondering what the best approch was for a weighted sort like this would be.

    Read the article

  • Sorting Arrays by More the One Value, and Prioritizing the Sort based on Column data.

    - by Mark Tomlin
    I'm looking for a way to sort an array, based on the information in each row, based on the information in certain cells, that I'll call columns. Each row has columns that must be sorted based on the priority of: timetime, lapcount & timestamp. Each column cotains this information: split1, split2, split3, laptime, lapcount, timestamp. laptime if in hundredths of a second. (1:23.45 or 1 Minute, 23 Seconds & 45 Hundredths is 8345.) Lapcount is a simple unsigned tiny int, or unsigned char. timestamp is unix epoch. The lowest laptime should be at the get a better standing in this sort. Should two peoples laptimes equal, then timestamp will be used to give the better standing in this sort. Should two peoples timestamp equal, then the person with less of a lapcount get's the better standing in this sort. By better standing, I mean closer to the top of the array, closer to the index of zero where it a numerical array. I think the array sorting functions built into php can do this with a callback, I was wondering what the best approch was for a weighted sort like this would be.

    Read the article

  • How to recalculate all-pairs shorthest paths on-line if nodes are getting removed?

    - by Pavel Shved
    Latest news about underground bombing made me curious about the following problem. Assume we have a weighted undirected graph, nodes of which are sometimes removed. The problem is to re-calculate shortest paths between all pairs of nodes fast after such removals. With a simple modification of Floyd-Warshall algorithm we can calculate shortest paths between all pairs. These paths may be stored in a table, where shortest[i][j] contains the index of the next node on the shortest path between i and j (or NULL value if there's no path). The algorithm requires O(n³) time to build the table, and eacho query shortest(i,j) takes O(1). Unfortunately, we should re-run this algorithm after each removal. The other alternative is to run graph search on each query. This way each removal takes zero time to update an auxiliary structure (because there's none), but each query takes O(E) time. What algorithm can be used to "balance" the query and update time for all-pairs shortest-paths problem when nodes of the graph are being removed?

    Read the article

  • Algorithm for finding the best routes for food distribution in game

    - by Tautrimas
    Hello, I'm designing a city building game and got into a problem. Imagine Sierra's Caesar III game mechanics: you have many city districts with one market each. There are several granaries over the distance connected with a directed weighted graph. The difference: people (here cars) are units that form traffic jams (here goes the graph weights). Note: in Ceasar game series, people harvested food and stockpiled it in several big granaries, whereas many markets (small shops) took food from the granaries and delivered it to the citizens. The task: tell each district where they should be getting their food from while taking least time and minimizing congestions on the city's roads. Map example Sample diagram Suppose that yellow districts need 7, 7 and 4 apples accordingly. Bluish granaries have 7 and 11 apples accordingly. Suppose edges weights to be proportional to their length. Then, the solution should be something like the gray numbers indicated on the edges. Eg, first district gets 4 apples from the 1st and 3 apples from the 2nd granary, while the last district gets 4 apples from only the 2nd granary. Here, vertical roads are first occupied to the max, and then the remaining workers are sent to the diagonal paths. Question What practical and very fast algorithm should I use? I was looking at some papers (Congestion Games: Optimization in Competition etc.) describing congestion games, but could not get the big picture. Any help is very appreciated! P. S. I can post very little links and no images because of new user restriction.

    Read the article

  • Algorithm To Select Most Popular Places from Database

    - by Russell C.
    We have a website that contains a database of places. For each place our users are able to take one of the follow actions which we record: VIEW - View it's profile RATING - Rate it on a scale of 1-5 stars REVIEW - Review it COMPLETED - Mark that they've been there WISH LIST - Mark that they want to go there FAVORITE - Mark that it's one of their favorites In our database table of places each place contains a count of the number of times each action above was taken as well as the average rating given by users. views ratings avg_rating completed wishlist favorite What we want to be able to do is generate lists of the top places using the above information. Ideally, we would want to be able to generate this list using a relatively simple SQL query without needing to do any legwork to calculate additional fields or stack rank places against one another. That being said, since we only have about 50,000 places we could run a nightly cron job to calculate some fields such as rankings on different categories if it would make a meaningful difference in the overall results of our top places. I'd appreciate if you could make some suggestions on how we should think about bubbling the best places to the top, which criteria we should weight more heavily, and given that information - suggest what the MySQL query would need to look like in order to select the top 10 places. One thing to note is that at this time we are less concerned with the recency of a place being popular - meaning that looking at the aggregate information is fine and that more recent data doesn't need to be weighted more heavily. Thanks in advance for your help & advice!

    Read the article

  • Examples of monoids/semigroups in programming

    - by jkff
    It is well-known that monoids are stunningly ubiquitous in programing. They are so ubiquitous and so useful that I, as a 'hobby project', am working on a system that is completely based on their properties (distributed data aggregation). To make the system useful I need useful monoids :) I already know of these: Numeric or matrix sum Numeric or matrix product Minimum or maximum under a total order with a top or bottom element (more generally, join or meet in a bounded lattice, or even more generally, product or coproduct in a category) Set union Map union where conflicting values are joined using a monoid Intersection of subsets of a finite set (or just set intersection if we speak about semigroups) Intersection of maps with a bounded key domain (same here) Merge of sorted sequences, perhaps with joining key-equal values in a different monoid/semigroup Bounded merge of sorted lists (same as above, but we take the top N of the result) Cartesian product of two monoids or semigroups List concatenation Endomorphism composition. Now, let us define a quasi-property of an operation as a property that holds up to an equivalence relation. For example, list concatenation is quasi-commutative if we consider lists of equal length or with identical contents up to permutation to be equivalent. Here are some quasi-monoids and quasi-commutative monoids and semigroups: Any (a+b = a or b, if we consider all elements of the carrier set to be equivalent) Any satisfying predicate (a+b = the one of a and b that is non-null and satisfies some predicate P, if none does then null; if we consider all elements satisfying P equivalent) Bounded mixture of random samples (xs+ys = a random sample of size N from the concatenation of xs and ys; if we consider any two samples with the same distribution as the whole dataset to be equivalent) Bounded mixture of weighted random samples Which others do exist?

    Read the article

  • A Digg-like rotating homepage of popular content, how to include date as a factor?

    - by Ferdy
    I am building an advanced image sharing web application. As you may expect, users can upload images and others can comments on it, vote on it, and favorite it. These events will determine the popularity of the image, which I capture in a "karma" field. Now I want to create a Digg-like homepage system, showing the most popular images. It's easy, since I already have the weighted Karma score. I just sort on that descendingly to show the 20 most valued images. The part that is missing is time. I do not want extremely popular images to always be on the homepage. I guess an easy solution is to restrict the result set to the last 24 hours. However, I'm also thinking that in order to keep the image rotation occur throughout the day, time can be some kind of variable where its offset has an influence on the image's sorting. Specific questions: Would you recommend the easy scenario (just sort for best images within 24 hours) or the more sophisticated one (use datetime offset as part of the sorting)? If you advise the latter, any help on the mathematical solution to this? Would it be best to run a scheduled service to mark images for the homepage, or would you advise a direct query (I'm using MySQL) As an extra note, the homepage should support paging and on a quiet day should include entries of days before in order to make sure it is always "filled" I'm not asking the community to build this algorithm, just looking for some advise :)

    Read the article

  • Shortest acyclic path on directed cyclic graph with negative weights/cycles

    - by Janathan
    I have a directed graph which has cycles. All edges are weighted, and the weights can be negative. There can be negative cycles. I want to find a path from s to t, which minimizes the total weight on the path. Sure, it can go to negative infinity when negative cycles exist. But what if I disallow cycles in the path (not in the original graph)? That is, once the path leaves a node, it can not enter the node again. This surely avoids the negative infinity problem, but surprisingly no known algorithm is found by a search on Google. The closest is Floyd–Warshall algorithm, but it does not allow negative cycles. Thanks a lot in advance. Edit: I may have generalized my original problem too much. Indeed, I am given a cyclic directed graph with nonnegative edge weights. But in addition, each node has a positive reward too. I want to find a simple path which minimizes (sum of edge weights on the path) - (sum of node rewards covered by the path). This can be surely converted to the question that I posted, but some structure is lost. And some hint from submodular analysis suggests this motivating problem is not NP-hard. Thanks a lot

    Read the article

  • Is there a free, smale-scale, not web-based issue/bug tracking system?

    - by Doc Brown
    I know, there were posts before here on SO before concerning issue or bug tracking systems, like this one, but the given answers point either to commercial systems or web-based systems, which both seem to be oversized for our needs. What I am looking for is a non-commercial tool for a team of 3 to 4 developers, which can be used on an existing fileserver, without the need of installing additional server software like a C/S database or a web server. Some things I expect from such a system: allows to remember bugs (with a priority) and issues / ideas for new features (mostly without a priority) description of the issue, perhaps some additional remarks short info who entered the bug/issue entry one or more tags allowing us to group or filter the list Any suggestions? EDIT: I should have said that, but we are using MS Windows clients, Visual Studio development, Tortoise SVN (the latter works fine without a subversion server). And yes, I am strict on "no server software", since all server based solutions I have seen so far seem much to oversized/heavy weighted/too-much-effort-to-be-worth-it. In fact, if no one has a better idea, we are going to use a spreadsheet, but I can't believe there are no ready-made, light weight solutions.

    Read the article

  • In Python, how to make data members visible to subclasses if not known when initializing an object ?

    - by LB
    The title is a bit long, but it should be pretty straightforward for someone well-aware of python. I'm a python newbie. So, maybe i'm doing things in the wrong way. Suppose I have a class TreeNode class TreeNode(Node): def __init__(self, name, id): Node.__init__(self, name, id) self.children = [] and a subclass with a weight: class WeightedNode(TreeNode): def __init__(self,name, id): TreeNode.__init__(self, name, id) self.weight = 0 So far, i think I'm ok. Now, I want to add an object variable called father in TreeNode so that WeightedNode has also this member. The problem is that I don't know when initializing the object who is going to be the father. I set the father afterwards with this method in TreeNode : def set_father(self, father_node): self.father = father_node The problem is then when i'm trying to access self.father in Weighted: print 'Name %s Father %s '%(self.name, self.father.name) I obtain: AttributeError: WeightedNode instance has no attribute 'father' I thought that I could make father visible by doing something in TreeNode.__init__ but i wasn't able to find what. How can i do that ? Thanks.

    Read the article

  • Spikes of 99% disk activity in Windows 8 Task Manager

    - by Jonathan Chan
    For some reason Windows 8's Task Manager reports spikes of 99% disk activity for hours at a time. Looking at the entries in that column, however, data doesn't seem to be getting written any more quickly than when the disk activity is around 25-50% (which it seem to idle at most of the time). Furthermore, when these 99% disk activity spikes are happening, the average response time reported in the Performance tab becomes 4000-6000ms. Is there a good way to find out what is causing the disk activity? I've tried using Process Explorer, but I said above, the rate at which data is reportedly being written doesn't seem to correspond (Dropbox and Google Chrome are constantly the top two, but the spikes are not dependent on their being open). Thanks in advance for any help. It gets very annoying when the computer stutters to a halt.

    Read the article

  • IE 8 plays sound, Ulead pop-up message appears, crash

    - by benzado
    I'm experiencing a problem on a new PC using Outlook Web Access in Internet Explorer 8. When OWA plays a sound, a message box appears: the about box for Ulead MP3 codec. When I click OK to dismiss the box, I get a message that IE has stopped responding and Windows eventually has to force the browser window closed. This is apparently not an isolated incident, occurring on computers from different manufacturers and on other websites that play sound (such as AOL's Webmail). The only "fix" I've found on discussion boards is to prevent the website from playing sound in the first place. That's not a fix, that's just avoiding the trigger. I'd like to know what's causing this and uninstall it or repair it, so the computer can work like it's supposed to. Since Super User users are smarter than the average bear, I thought I'd have better luck here.

    Read the article

< Previous Page | 35 36 37 38 39 40 41 42 43 44 45 46  | Next Page >