processors - Page 6 - Developer IT

Understanding and Controlling Parallel Query Processing in SQL Server

Data warehousing and general reporting applications tend to be CPU intensive because they need to read and process a large number of rows. To facilitate quick data processing for queries that touch a large amount of data, Microsoft SQL Server exploits the power of multiple logical processors to provide parallel query processing operations such as parallel scans. Through extensive testing, we have learned that, for most large queries that are executed in a parallel fashion, SQL Server can deliver linear or nearly linear response time speedup as the number of logical processors increases. However, some queries in high parallelism scenarios perform suboptimally. There are also some parallelism issues that can occur in a multi-user parallel query workload. This white paper describes parallel performance problems you might encounter when you run such queries and workloads, and it explains why these issues occur. In addition, it presents how data warehouse developers can detect these issues, and how they can work around them or mitigate them.

Read the article

HPC Server Dynamic Job Scheduling: when jobs spawn jobs

- by JoshReuben

HPC Job Types HPC has 3 types of jobs http://technet.microsoft.com/en-us/library/cc972750(v=ws.10).aspx · Task Flow – vanilla sequence · Parametric Sweep – concurrently run multiple instances of the same program, each with a different work unit input · MPI – message passing between master & slave tasks But when you try go outside the box – job tasks that spawn jobs, blocking the parent task – you run the risk of resource starvation, deadlocks, and recursive, non-converging or exponential blow-up. The solution to this is to write some performance monitoring and job scheduling code. You can do this in 2 ways: manually control scheduling - allocate/ de-allocate resources, change job priorities, pause & resume tasks , restrict long running tasks to specific compute clusters Semi-automatically - set threshold params for scheduling. How – Control Job Scheduling In order to manage the tasks and resources that are associated with a job, you will need to access the ISchedulerJob interface - http://msdn.microsoft.com/en-us/library/microsoft.hpc.scheduler.ischedulerjob_members(v=vs.85).aspx This really allows you to control how a job is run – you can access & tweak the following features: max / min resource values whether job resources can grow / shrink, and whether jobs can be pre-empted, whether the job is exclusive per node the creator process id & the job pool timestamp of job creation & completion job priority, hold time & run time limit Re-queue count Job progress Max/ min Number of cores, nodes, sockets, RAM Dynamic task list – can add / cancel jobs on the fly Job counters When – poll perf counters Tweaking the job scheduler should be done on the basis of resource utilization according to PerfMon counters – HPC exposes 2 Perf objects: Compute Clusters, Compute Nodes http://technet.microsoft.com/en-us/library/cc720058(v=ws.10).aspx You can monitor running jobs according to dynamic thresholds – use your own discretion: Percentage processor time Number of running jobs Number of running tasks Total number of processors Number of processors in use Number of processors idle Number of serial tasks Number of parallel tasks Design Your algorithms correctly Finally , don’t assume you have unlimited compute resources in your cluster – design your algorithms with the following factors in mind: · Branching factor - http://en.wikipedia.org/wiki/Branching_factor - dynamically optimize the number of children per node · cutoffs to prevent explosions - http://en.wikipedia.org/wiki/Limit_of_a_sequence - not all functions converge after n attempts. You also need a threshold of good enough, diminishing returns · heuristic shortcuts - http://en.wikipedia.org/wiki/Heuristic - sometimes an exhaustive search is impractical and short cuts are suitable · Pruning http://en.wikipedia.org/wiki/Pruning_(algorithm) – remove / de-prioritize unnecessary tree branches · avoid local minima / maxima - http://en.wikipedia.org/wiki/Local_minima - sometimes an algorithm cant converge because it gets stuck in a local saddle – try simulated annealing, hill climbing or genetic algorithms to get out of these ruts watch out for rounding errors – http://en.wikipedia.org/wiki/Round-off_error - multiple iterations can in parallel can quickly amplify & blow up your algo ! Use an epsilon, avoid floating point errors, truncations, approximations Happy Coding !

Read the article

Sun Fire X4800 M2 Delivers World Record TPC-C for x86 Systems

- by Brian

Oracle's Sun Fire X4800 M2 server equipped with eight 2.4 GHz Intel Xeon Processor E7-8870 chips obtained a result of 5,055,888 tpmC on the TPC-C benchmark. This result is a world record for x86 servers. Oracle demonstrated this world record database performance running Oracle Database 11g Release 2 Enterprise Edition with Partitioning. The Sun Fire X4800 M2 server delivered a new x86 TPC-C world record of 5,055,888 tpmC with a price performance of $0.89/tpmC using Oracle Database 11g Release 2. This configuration is available 06/26/12. The Sun Fire X4800 M2 server delivers 3.0x times better performance than the next 8-processor result, an IBM System p 570 equipped with POWER6 processors. The Sun Fire X4800 M2 server has 3.1x times better price/performance than the 8-processor 4.7GHz POWER6 IBM System p 570. The Sun Fire X4800 M2 server has 1.6x times better performance than the 4-processor IBM x3850 X5 system equipped with Intel Xeon processors. This is the first TPC-C result on any system using eight Intel Xeon Processor E7-8800 Series chips. The Sun Fire X4800 M2 server is the first x86 system to get over 5 million tpmC. The Oracle solution utilized Oracle Linux operating system and Oracle Database 11g Enterprise Edition Release 2 with Partitioning to produce the x86 world record TPC-C benchmark performance. Performance Landscape Select TPC-C results (sorted by tpmC, bigger is better) System p/c/t tpmC Price/tpmC Avail Database MemorySize Sun Fire X4800 M2 8/80/160 5,055,888 0.89 USD 6/26/2012 Oracle 11g R2 4 TB IBM x3850 X5 4/40/80 3,014,684 0.59 USD 7/11/2011 DB2 ESE 9.7 3 TB IBM x3850 X5 4/32/64 2,308,099 0.60 USD 5/20/2011 DB2 ESE 9.7 1.5 TB IBM System p 570 8/16/32 1,616,162 3.54 USD 11/21/2007 DB2 9.0 2 TB p/c/t - processors, cores, threads Avail - availability date Oracle and IBM TPC-C Response times System tpmC Response Time (sec) New Order 90th% Response Time (sec) New Order Average Sun Fire X4800 M2 5,055,888 0.210 0.166 IBM x3850 X5 3,014,684 0.500 0.272 Ratios - Oracle Better 1.6x 1.4x 1.3x Oracle uses average new order response time for comparison between Oracle and IBM. Graphs of Oracle's and IBM's response times for New-Order can be found in the full disclosure reports on TPC's website TPC-C Official Result Page. Configuration Summary and Results Hardware Configuration: Server Sun Fire X4800 M2 server 8 x 2.4 GHz Intel Xeon Processor E7-8870 4 TB memory 8 x 300 GB 10K RPM SAS internal disks 8 x Dual port 8 Gbs FC HBA Data Storage 10 x Sun Fire X4270 M2 servers configured as COMSTAR heads, each with 1 x 3.06 GHz Intel Xeon X5675 processor 8 GB memory 10 x 2 TB 7.2K RPM 3.5" SAS disks 2 x Sun Storage F5100 Flash Array storage (1.92 TB each) 1 x Brocade 5300 switches Redo Storage 2 x Sun Fire X4270 M2 servers configured as COMSTAR heads, each with 1 x 3.06 GHz Intel Xeon X5675 processor 8 GB memory 11 x 2 TB 7.2K RPM 3.5" SAS disks Clients 8 x Sun Fire X4170 M2 servers, each with 2 x 3.06 GHz Intel Xeon X5675 processors 48 GB memory 2 x 300 GB 10K RPM SAS disks Software Configuration: Oracle Linux (Sun Fire 4800 M2) Oracle Solaris 11 Express (COMSTAR for Sun Fire X4270 M2) Oracle Solaris 10 9/10 (Sun Fire X4170 M2) Oracle Database 11g Release 2 Enterprise Edition with Partitioning Oracle iPlanet Web Server 7.0 U5 Tuxedo CFS-R Tier 1 Results: System: Sun Fire X4800 M2 tpmC: 5,055,888 Price/tpmC: 0.89 USD Available: 6/26/2012 Database: Oracle Database 11g Cluster: no New Order Average Response: 0.166 seconds Benchmark Description TPC-C is an OLTP system benchmark. It simulates a complete environment where a population of terminal operators executes transactions against a database. The benchmark is centered around the principal activities (transactions) of an order-entry environment. These transactions include entering and delivering orders, recording payments, checking the status of orders, and monitoring the level of stock at the warehouses. Key Points and Best Practices Oracle Database 11g Release 2 Enterprise Edition with Partitioning scales easily to this high level of performance. COMSTAR (Common Multiprotocol SCSI Target) is the software framework that enables an Oracle Solaris host to serve as a SCSI Target platform. COMSTAR uses a modular approach to break the huge task of handling all the different pieces in a SCSI target subsystem into independent functional modules which are glued together by the SCSI Target Mode Framework (STMF). The modules implementing functionality at SCSI level (disk, tape, medium changer etc.) are not required to know about the underlying transport. And the modules implementing the transport protocol (FC, iSCSI, etc.) are not aware of the SCSI-level functionality of the packets they are transporting. The framework hides the details of allocation providing execution context and cleanup of SCSI commands and associated resources and simplifies the task of writing the SCSI or transport modules. Oracle iPlanet Web Server middleware is used for the client tier of the benchmark. Each web server instance supports more than a quarter-million users while satisfying the response time requirement from the TPC-C benchmark. See Also Oracle Press Release -- Sun Fire X4800 M2 TPC-C Executive Summary tpc.org Complete Sun Fire X4800 M2 TPC-C Full Disclosure Report tpc.org Transaction Processing Performance Council (TPC) Home Page Ideas International Benchmark Page Sun Fire X4800 M2 Server oracle.com OTN Oracle Linux oracle.com OTN Oracle Solaris oracle.com OTN Oracle Database 11g Release 2 Enterprise Edition oracle.com OTN Sun Storage F5100 Flash Array oracle.com OTN Disclosure Statement TPC Benchmark C, tpmC, and TPC-C are trademarks of the Transaction Processing Performance Council (TPC). Sun Fire X4800 M2 (8/80/160) with Oracle Database 11g Release 2 Enterprise Edition with Partitioning, 5,055,888 tpmC, $0.89 USD/tpmC, available 6/26/2012. IBM x3850 X5 (4/40/80) with DB2 ESE 9.7, 3,014,684 tpmC, $0.59 USD/tpmC, available 7/11/2011. IBM x3850 X5 (4/32/64) with DB2 ESE 9.7, 2,308,099 tpmC, $0.60 USD/tpmC, available 5/20/2011. IBM System p 570 (8/16/32) with DB2 9.0, 1,616,162 tpmC, $3.54 USD/tpmC, available 11/21/2007. Source: http://www.tpc.org/tpcc, results as of 7/15/2011.

Read the article

World Record Batch Rate on Oracle JD Edwards Consolidated Workload with SPARC T4-2

- by Brian

Oracle produced a World Record batch throughput for single system results on Oracle's JD Edwards EnterpriseOne Day-in-the-Life benchmark using Oracle's SPARC T4-2 server running Oracle Solaris Containers and consolidating JD Edwards EnterpriseOne, Oracle WebLogic servers and the Oracle Database 11g Release 2. The workload includes both online and batch workload. The SPARC T4-2 server delivered a result of 8,000 online users while concurrently executing a mix of JD Edwards EnterpriseOne Long and Short batch processes at 95.5 UBEs/min (Universal Batch Engines per minute). In order to obtain this record benchmark result, the JD Edwards EnterpriseOne, Oracle WebLogic and Oracle Database 11g Release 2 servers were executed each in separate Oracle Solaris Containers which enabled optimal system resources distribution and performance together with scalable and manageable virtualization. One SPARC T4-2 server running Oracle Solaris Containers and consolidating JD Edwards EnterpriseOne, Oracle WebLogic servers and the Oracle Database 11g Release 2 utilized only 55% of the available CPU power. The Oracle DB server in a Shared Server configuration allows for optimized CPU resource utilization and significant memory savings on the SPARC T4-2 server without sacrificing performance. This configuration with SPARC T4-2 server has achieved 33% more Users/core, 47% more UBEs/min and 78% more Users/rack unit than the IBM Power 770 server. The SPARC T4-2 server with 2 processors ran the JD Edwards "Day-in-the-Life" benchmark and supported 8,000 concurrent online users while concurrently executing mixed batch workloads at 95.5 UBEs per minute. The IBM Power 770 server with twice as many processors supported only 12,000 concurrent online users while concurrently executing mixed batch workloads at only 65 UBEs per minute. This benchmark demonstrates more than 2x cost savings by consolidating the complete solution in a single SPARC T4-2 server compared to earlier published results of 10,000 users and 67 UBEs per minute on two SPARC T4-2 and SPARC T4-1. The Oracle DB server used mirrored (RAID 1) volumes for the database providing high availability for the data without impacting performance. Performance Landscape JD Edwards EnterpriseOne Day in the Life (DIL) Benchmark Consolidated Online with Batch Workload System Rack Units BatchRate(UBEs/m) Online Users Users /Units Users /Core Version SPARC T4-2 (2 x SPARC T4, 2.85 GHz) 3 95.5 8,000 2,667 500 9.0.2 IBM Power 770 (4 x POWER7, 3.3 GHz, 32 cores) 8 65 12,000 1,500 375 9.0.2 Batch Rate (UBEs/m) — Batch transaction rate in UBEs per minute Configuration Summary Hardware Configuration: 1 x SPARC T4-2 server with 2 x SPARC T4 processors, 2.85 GHz 256 GB memory 4 x 300 GB 10K RPM SAS internal disk 2 x 300 GB internal SSD 2 x Sun Storage F5100 Flash Arrays Software Configuration: Oracle Solaris 10 Oracle Solaris Containers JD Edwards EnterpriseOne 9.0.2 JD Edwards EnterpriseOne Tools (8.98.4.2) Oracle WebLogic Server 11g (10.3.4) Oracle HTTP Server 11g Oracle Database 11g Release 2 (11.2.0.1) Benchmark Description JD Edwards EnterpriseOne is an integrated applications suite of Enterprise Resource Planning (ERP) software. Oracle offers 70 JD Edwards EnterpriseOne application modules to support a diverse set of business operations. Oracle's Day in the Life (DIL) kit is a suite of scripts that exercises most common transactions of JD Edwards EnterpriseOne applications, including business processes such as payroll, sales order, purchase order, work order, and manufacturing processes, such as ship confirmation. These are labeled by industry acronyms such as SCM, CRM, HCM, SRM and FMS. The kit's scripts execute transactions typical of a mid-sized manufacturing company. The workload consists of online transactions and the UBE – Universal Business Engine workload of 61 short and 4 long UBEs. LoadRunner runs the DIL workload, collects the user’s transactions response times and reports the key metric of Combined Weighted Average Transaction Response time. The UBE processes workload runs from the JD Enterprise Application server. Oracle's UBE processes come as three flavors: Short UBEs < 1 minute engage in Business Report and Summary Analysis, Mid UBEs > 1 minute create a large report of Account, Balance, and Full Address, Long UBEs > 2 minutes simulate Payroll, Sales Order, night only jobs. The UBE workload generates large numbers of PDF files reports and log files. The UBE Queues are categorized as the QBATCHD, a single threaded queue for large and medium UBEs, and the QPROCESS queue for short UBEs run concurrently. Oracle's UBE process performance metric is Number of Maximum Concurrent UBE processes at transaction rate, UBEs/minute. Key Points and Best Practices Two JD Edwards EnterpriseOne Application Servers, two Oracle WebLogic Servers 11g Release 1 coupled with two Oracle Web Tier HTTP server instances and one Oracle Database 11g Release 2 database on a single SPARC T4-2 server were hosted in separate Oracle Solaris Containers bound to four processor sets to demonstrate consolidation of multiple applications, web servers and the database with best resource utilizations. Interrupt fencing was configured on all Oracle Solaris Containers to channel the interrupts to processors other than the processor sets used for the JD Edwards Application server, Oracle WebLogic servers and the database server. A Oracle WebLogic vertical cluster was configured on each WebServer Container with twelve managed instances each to load balance users' requests and to provide the infrastructure that enables scaling to high number of users with ease of deployment and high availability. The database log writer was run in the real time RT class and bound to a processor set. The database redo logs were configured on the raw disk partitions. The Oracle Solaris Container running the Enterprise Application server completed 61 Short UBEs, 4 Long UBEs concurrently as the mixed size batch workload. The mixed size UBEs ran concurrently from the Enterprise Application server with the 8,000 online users driven by the LoadRunner. See Also SPARC T4-2 Server oracle.com OTN JD Edwards EnterpriseOne oracle.com OTN Oracle Solaris oracle.com OTN Oracle Database 11g Release 2 Enterprise Edition oracle.com OTN Oracle Fusion Middleware oracle.com OTN Disclosure Statement Copyright 2012, Oracle and/or its affiliates. All rights reserved. Oracle and Java are registered trademarks of Oracle and/or its affiliates. Other names may be trademarks of their respective owners. Results as of 09/30/2012.

Read the article

You do not need a separate SQL Server license for a Standby or Passive server - this Microsoft White Paper explains all

- by tonyrogerson

If you were in any doubt at all that you need to license Standby / Passive Failover servers then the White Paper “Do Not Pay Too Much for Your Database Licensing” will settle those doubts. I’ve had debate before people thinking you can only have a single instance as a standby machine, that’s just wrong; it would mean you could have a scenario where you had a 2 node active/passive cluster with database mirroring and log shipping (a total of 4 SQL Server instances) – in that set up you only need to buy one physical license so long as the standby nodes have the same or less physical processors (cores are irrelevant). So next time your supplier suggests you need a license for your standby box tell them you don’t and educate them by pointing them to the white paper. For clarity I’ve copied the extract below from the White Paper. Extract from “Do Not Pay Too Much for Your Database Licensing” Standby Server Customers often implement standby server to make sure the application continues to function in case primary server fails. Standby server continuously receives updates from the primary server and will take over the role of primary server in case of failure in the primary server. Following are comparisons of how each vendor supports standby server licensing. SQL Server Customers does not need to license standby (or passive) server provided that the number of processors in the standby server is equal or less than those in the active server. Oracle DB Oracle requires customer to fully license both active and standby servers even though the standby server is essentially idle most of the time. IBM DB2 IBM licensing on standby server is quite complicated and is different for every editions of DB2. For Enterprise Edition, a minimum of 100 PVUs or 25 Authorized User is needed to license standby server. The following graph compares prices based on a database application with two processors (dual-core) and 25 users with one standby server. [chart snipped] Note All prices are based on newest Intel Xeon Nehalem processor database pricing for purchases within the United States and are in United States dollars. Pricing is based on information available on vendor Web sites for Enterprise Edition. Microsoft SQL Server Enterprise Edition 25 users (CALs) x $164 / CAL + $8,592 / Server = $12,692 (no need to license standby server) Oracle Enterprise Edition (base license without options) Named User Plus minimum (25 Named Users Plus per Core) = 25 x 2 = 50 Named Users Plus x $950 / Named Users Plus x 2 servers = $95,000 IBM DB2 Enterprise Edition (base license without feature pack) Need to purchase 125 Authorized User (400 PVUs/100 PVUs = 4 X 25 = 100 Authorized User + 25 Authorized Users for standby server) = 125 Authorized Users x $1,040 / Authorized Users = $130,000

Read the article

SPARC T4-4 Delivers World Record First Result on PeopleSoft Combined Benchmark

- by Brian

Oracle's SPARC T4-4 servers running Oracle's PeopleSoft HCM 9.1 combined online and batch benchmark achieved World Record 18,000 concurrent users while executing a PeopleSoft Payroll batch job of 500,000 employees in 43.32 minutes and maintaining online users response time at < 2 seconds. This world record is the first to run online and batch workloads concurrently. This result was obtained with a SPARC T4-4 server running Oracle Database 11g Release 2, a SPARC T4-4 server running PeopleSoft HCM 9.1 application server and a SPARC T4-2 server running Oracle WebLogic Server in the web tier. The SPARC T4-4 server running the application tier used Oracle Solaris Zones which provide a flexible, scalable and manageable virtualization environment. The average CPU utilization on the SPARC T4-2 server in the web tier was 17%, on the SPARC T4-4 server in the application tier it was 59%, and on the SPARC T4-4 server in the database tier was 35% (online and batch) leaving significant headroom for additional processing across the three tiers. The SPARC T4-4 server used for the database tier hosted Oracle Database 11g Release 2 using Oracle Automatic Storage Management (ASM) for database files management with I/O performance equivalent to raw devices. This is the first three tier mixed workload (online and batch) PeopleSoft benchmark also processing PeopleSoft payroll batch workload. Performance Landscape PeopleSoft HR Self-Service and Payroll Benchmark Systems Users Ave Response Search (sec) Ave Response Save (sec) Batch Time (min) Streams SPARC T4-2 (web) SPARC T4-4 (app) SPARC T4-2 (db) 18,000 0.944 0.503 43.32 64 Configuration Summary Application Configuration: 1 x SPARC T4-4 server with 4 x SPARC T4 processors, 3.0 GHz 512 GB memory 5 x 300 GB SAS internal disks 1 x 100 GB and 2 x 300 GB internal SSDs 2 x 10 Gbe HBA Oracle Solaris 11 11/11 PeopleTools 8.52 PeopleSoft HCM 9.1 Oracle Tuxedo, Version 10.3.0.0, 64-bit, Patch Level 031 Java Platform, Standard Edition Development Kit 6 Update 32 Database Configuration: 1 x SPARC T4-4 server with 4 x SPARC T4 processors, 3.0 GHz 256 GB memory 3 x 300 GB SAS internal disks Oracle Solaris 11 11/11 Oracle Database 11g Release 2 Web Tier Configuration: 1 x SPARC T4-2 server with 2 x SPARC T4 processors, 2.85 GHz 256 GB memory 2 x 300 GB SAS internal disks 1 x 100 GB internal SSD Oracle Solaris 11 11/11 PeopleTools 8.52 Oracle WebLogic Server 10.3.4 Java Platform, Standard Edition Development Kit 6 Update 32 Storage Configuration: 1 x Sun Server X2-4 as a COMSTAR head for data 4 x Intel Xeon X7550, 2.0 GHz 128 GB memory 1 x Sun Storage F5100 Flash Array (80 flash modules) 1 x Sun Storage F5100 Flash Array (40 flash modules) 1 x Sun Fire X4275 as a COMSTAR head for redo logs 12 x 2 TB SAS disks with Niwot Raid controller Benchmark Description This benchmark combines PeopleSoft HCM 9.1 HR Self Service online and PeopleSoft Payroll batch workloads to run on a unified database deployed on Oracle Database 11g Release 2. The PeopleSoft HRSS benchmark kit is a Oracle standard benchmark kit run by all platform vendors to measure the performance. It's an OLTP benchmark where DB SQLs are moderately complex. The results are certified by Oracle and a white paper is published. PeopleSoft HR SS defines a business transaction as a series of HTML pages that guide a user through a particular scenario. Users are defined as corporate Employees, Managers and HR administrators. The benchmark consist of 14 scenarios which emulate users performing typical HCM transactions such as viewing paycheck, promoting and hiring employees, updating employee profile and other typical HCM application transactions. All these transactions are well-defined in the PeopleSoft HR Self-Service 9.1 benchmark kit. This benchmark metric is the weighted average response search/save time for all the transactions. The PeopleSoft 9.1 Payroll (North America) benchmark demonstrates system performance for a range of processing volumes in a specific configuration. This workload represents large batch runs typical of a ERP environment during a mass update. The benchmark measures five application business process run times for a database representing large organization. They are Paysheet Creation, Payroll Calculation, Payroll Confirmation, Print Advice forms, and Create Direct Deposit File. The benchmark metric is the cumulative elapsed time taken to complete the Paysheet Creation, Payroll Calculation and Payroll Confirmation business application processes. The benchmark metrics are taken for each respective benchmark while running simultaneously on the same database back-end. Specifically, the payroll batch processes are started when the online workload reaches steady state (the maximum number of online users) and overlap with online transactions for the duration of the steady state. Key Points and Best Practices Two Oracle PeopleSoft Domain sets with 200 application servers each on a SPARC T4-4 server were hosted in 2 separate Oracle Solaris Zones to demonstrate consolidation of multiple application servers, ease of administration and performance tuning. Each Oracle Solaris Zone was bound to a separate processor set, each containing 15 cores (total 120 threads). The default set (1 core from first and third processor socket, total 16 threads) was used for network and disk interrupt handling. This was done to improve performance by reducing memory access latency by using the physical memory closest to the processors and offload I/O interrupt handling to default set threads, freeing up cpu resources for Application Servers threads and balancing application workload across 240 threads. See Also Oracle PeopleSoft Benchmark White Papers oracle.com SPARC T4-2 Server oracle.com OTN SPARC T4-4 Server oracle.com OTN PeopleSoft Enterprise Human Capital Management oracle.com OTN PeopleSoft Enterprise Human Capital Management (Payroll) oracle.com OTN Oracle Solaris oracle.com OTN Oracle Database 11g Release 2 Enterprise Edition oracle.com OTN Disclosure Statement Oracle's PeopleSoft HR and Payroll combined benchmark, www.oracle.com/us/solutions/benchmark/apps-benchmark/peoplesoft-167486.html, results 09/30/2012.

Read the article

Two New AdvancedTCA Blades from Emerson Network Power

ATCA server blades feature six-core processors from the Intel Xeon processor 5600 series for optimized virtualization and power management Xeon - Hardware - Intel Corporation - Components - Intel Xeon

Read the article

Improve your Application Performance with .NET Framework 4.0

Nice Article on CodeGuru. This processors we use today are quite different from those of just a few years ago, as most processors today provide multiple cores and/or multiple threads. With multiple cores and/or threads we need to change how we tackle problems in code. Yes we can still continue to write code to perform an action in a top down fashion to complete a task. This apprach will continue to work; however, you are not taking advantage of the extra processing power available. The best way to take advantage of the extra cores prior to .NET Framework 4.0 was to create threads and/or utilize the ThreadPool. For many developers utilizing Threads or the ThreadPool can be a little daunting. The .NET 4.0 Framework drastically simplified the process of utilizing the extra processing power through the Task Parallel Library (TPL). This article talks following topics “Data Parallelism”, “Parallel LINQ (PLINQ)” and “Task Parallelism”. span.fullpost {display:none;}

Read the article

IBM Expands Power 7 Line to Entry-Level and Mid-Range Blade Servers

New processors allow for expandability and improved power efficiency compared to the previous generation.

Read the article

Oracle's Thirteen Engineered Systems

- by Luis Moreno Campos

You already need a catalogue to keep up with the many new stuff coming out from Oracle Engineered from factory.In the Exadata portfolio you have 4 systems:- Quarter Rack X2-2 Database Machine- Half-Rack X2-2 Database Machine- Full-Rack X2-2 Database Machine- X2-8 Database MachineBut if Exadata presents a stunning portfolio, Exalogic doesn't fall behind on that by putting out 6 versions: 3 sizes (Quarter, Half and Full) with x86 processors and the same 3 sizes with SPARC based processors.Finally we have 3 new systems called SPARC Superclusters where Solaris 11 was re-engineered to take more out of the power of Infiniband: "Available in the next calendar year, the Oracle SPARC Supercluster will be available in T3-2, T3-4 and M5000-based configurations".I see Oracle delivering on it's promise to tightly integrate Hardware and Software to work closer together.

Read the article

Why Move My Oracle Database to New SPARC Hardware?

- by rickramsey

If didn't manage to catch all the news during the proverbial Firehose Down the Throat that is Oracle OpenWorld, you'll enjoy these short recaps from Brad Carlile. He makes things clear in just a couple of minutes. photograph copyright by Edge of Day Photography, with permission Video: Latest Improvements to Oracle SPARC Processors with Brad Carlile T5, M5, and M6. Three wicked fast processors that Oracle announced over the last year. Brad Carlile explains how much faster they are, and why they are better than previous versions. Video: Why Move Your Oracle Database to SPARC Servers with Brad Carlile If I'm happy with how my Oracle Database 11g is performing, why should I deploy it on the new Oracle SPARC hardware? For the same reasons that you would want to buy a sports car that goes twice as fast AND gets better gas mileage, Brad Carlile explains. Well, if there are such dramatic performance improvements and cost savings, then why should I move up to Oracle Database 12c? -Rick Follow me on: Blog | Facebook | Twitter | Personal Twitter | YouTube | The Great Peruvian Novel

Read the article

Improve your Application Performance with .NET Framework 4.0

Nice Article on CodeGuru. This processors we use today are quite different from those of just a few years ago, as most processors today provide multiple cores and/or multiple threads. With multiple cores and/or threads we need to change how we tackle problems in code. Yes we can still continue to write code to perform an action in a top down fashion to complete a task. This apprach will continue to work; however, you are not taking advantage of the extra processing power available. The best way to take advantage of the extra cores prior to .NET Framework 4.0 was to create threads and/or utilize the ThreadPool. For many developers utilizing Threads or the ThreadPool can be a little daunting. The .NET 4.0 Framework drastically simplified the process of utilizing the extra processing power through the Task Parallel Library (TPL). This article talks following topics “Data Parallelism”, “Parallel LINQ (PLINQ)” and “Task Parallelism”. span.fullpost {display:none;}

Read the article

Polite busy-waiting with WRPAUSE on SPARC

- by Dave

Unbounded busy-waiting is an poor idea for user-space code, so we typically use spin-then-block strategies when, say, waiting for a lock to be released or some other event. If we're going to spin, even briefly, then we'd prefer to do so in a manner that minimizes performance degradation for other sibling logical processors ("strands") that share compute resources. We want to spin politely and refrain from impeding the progress and performance of other threads — ostensibly doing useful work and making progress — that run on the same core. On a SPARC T4, for instance, 8 strands will share a core, and that core has its own L1 cache and 2 pipelines. On x86 we have the PAUSE instruction, which, naively, can be thought of as a hardware "yield" operator which temporarily surrenders compute resources to threads on sibling strands. Of course this helps avoid intra-core performance interference. On the SPARC T2 our preferred busy-waiting idiom was "RD %CCR,%G0" which is a high-latency no-nop. The T4 provides a dedicated and extremely useful WRPAUSE instruction. The processor architecture manuals are the authoritative source, but briefly, WRPAUSE writes a cycle count into the the PAUSE register, which is ASR27. Barring interrupts, the processor then delays for the requested period. There's no need for the operating system to save the PAUSE register over context switches as it always resets to 0 on traps. Digressing briefly, if you use unbounded spinning then ultimately the kernel will preempt and deschedule your thread if there are other ready threads than are starving. But by using a spin-then-block strategy we can allow other ready threads to run without resorting to involuntary time-slicing, which operates on a long-ish time scale. Generally, that makes your application more responsive. In addition, by blocking voluntarily we give the operating system far more latitude regarding power management. Finally, I should note that while we have OS-level facilities like sched_yield() at our disposal, yielding almost never does what you'd want or naively expect. Returning to WRPAUSE, it's natural to ask how well it works. To help answer that question I wrote a very simple C/pthreads benchmark that launches 8 concurrent threads and binds those threads to processors 0..7. The processors are numbered geographically on the T4, so those threads will all be running on just one core. Unlike the SPARC T2, where logical CPUs 0,1,2 and 3 were assigned to the first pipeline, and CPUs 4,5,6 and 7 were assigned to the 2nd, there's no fixed mapping between CPUs and pipelines in the T4. And in some circumstances when the other 7 logical processors are idling quietly, it's possible for the remaining logical processor to leverage both pipelines. Some number T of the threads will iterate in a tight loop advancing a simple Marsaglia xor-shift pseudo-random number generator. T is a command-line argument. The main thread loops, reporting the aggregate number of PRNG steps performed collectively by those T threads in the last 10 second measurement interval. The other threads (there are 8-T of these) run in a loop busy-waiting concurrently with the T threads. We vary T between 1 and 8 threads, and report on various busy-waiting idioms. The values in the table are the aggregate number of PRNG steps completed by the set of T threads. The unit is millions of iterations per 10 seconds. For the "PRNG step" busy-waiting mode, the busy-waiting threads execute exactly the same code as the T worker threads. We can easily compute the average rate of progress for individual worker threads by dividing the aggregate score by the number of worker threads T. I should note that the PRNG steps are extremely cycle-heavy and access almost no memory, so arguably this microbenchmark is not as representative of "normal" code as it could be. And for the purposes of comparison I included a row in the table that reflects a waiting policy where the waiting threads call poll(NULL,0,1000) and block in the kernel. Obviously this isn't busy-waiting, but the data is interesting for reference. _table { border:2px black dotted; margin: auto; width: auto; } _tr { border: 2px red dashed; } _td { border: 1px green solid; } _table { border:2px black dotted; margin: auto; width: auto; } _tr { border: 2px red dashed; } td { background-color : #E0E0E0 ; text-align : right ; } th { text-align : left ; } td { background-color : #E0E0E0 ; text-align : right ; } th { text-align : left ; } Aggregate progress T = #worker threads Wait Mechanism for 8-T threadsT=1T=2T=3T=4T=5T=6T=7T=8 Park thread in poll() 32653347334833483348334833483348 no-op 415 831 124316482060249729303349 RD %ccr,%g0 "pause" 14262429269228623013316232553349 PRNG step 412 829 124616702092251029303348 WRPause(8000) 32443361333133483349334833483348 WRPause(4000) 32153308331533223347334833473348 WRPause(1000) 30853199322432513310334833483348 WRPause(500) 29173070315032223270330933483348 WRPause(250) 26942864294930773205338833483348 WRPause(100) 21552469262227902911321433303348

Read the article

Think Parallel 2010, Five Years of Multicore

Der erste x86-Multicore-Prozessor wurde vor ca. 5 Jahren eingef¼hrt Parallel computing - Programming - Hardware - Processors - Components

Read the article

How to Optimize Oracle Solaris Applications for Intel Platforms

This technical white paper explains how developers can get the best performance out of applications running on Solaris on Intel Xeon processors.

Read the article

How to Optimize Oracle Solaris Applications for Intel Platforms

This technical white paper explains how developers can get the best performance out of applications running on Solaris on Intel Xeon processors.

Read the article

Bring 2 GB Large Pages to Solaris 10

- by Giri Mandalika

Few facts: 8 KB is the default page size on Oracle Solaris 10 and 11 as of this writing Both hardware and software must have support for 2 GB large pages SPARC T4 processors are capable of supporting 2 GB pages Oracle Solaris 11 kernel has in-built support for 2 GB pages Oracle Solaris 10 has no default support for 2 GB pages Memory intensive 64-bit applications may benefit the most from using 2 GB pages Prerequisites: OS: Oracle Solaris 10 8/11 (Update 10) or later Hardware: Oracle servers with SPARC T4 processors e.g., SPARC T4-1, T4-2 or T4-4, SPARC SuperCluster T4-4 Steps to enable 2 GB large pages on Oracle Solaris 10: Install the latest kernel patch or ensure that 147440-04 or later was installed Check the patch download instructions Add the following line to /etc/system and reboot set max_uheap_lpsize=0x80000000 Finally check the output of the following command when the system is back online pagesize -a eg., % pagesize -a 8192 <-- 8K 65536 <-- 64K 4194304 <-- 4M 268435456 <-- 256M 2147483648 <-- 2G % uname -a SunOS jar-jar 5.10 Generic_147440-21 sun4v sparc sun4v Also See: Solaris 9 or later: More performance with Large Pages (MPSS) Large page support for instructions (text) in Solaris 10 1/06 Solaris: How To Disable Out Of The Box (OOB) Large Page Support? Memory fragmentation / Large Pages on Solaris x86

Read the article

Avoiding and Identifying False Sharing Among Threads

When threads on different processors modify variables that reside on the same cache line CPU cache - Thread - Programming - Central processing unit - Shareware

Read the article

Transferring an SQL Processor License to a virtual hosted environment

- by Andrew Shepherd

My company is currently hosting a service in-house, and we want to move to an externally hosted environment. We would then be using a virtual server. I understand that this might be spread across multiple machines, but from my perspective as a customer, this layer is abstracted away - I shouldn't know or care about the hardware that the OS is hosted on. We have a licensed edition of SQL Server 2008. This is one Processor license. Will it be a violation of the licensing agreement to use this in a virtual environment. From the reference guide here it says When licensed Per Processor With Workgroup, Web, and Standard editions, for each server to which you have assigned the required number of per processor licenses, you may run, at any one time, any number of instances of the server software in physical and virtual operating system environments on the licensed server. However, the total number of physical and virtual processors used by those operating system environments cannot exceed the number of software licenses assigned to that server For enterprise edition there is an added option: if all physical processors in a machine have been licensed, then you may run unlimited instances of SQL server 2008 in one physical and an unlimited number of virtual operating environments on that same machine. I'm having trouble getting my head around this. Would I theoretically have to get a license for every processor in this virtual environment (which is effectively impossible because I have no way of knowing how many processors there actually are)? Or can I just say that it's hosted on one "virtual" server, so that's OK?

Read the article

how do addressing modes work on a physical level?

- by altvali

I'm trying to learn this basic thing about processors that should be taught in every CS department of every university. Yet i can't find it on the net (Google doesn't help) and i can't find it in my class materials either. Do you know any good resource on how addressing modes work on a physical level? I'm particularly interested in Intel processors.

Read the article

Improved Performance on PeopleSoft Combined Benchmark using SPARC T4-4

- by Brian

Oracle's SPARC T4-4 server running Oracle's PeopleSoft HCM 9.1 combined online and batch benchmark achieved a world record 18,000 concurrent users experiencing subsecond response time while executing a PeopleSoft Payroll batch job of 500,000 employees in 32.4 minutes. This result was obtained with a SPARC T4-4 server running Oracle Database 11g Release 2, a SPARC T4-4 server running PeopleSoft HCM 9.1 application server and a SPARC T4-2 server running Oracle WebLogic Server in the web tier. The SPARC T4-4 server running the application tier used Oracle Solaris Zones which provide a flexible, scalable and manageable virtualization environment. The average CPU utilization on the SPARC T4-2 server in the web tier was 17%, on the SPARC T4-4 server in the application tier it was 59%, and on the SPARC T4-4 server in the database tier was 47% (online and batch) leaving significant headroom for additional processing across the three tiers. The SPARC T4-4 server used for the database tier hosted Oracle Database 11g Release 2 using Oracle Automatic Storage Management (ASM) for database files management with I/O performance equivalent to raw devices. Performance Landscape Results are presented for the PeopleSoft HRMS Self-Service and Payroll combined benchmark. The new result with 128 streams shows significant improvement in the payroll batch processing time with little impact on the self-service component response time. PeopleSoft HRMS Self-Service and Payroll Benchmark Systems Users Ave Response Search (sec) Ave Response Save (sec) Batch Time (min) Streams SPARC T4-2 (web) SPARC T4-4 (app) SPARC T4-4 (db) 18,000 0.988 0.539 32.4 128 SPARC T4-2 (web) SPARC T4-4 (app) SPARC T4-4 (db) 18,000 0.944 0.503 43.3 64 The following results are for the PeopleSoft HRMS Self-Service benchmark that was previous run. The results are not directly comparable with the combined results because they do not include the payroll component. PeopleSoft HRMS Self-Service 9.1 Benchmark Systems Users Ave Response Search (sec) Ave Response Save (sec) Batch Time (min) Streams SPARC T4-2 (web) SPARC T4-4 (app) 2x SPARC T4-2 (db) 18,000 1.048 0.742 N/A N/A The following results are for the PeopleSoft Payroll benchmark that was previous run. The results are not directly comparable with the combined results because they do not include the self-service component. PeopleSoft Payroll (N.A.) 9.1 - 500K Employees (7 Million SQL PayCalc, Unicode) Systems Users Ave Response Search (sec) Ave Response Save (sec) Batch Time (min) Streams SPARC T4-4 (db) N/A N/A N/A 30.84 96 Configuration Summary Application Configuration: 1 x SPARC T4-4 server with 4 x SPARC T4 processors, 3.0 GHz 512 GB memory Oracle Solaris 11 11/11 PeopleTools 8.52 PeopleSoft HCM 9.1 Oracle Tuxedo, Version 10.3.0.0, 64-bit, Patch Level 031 Java Platform, Standard Edition Development Kit 6 Update 32 Database Configuration: 1 x SPARC T4-4 server with 4 x SPARC T4 processors, 3.0 GHz 256 GB memory Oracle Solaris 11 11/11 Oracle Database 11g Release 2 PeopleTools 8.52 Oracle Tuxedo, Version 10.3.0.0, 64-bit, Patch Level 031 Micro Focus Server Express (COBOL v 5.1.00) Web Tier Configuration: 1 x SPARC T4-2 server with 2 x SPARC T4 processors, 2.85 GHz 256 GB memory Oracle Solaris 11 11/11 PeopleTools 8.52 Oracle WebLogic Server 10.3.4 Java Platform, Standard Edition Development Kit 6 Update 32 Storage Configuration: 1 x Sun Server X2-4 as a COMSTAR head for data 4 x Intel Xeon X7550, 2.0 GHz 128 GB memory 1 x Sun Storage F5100 Flash Array (80 flash modules) 1 x Sun Storage F5100 Flash Array (40 flash modules) 1 x Sun Fire X4275 as a COMSTAR head for redo logs 12 x 2 TB SAS disks with Niwot Raid controller Benchmark Description This benchmark combines PeopleSoft HCM 9.1 HR Self Service online and PeopleSoft Payroll batch workloads to run on a unified database deployed on Oracle Database 11g Release 2. The PeopleSoft HRSS benchmark kit is a Oracle standard benchmark kit run by all platform vendors to measure the performance. It's an OLTP benchmark where DB SQLs are moderately complex. The results are certified by Oracle and a white paper is published. PeopleSoft HR SS defines a business transaction as a series of HTML pages that guide a user through a particular scenario. Users are defined as corporate Employees, Managers and HR administrators. The benchmark consist of 14 scenarios which emulate users performing typical HCM transactions such as viewing paycheck, promoting and hiring employees, updating employee profile and other typical HCM application transactions. All these transactions are well-defined in the PeopleSoft HR Self-Service 9.1 benchmark kit. This benchmark metric is the weighted average response search/save time for all the transactions. The PeopleSoft 9.1 Payroll (North America) benchmark demonstrates system performance for a range of processing volumes in a specific configuration. This workload represents large batch runs typical of a ERP environment during a mass update. The benchmark measures five application business process run times for a database representing large organization. They are Paysheet Creation, Payroll Calculation, Payroll Confirmation, Print Advice forms, and Create Direct Deposit File. The benchmark metric is the cumulative elapsed time taken to complete the Paysheet Creation, Payroll Calculation and Payroll Confirmation business application processes. The benchmark metrics are taken for each respective benchmark while running simultaneously on the same database back-end. Specifically, the payroll batch processes are started when the online workload reaches steady state (the maximum number of online users) and overlap with online transactions for the duration of the steady state. Key Points and Best Practices Two PeopleSoft Domain sets with 200 application servers each on a SPARC T4-4 server were hosted in 2 separate Oracle Solaris Zones to demonstrate consolidation of multiple application servers, ease of administration and performance tuning. Each Oracle Solaris Zone was bound to a separate processor set, each containing 15 cores (total 120 threads). The default set (1 core from first and third processor socket, total 16 threads) was used for network and disk interrupt handling. This was done to improve performance by reducing memory access latency by using the physical memory closest to the processors and offload I/O interrupt handling to default set threads, freeing up cpu resources for Application Servers threads and balancing application workload across 240 threads. A total of 128 PeopleSoft streams server processes where used on the database node to complete payroll batch job of 500,000 employees in 32.4 minutes. See Also Oracle PeopleSoft Benchmark White Papers oracle.com SPARC T4-2 Server oracle.com OTN SPARC T4-4 Server oracle.com OTN PeopleSoft Enterprise Human Capital Managementoracle.com OTN PeopleSoft Enterprise Human Capital Management (Payroll) oracle.com OTN Oracle Solaris oracle.com OTN Oracle Database 11g Release 2 oracle.com OTN Disclosure Statement Copyright 2012, Oracle and/or its affiliates. All rights reserved. Oracle and Java are registered trademarks of Oracle and/or its affiliates. Other names may be trademarks of their respective owners. Results as of 8 November 2012.

Read the article

Das T5-4 TPC-H Ergebnis naeher betrachtet

- by Stefan Hinker

Inzwischen haben vermutlich viele das neue TPC-H Ergebnis der SPARC T5-4 gesehen, das am 7. Juni bei der TPC eingereicht wurde. Die wesentlichen Punkte dieses Benchmarks wurden wie gewohnt bereits von unserer Benchmark-Truppe auf "BestPerf" zusammengefasst. Es gibt aber noch einiges mehr, das eine naehere Betrachtung lohnt. Skalierbarkeit Das TPC raet von einem Vergleich von TPC-H Ergebnissen in unterschiedlichen Groessenklassen ab. Aber auch innerhalb der 3000GB-Klasse ist es interessant: SPARC T4-4 mit 4 CPUs (32 Cores mit 3.0 GHz) liefert 205,792 QphH. SPARC T5-4 mit 4 CPUs (64 Cores mit 3.6 GHz) liefert 409,721 QphH. Das heisst, es fehlen lediglich 1863 QphH oder 0.45% zu 100% Skalierbarkeit, wenn man davon ausgeht, dass die doppelte Anzahl Kerne das doppelte Ergebnis liefern sollte. Etwas anspruchsvoller, koennte man natuerlich auch einen Faktor von 2.4 erwarten, wenn man die hoehere Taktrate mit beruecksichtigt. Das wuerde die Latte auf 493901 QphH legen. Dann waere die SPARC T5-4 bei 83%. Damit stellt sich die Frage: Was hat hier nicht skaliert? Vermutlich der Plattenspeicher! Auch hier lohnt sich eine naehere Betrachtung: Plattenspeicher Im Bericht auf BestPerf und auch im Full Disclosure Report der TPC stehen einige interessante Details zum Plattenspeicher und der Konfiguration. In der Konfiguration der SPARC T4-4 wurden 12 2540-M2 Arrays verwendet, die jeweils ca. 1.5 GB/s Durchsatz liefert, insgesamt also eta 18 GB/s. Dabei waren die Arrays offensichtlich mit jeweils 2 Kabeln pro Array direkt an die 24 8GBit FC-Ports des Servers angeschlossen. Mit den 2x 8GBit Ports pro Array koennte man so ein theoretisches Maximum von 2GB/s erreichen. Tatsaechlich wurden 1.5GB/s geliefert, was so ziemlich dem realistischen Maximum entsprechen duerfte. Fuer den Lauf mit der SPARC T5-4 wurden doppelt so viele Platten verwendet. Dafuer wurden die 2540-M2 Arrays mit je einem zusaetzlichen Plattentray erweitert. Mit dieser Konfiguration wurde dann (laut BestPerf) ein Maximaldurchsatz von 33 GB/s erreicht - nicht ganz das doppelte des SPARC T4-4 Laufs. Um tatsaechlich den doppelten Durchsatz (36 GB/s) zu liefern, haette jedes der 12 Arrays 3 GB/s ueber seine 4 8GBit Ports liefern muessen. Im FDR stehen nur 12 dual-port FC HBAs, was die Verwendung der Brocade FC Switches erklaert: Es wurden alle 4 8GBit ports jedes Arrays an die Switches angeschlossen, die die Datenstroeme dann in die 24 16GBit HBA ports des Servers buendelten. Das theoretische Maximum jedes Storage-Arrays waere nun 4 GB/s. Wenn man jedoch den Protokoll- und "Realitaets"-Overhead mit einrechnet, sind die tatsaechlich gelieferten 2.75 GB/s gar nicht schlecht. Mit diesen Zahlen im Hinterkopf ist die Verdopplung des SPARC T4-4 Ergebnisses eine gute Leistung - und gleichzeitig eine gute Erklaerung, warum nicht bis zum 2.4-fachen skaliert wurde. Nebenbei bemerkt: Weder die SPARC T4-4 noch die SPARC T5-4 hatten in der gemessenen Konfiguration irgendwelche Flash-Devices. Mitbewerb Seit die T4 Systeme auf dem Markt sind, bemuehen sich unsere Mitbewerber redlich darum, ueberall den Eindruck zu hinterlassen, die Leistung des SPARC CPU-Kerns waere weiterhin mangelhaft. Auch scheinen sie ueberzeugt zu sein, dass (ueber)grosse Caches und hohe Taktraten die einzigen Schluessel zu echter Server Performance seien. Wenn ich mir nun jedoch die oeffentlichen TPC-H Ergebnisse ansehe, sehe ich dies: TPC-H @3000GB, Non-Clustered Systems System QphH SPARC T5-4 3.6 GHz SPARC T5 4/64 – 2048 GB 409,721.8 SPARC T4-4 3.0 GHz SPARC T4 4/32 – 1024 GB 205,792.0 IBM Power 780 4.1 GHz POWER7 8/32 – 1024 GB 192,001.1 HP ProLiant DL980 G7 2.27 GHz Intel Xeon X7560 8/64 – 512 GB 162,601.7 Kurz zusammengefasst: Mit 32 Kernen (mit 3 GHz und 4MB L3 Cache), liefert die SPARC T4-4 mehr QphH@3000GB ab als IBM mit ihrer 32 Kern Power7 (bei 4.1 GHz und 32MB L3 Cache) und auch mehr als HP mit einem 64 Kern Intel Xeon System (2.27 GHz und 24MB L3 Cache). Ich frage mich, wo genau SPARC hier mangelhaft ist? Nun koennte man natuerlich argumentieren, dass beide Ergebnisse nicht gerade neu sind. Nun, in Ermangelung neuerer Ergebnisse kann man ja mal ein wenig spekulieren: IBMs aktueller Performance Report listet die o.g. IBM Power 780 mit einem rPerf Wert von 425.5. Ein passendes Nachfolgesystem mit Power7+ CPUs waere die Power 780+ mit 64 Kernen, verfuegbar mit 3.72 GHz. Sie wird mit einem rPerf Wert von 690.1 angegeben, also 1.62x mehr. Wenn man also annimmt, dass Plattenspeicher nicht der limitierende Faktor ist (IBM hat mit 177 SSDs getestet, sie duerfen das gerne auf 400 erhoehen) und IBMs eigene Leistungsabschaetzung zugrunde legt, darf man ein theoretisches Ergebnis von 311398 QphH@3000GB erwarten. Das waere dann allerdings immer noch weit von dem Ergebnis der SPARC T5-4 entfernt, und gerade in der von IBM so geschaetzen "per core" Metric noch weniger vorteilhaft. In der x86-Welt sieht es nicht besser aus. Leider gibt es von Intel keine so praktischen rPerf-Tabellen. Daher muss ich hier fuer eine Schaetzung auf SPECint_rate2006 zurueckgreifen. (Ich bin kein grosser Fan von solchen Kreuz- und Querschaetzungen. Insb. SPECcpu ist nicht besonders geeignet, um Datenbank-Leistung abzuschaetzen, da fast kein IO im Spiel ist.) Das o.g. HP System wird bei SPEC mit 1580 CINT2006_rate gelistet. Das bis einschl. 2013-06-14 beste Resultat fuer den neuen Intel Xeon E7-4870 mit 8 CPUs ist 2180 CINT2006_rate. Das ist immerhin 1.38x besser. (Wenn man nur die Taktrate beruecksichtigen wuerde, waere man bei 1.32x.) Hier weiter zu rechnen, ist muessig, aber fuer die ungeduldigen Leser hier eine kleine tabellarische Zusammenfassung: TPC-H @3000GB Performance Spekulationen System QphH* Verbesserung gegenueber der frueheren Generation SPARC T4-4 32 cores SPARC T4 205,792 2x SPARC T5-464 cores SPARC T5 409,721 IBM Power 780 32 cores Power7 192,001 1.62x IBM Power 780+ 64 cores Power7+ 311,398* HP ProLiant DL980 G764 cores Intel Xeon X7560 162,601 1.38x HP ProLiant DL980 G780 cores Intel Xeon E7-4870 224,348* * Keine echten Resultate - spekulative Werte auf der Grundlage von rPerf (Power7+) oder SPECint_rate2006 (HP) Natuerlich sind IBM oder HP herzlich eingeladen, diese Werte zu widerlegen. Aber stand heute warte ich noch auf aktuelle Benchmark Veroffentlichungen in diesem Datensegment. Was koennen wir also zusammenfassen? Es gibt einige Hinweise, dass der Plattenspeicher der begrenzende Faktor war, der die SPARC T5-4 daran hinderte, auf jenseits von 2x zu skalieren Der Mythos, dass SPARC Kerne keine Leistung bringen, ist genau das - ein Mythos. Wie sieht es umgekehrt eigentlich mit einem TPC-H Ergebnis fuer die Power7+ aus? Cache ist nicht der magische Performance-Schalter, fuer den ihn manche Leute offenbar halten. Ein System, eine CPU-Architektur und ein Betriebsystem jenseits einer gewissen Grenze zu skalieren ist schwer. In der x86-Welt scheint es noch ein wenig schwerer zu sein. Was fehlt? Nun, das Thema Preis/Leistung ueberlasse ich gerne den Verkaeufern ;-) Und zu guter Letzt: Nein, ich habe mich nicht ins Marketing versetzen lassen. Aber manchmal kann ich mich einfach nicht zurueckhalten... Disclosure Statements The views expressed on this blog are my own and do not necessarily reflect the views of Oracle. TPC-H, QphH, $/QphH are trademarks of Transaction Processing Performance Council (TPC). For more information, see www.tpc.org, results as of 6/7/13. Prices are in USD. SPARC T5-4 409,721.8 QphH@3000GB, $3.94/QphH@3000GB, available 9/24/13, 4 processors, 64 cores, 512 threads; SPARC T4-4 205,792.0 QphH@3000GB, $4.10/QphH@3000GB, available 5/31/12, 4 processors, 32 cores, 256 threads; IBM Power 780 QphH@3000GB, 192,001.1 QphH@3000GB, $6.37/QphH@3000GB, available 11/30/11, 8 processors, 32 cores, 128 threads; HP ProLiant DL980 G7 162,601.7 QphH@3000GB, $2.68/QphH@3000GB available 10/13/10, 8 processors, 64 cores, 128 threads. SPEC and the benchmark names SPECfp and SPECint are registered trademarks of the Standard Performance Evaluation Corporation. Results as of June 18, 2013 from www.spec.org. HP ProLiant DL980 G7 (2.27 GHz, Intel Xeon X7560): 1580 SPECint_rate2006; HP ProLiant DL980 G7 (2.4 GHz, Intel Xeon E7-4870): 2180 SPECint_rate2006,

Read the article

Oracle TimesTen In-Memory Database Performance on SPARC T4-2

- by Brian

The Oracle TimesTen In-Memory Database is optimized to run on Oracle's SPARC T4 processor platforms running Oracle Solaris 11 providing unsurpassed scalability, performance, upgradability, protection of investment and return on investment. The following demonstrate the value of combining Oracle TimesTen In-Memory Database with SPARC T4 servers and Oracle Solaris 11: On a Mobile Call Processing test, the 2-socket SPARC T4-2 server outperforms: Oracle's SPARC Enterprise M4000 server (4 x 2.66 GHz SPARC64 VII+) by 34%. Oracle's SPARC T3-4 (4 x 1.65 GHz SPARC T3) by 2.7x, or 5.4x per processor. Utilizing the TimesTen Performance Throughput Benchmark (TPTBM), the SPARC T4-2 server protects investments with: 2.1x the overall performance of a 4-socket SPARC Enterprise M4000 server in read-only mode and 1.5x the performance in update-only testing. This is 4.2x more performance per processor than the SPARC64 VII+ 2.66 GHz based system. 10x more performance per processor than the SPARC T2+ 1.4 GHz server. 1.6x better performance per processor than the SPARC T3 1.65 GHz based server. In replication testing, the two socket SPARC T4-2 server is over 3x faster than the performance of a four socket SPARC Enterprise T5440 server in both asynchronous replication environment and the highly available 2-Safe replication. This testing emphasizes parallel replication between systems. Performance Landscape Mobile Call Processing Test Performance System Processor Sockets/Cores/Threads Tps SPARC T4-2 SPARC T4, 2.85 GHz 2 16 128 218,400 M4000 SPARC64 VII+, 2.66 GHz 4 16 32 162,900 SPARC T3-4 SPARC T3, 1.65 GHz 4 64 512 80,400 TimesTen Performance Throughput Benchmark (TPTBM) Read-Only System Processor Sockets/Cores/Threads Tps SPARC T3-4 SPARC T3, 1.65 GHz 4 64 512 7.9M SPARC T4-2 SPARC T4, 2.85 GHz 2 16 128 6.5M M4000 SPARC64 VII+, 2.66 GHz 4 16 32 3.1M T5440 SPARC T2+, 1.4 GHz 4 32 256 3.1M TimesTen Performance Throughput Benchmark (TPTBM) Update-Only System Processor Sockets/Cores/Threads Tps SPARC T4-2 SPARC T4, 2.85 GHz 2 16 128 547,800 M4000 SPARC64 VII+, 2.66 GHz 4 16 32 363,800 SPARC T3-4 SPARC T3, 1.65 GHz 4 64 512 240,500 TimesTen Replication Tests System Processor Sockets/Cores/Threads Asynchronous 2-Safe SPARC T4-2 SPARC T4, 2.85 GHz 2 16 128 38,024 13,701 SPARC T5440 SPARC T2+, 1.4 GHz 4 32 256 11,621 4,615 Configuration Summary Hardware Configurations: SPARC T4-2 server 2 x SPARC T4 processors, 2.85 GHz 256 GB memory 1 x 8 Gbs FC Qlogic HBA 1 x 6 Gbs SAS HBA 4 x 300 GB internal disks Sun Storage F5100 Flash Array (40 x 24 GB flash modules) 1 x Sun Fire X4275 server configured as COMSTAR head SPARC T3-4 server 4 x SPARC T3 processors, 1.6 GHz 512 GB memory 1 x 8 Gbs FC Qlogic HBA 8 x 146 GB internal disks 1 x Sun Fire X4275 server configured as COMSTAR head SPARC Enterprise M4000 server 4 x SPARC64 VII+ processors, 2.66 GHz 128 GB memory 1 x 8 Gbs FC Qlogic HBA 1 x 6 Gbs SAS HBA 2 x 146 GB internal disks Sun Storage F5100 Flash Array (40 x 24 GB flash modules) 1 x Sun Fire X4275 server configured as COMSTAR head Software Configuration: Oracle Solaris 11 11/11 Oracle TimesTen 11.2.2.4 Benchmark Descriptions TimesTen Performance Throughput BenchMark (TPTBM) is shipped with TimesTen and measures the total throughput of the system. The workload can test read-only, update-only, delete and insert operations as required. Mobile Call Processing is a customer-based workload for processing calls made by mobile phone subscribers. The workload has a mixture of read-only, update, and insert-only transactions. The peak throughput performance is measured from multiple concurrent processes executing the transactions until a peak performance is reached via saturation of the available resources. Parallel Replication tests using both asynchronous and 2-Safe replication methods. For asynchronous replication, transactions are processed in batches to maximize the throughput capabilities of the replication server and network. In 2-Safe replication, also known as no data-loss or high availability, transactions are replicated between servers immediately emphasizing low latency. For both environments, performance is measured in the number of parallel replication servers and the maximum transactions-per-second for all concurrent processes. See Also SPARC T4-2 Server oracle.com OTN Oracle TimesTen In-Memory Database oracle.com OTN Oracle Solaris oracle.com OTN Oracle Database 11g Release 2 Enterprise Edition oracle.com OTN Disclosure Statement Copyright 2012, Oracle and/or its affiliates. All rights reserved. Oracle and Java are registered trademarks of Oracle and/or its affiliates. Other names may be trademarks of their respective owners. Results as of 1 October 2012.

Read the article

How the SPARC T4 Processor Optimizes Throughput Capacity: A Case Study

- by Ruud

This white paper demonstrates the architected latency hiding features of Oracle’s UltraSPARC T2+ and SPARC T4 processors That is the first sentence from this technical white paper, but what does it exactly mean? Let's consider a very simple example, the computation of a = b + c. This boils down to the following (pseudo-assembler) instructions that need to be executed: load @b, r1 load @c, r2 add r1,r2,r3 store r3, @a The first two instructions load variables b and c from an address in memory (here symbolized by @b and @c respectively). These values go into registers r1 and r2. The third instruction adds the values in r1 and r2. The result goes into register r3. The fourth instruction stores the contents of r3 into the memory address symbolized by @a. If we're lucky, both b and c are in a nearby cache and the load instructions only take a few processor cycles to execute. That is the good case, but what if b or c, or both, have to come from very far away? Perhaps both of them are in the main memory and then it easily takes hundreds of cycles for the values to arrive in the registers. Meanwhile the processor is doing nothing and simply waits for the data to arrive. Actually, it does something. It burns cycles while waiting. That is a waste of time and energy. Why not use these cycles to execute instructions from another application or thread in case of a parallel program? That is exactly what latency hiding on the SPARC T-Series processors does. It is a hardware feature totally transparent to the user and application. As soon as there is a delay in the execution, the hardware uses these otherwise idle cycles to execute instructions from another process. As a result, the throughput capacity of the system improves because idle cycles are no longer wasted and therefore more jobs can be run per unit of time. This feature has been in the SPARC T-series from the beginning, so why this paper? The difference with previous publications on this topic is in the amount of detail given. How this all works under the hood is fully explained using two example programs. Starting from the assembly language instructions, it is demonstrated in what way these programs execute. To really see what is happening we go down to the processor pipeline level, where the gaps in the execution are, and show in what way these idle cycles are filled by other copies of the same program running simultaneously. Both the SPARC T4 as well as the older UltraSPARC T2+ processor are covered. You may wonder why the UltraSPARC T2+ is included. The focus of this work is on the SPARC T4 processor, but to explain the basic concept of latency hiding at this very low level, we start with the UltraSPARC T2+ processor because it is architecturally a much simpler design. From the single issue, in-order pipelines of this processor we then shift gears and cover how this all works on the much more advanced dual issue, out-of-order architecture of the T4. The analysis and performance experiments have been conducted on both processors. The results depend on the processor, but in all cases the theoretical estimates are confirmed by the experiments. If you're interested to read a lot more about this and find out how things really work under the hood, you can download a copy of the paper here. A paper like this could not have been produced without the help of several other people. I want to thank the co-author of this paper, Jared Smolens, for his very valuable contributions and our highly inspiring discussions. I'm also indebted to Thomas Nau (Ulm University, Germany), Shane Sigler and Mark Woodyard (both at Oracle) for their feedback on earlier versions of this paper. Karen Perkins (Perkins Technical Writing and Editing) and Rick Ramsey at Oracle were very helpful in providing editorial and publishing assistance.

Read the article

Static factory pattern with EJB3/JBoss

- by purecharger

I'm fairly new to EJBs and full blown application servers like JBoss, having written and worked with special purpose standalone Java applications for most of my career, with limited use of JEE. I'm wondering about the best way to adapt a commonly used design pattern to EJB3 and JBoss: the static factory pattern. In fact this is Item #1 in Joshua Bloch's Effective Java book (2nd edition) I'm currently working with the following factory: public class CredentialsProcessorFactory { private static final Log log = LogFactory.getLog(CredentialsProcessorFactory.class); private static Map<CredentialsType, CredentialsProcessor> PROCESSORS = new HashMap<CredentialsType, CredentialsProcessor>(); static { PROCESSORS.put(CredentialsType.CSV, new CSVCredentialsProcessor()); } private CredentialsProcessorFactory() {} public static CredentialsProcessor getProcessor(CredentialsType type) { CredentialsProcessor p = PROCESSORS.get(type); if(p == null) throw new IllegalArgumentException("No CredentialsProcessor registered for type " + type.toString()); return p; } However, in the implementation classes of CredentialsProcessor, I require injected resources such as a PersistenceContext, so I have made the CredentialsProcessor interface a @Local interface, and each of the impl's marked with @Stateless. Now I can look them up in JNDI and use the injected resources. But now I have a disconnect because I am not using the factory anymore. My first thought was to change the getProcessor(CredentialsType) method to do a JNDI lookup and return the SLSB instance that is required, but then I need to configure and pass the proper qualified JNDI name. Before I go down that path, I wanted to do more research on accepted practices. How is this design pattern treated in EJB3 / JEE?

Search Results

Search found 740 results on 30 pages for 'processors'.

Page 6/30 | < Previous Page | 2 3 4 5 6 7 8 9 10 11 12 13 | Next Page >

- by JoshReuben

- by Brian

- by Brian

- by tonyrogerson

- by Brian

- by Luis Moreno Campos

- by rickramsey

- by Dave

- by Giri Mandalika

- by Andrew Shepherd

- by altvali

- by Brian

- by Stefan Hinker

- by Brian

- by Ruud

- by purecharger

< Previous Page | 2 3 4 5 6 7 8 9 10 11 12 13 | Next Page >