Search Results

Search found 36 results on 2 pages for 'nehalem'.

Page 1/2 | 1 2  | Next Page >

  • Intel Xeon 5600 (Westmere-EP) and 7500 (Nehalem-EX)

    - by jchang
    Intel Xeon 5600 (Westmere-EP) and 7500 (Nehalem-EX) Performance Intel launched the Xeon 5600 series (Westmere-EP, 32nm) six-core processors on 16 March 2010 without any TPC benchmark results. In the performance world, no results almost always mean bad or not good results. Yet there is every reason to believe that the Xeon 5600 series with six-cores (X models only) will performance exactly as expected for a 50% increase in the number of cores at the same frequency (as the 5500) with no system level...(read more)

    Read the article

  • Disable Memory Modules In BIOS for Testing Purposes (Optimize Nehalem/Gulftown Memory Performance)

    - by Bob
    I recently acquired an HP Z800 with two Intel Xeon X5650 (Gulftown) 6 core processors. The person that configured the system chose 16GB (8 x 2GB DDR3-1333). I'm assuming this person was unaware these processors have 3 memory channels and to optimize memory performance one should choose memory in multiples of three. Based on this information, I have a question: By entering the BIOS, can I disable the bank on each processor that has the single memory module? If so, will this have any adverse effects or behave differently than physically removing the modules? I ask due to the fact that I prefer to store the extra memory in the system if it truly behaves as if the memory is not even there. Also, I see this as an opportunity to test 12GB vs. 16GB to see if there is a noticeable difference. Note: According to http://www.delltechcenter.com/page/04-08-2009+-+Nehalem+and+Memory+Configurations?t=anon, the current configuration reduces the overall data transfer speed to 1066 and in addition, the memory bandwidth goes down by about 23%.

    Read the article

  • IBM System x3650 M2: Benchmark of Oracle's JDE 9.0 with Oracle VM

    - by didier.wojciechowski
    The IBM Oracle International Competency Center (ICC) in Denver, Colorado in a joint effort with the Oracle JD Edwards performance team was the first to execute a certified JD Edwards EnterpriseOne benchmark running on the new Intel® Xeon® processor 5500 series (Nehalem). This benchmark configuration included the IBM System x3650 M2, partitioned using Oracle Virtual Machine (VM), and Oracle's robust "Day in the Life" (DIL) test kit. In October, 2009 the benchmark scaled to 700 users with early code. In January 2010, with GA level code, the benchmark scaled successfully to 1000 users with sub-second response time.

    Read the article

  • Mapping of memory addresses to physical modules in Windows XP

    - by Josef Grahn
    I plan to run 32-bit Windows XP on a workstation with dual processors, based on Intel's Nehalem microarchitecture, and triple channel RAM. Even though XP is limited to 4 GB of RAM, my understanding is that it will function with more than 4 GB installed, but will only expose 4 GB (or slightly less). My question is: Assuming that 6 GB of RAM is installed in six 1 GB modules, which physical 4 GB will Windows actually map into its address space? In particular: Will it use all six 1 GB modules, taking advantage of all memory channels? (My guess is yes, and that the mapping to individual modules within a group happens in hardware.) Will it map 2 GB of address space to each of the two NUMA nodes (as each processor has it's own memory interface), or will one processor get fast access to 3 GB of RAM, while the other only has 1 GB? Thanks!

    Read the article

  • Mapping of memory addresses to physical modules in Windows XP

    - by Josef Grahn
    I plan to run 32-bit Windows XP on a workstation with dual processors, based on Intel's Nehalem microarchitecture, and triple channel RAM. Even though XP is limited to 4 GB of RAM, my understanding is that it will function with more than 4 GB installed, but will only expose 4 GB (or slightly less). My question is: Assuming that 6 GB of RAM is installed in six 1 GB modules, which physical 4 GB will Windows actually map into its address space? In particular: Will it use all six 1 GB modules, taking advantage of all memory channels? (My guess is yes, and that the mapping to individual modules within a group happens in hardware.) Will it map 2 GB of address space to each of the two NUMA nodes (as each processor has it's own memory interface), or will one processor get fast access to 3 GB of RAM, while the other only has 1 GB? Thanks!

    Read the article

  • Speedstep and Intel x5570?

    - by sajal
    Hi, My new server has 2 x X5570 CPUs. Now here is the output of grep -i hz /proc/cpuinfo model name : Intel(R) Xeon(R) CPU X5570 @ 2.93GHz cpu MHz : 1600.231 model name : Intel(R) Xeon(R) CPU X5570 @ 2.93GHz cpu MHz : 1600.231 model name : Intel(R) Xeon(R) CPU X5570 @ 2.93GHz cpu MHz : 1600.231 model name : Intel(R) Xeon(R) CPU X5570 @ 2.93GHz cpu MHz : 1600.231 model name : Intel(R) Xeon(R) CPU X5570 @ 2.93GHz cpu MHz : 1600.231 model name : Intel(R) Xeon(R) CPU X5570 @ 2.93GHz cpu MHz : 1600.231 model name : Intel(R) Xeon(R) CPU X5570 @ 2.93GHz cpu MHz : 1600.231 model name : Intel(R) Xeon(R) CPU X5570 @ 2.93GHz cpu MHz : 1600.231 model name : Intel(R) Xeon(R) CPU X5570 @ 2.93GHz cpu MHz : 1600.231 model name : Intel(R) Xeon(R) CPU X5570 @ 2.93GHz cpu MHz : 1600.231 model name : Intel(R) Xeon(R) CPU X5570 @ 2.93GHz cpu MHz : 1600.231 model name : Intel(R) Xeon(R) CPU X5570 @ 2.93GHz cpu MHz : 1600.231 model name : Intel(R) Xeon(R) CPU X5570 @ 2.93GHz cpu MHz : 1600.231 model name : Intel(R) Xeon(R) CPU X5570 @ 2.93GHz cpu MHz : 1600.231 model name : Intel(R) Xeon(R) CPU X5570 @ 2.93GHz cpu MHz : 1600.231 model name : Intel(R) Xeon(R) CPU X5570 @ 2.93GHz cpu MHz : 1600.231 It always remains the same.. no matter how much load is mysql or any other app hogging. Even when mysql eats 2 or 3 CPUs at 100% each, the output of cpuinfo is the same. If fact mysql performance for some heavy inserts is poorer than my old E5430 server. Any clues? I contacted the server provider, they tried turning off SpeedStep and still we see the same results. Any insights would be helpful cause I am paying heavily for this box and would love to milk all juice i can.

    Read the article

  • Is it better to have more small ram chips or fewer large ones?

    - by Alex Andronov
    I am currently building a new server. I have options between say 32GB Memory for 2 CPUs, DDR3, 1066MHz (8x4GB Dual Ranked RDIMMs) and 36GB Memory for 2 CPUs, DDR3, 1066MHz (18x2GB Dual Ranked RDIMMs) Both at the same price. Should I go for the higher ram amount or the fewer chips? This will be for a Dell PowerEdge R710 with two Intel® Xeon® E5530, 2.4Ghz, 8MB Cache, 5.86 GT/s QPI, Turbo, HT Thanks

    Read the article

  • Will disabling hyperthreading improve performance on our SQL Server install

    - by Sam Saffron
    Related to: Current wisdom on SQL Server and Hyperthreading Recently we upgraded our Windows 2008 R2 database server from an X5470 to a X5560. The theory is both CPUs have very similar performance, if anything the X5560 is slightly faster. However, SQL Server 2008 R2 performance has been pretty bad over the last day or so and CPU usage has been pretty high. Page life expectancy is massive, we are getting almost 100% cache hit for the pages, so memory is not a problem. When I ran: SELECT * FROM sys.dm_os_wait_stats order by signal_wait_time_ms desc I got: wait_type waiting_tasks_count wait_time_ms max_wait_time_ms signal_wait_time_ms ------------------------------------------------------------ -------------------- -------------------- -------------------- -------------------- XE_TIMER_EVENT 115166 2799125790 30165 2799125065 REQUEST_FOR_DEADLOCK_SEARCH 559393 2799053973 5180 2799053973 SOS_SCHEDULER_YIELD 152289883 189948844 960 189756877 CXPACKET 234638389 2383701040 141334 118796827 SLEEP_TASK 170743505 1525669557 1406 76485386 LATCH_EX 97301008 810738519 1107 55093884 LOGMGR_QUEUE 16525384 2798527632 20751319 4083713 WRITELOG 16850119 18328365 1193 2367880 PAGELATCH_EX 13254618 8524515 11263 1670113 ASYNC_NETWORK_IO 23954146 6981220 7110 1475699 (10 row(s) affected) I also ran -- Isolate top waits for server instance since last restart or statistics clear WITH Waits AS ( SELECT wait_type, wait_time_ms / 1000. AS [wait_time_s], 100. * wait_time_ms / SUM(wait_time_ms) OVER() AS [pct], ROW_NUMBER() OVER(ORDER BY wait_time_ms DESC) AS [rn] FROM sys.dm_os_wait_stats WHERE wait_type NOT IN ('CLR_SEMAPHORE','LAZYWRITER_SLEEP','RESOURCE_QUEUE', 'SLEEP_TASK','SLEEP_SYSTEMTASK','SQLTRACE_BUFFER_FLUSH','WAITFOR','LOGMGR_QUEUE', 'CHECKPOINT_QUEUE','REQUEST_FOR_DEADLOCK_SEARCH','XE_TIMER_EVENT','BROKER_TO_FLUSH', 'BROKER_TASK_STOP','CLR_MANUAL_EVENT','CLR_AUTO_EVENT','DISPATCHER_QUEUE_SEMAPHORE', 'FT_IFTS_SCHEDULER_IDLE_WAIT','XE_DISPATCHER_WAIT', 'XE_DISPATCHER_JOIN')) SELECT W1.wait_type, CAST(W1.wait_time_s AS DECIMAL(12, 2)) AS wait_time_s, CAST(W1.pct AS DECIMAL(12, 2)) AS pct, CAST(SUM(W2.pct) AS DECIMAL(12, 2)) AS running_pct FROM Waits AS W1 INNER JOIN Waits AS W2 ON W2.rn <= W1.rn GROUP BY W1.rn, W1.wait_type, W1.wait_time_s, W1.pct HAVING SUM(W2.pct) - W1.pct < 95; -- percentage threshold And got wait_type wait_time_s pct running_pct CXPACKET 554821.66 65.82 65.82 LATCH_EX 184123.16 21.84 87.66 SOS_SCHEDULER_YIELD 37541.17 4.45 92.11 PAGEIOLATCH_SH 19018.53 2.26 94.37 FT_IFTSHC_MUTEX 14306.05 1.70 96.07 That shows huge amounts of time synchronizing queries involving parallelism (high CXPACKET). Additionally, anecdotally many of these problem queries are being executed on multiple cores (we have no MAXDOP hints anywhere in our code) The server has not been under load for more than a day or so. We are experiencing a large amount of variance with query executions, typically many queries appear to be slower that they were on our previous DB server and CPU is really high. Will disabling Hyperthreading help at reducing our CPU usage and increase throughput?

    Read the article

  • Optimizing Solaris 11 SHA-1 on Intel Processors

    - by danx
    SHA-1 is a "hash" or "digest" operation that produces a 160 bit (20 byte) checksum value on arbitrary data, such as a file. It is intended to uniquely identify text and to verify it hasn't been modified. Max Locktyukhin and others at Intel have improved the performance of the SHA-1 digest algorithm using multiple techniques. This code has been incorporated into Solaris 11 and is available in the Solaris Crypto Framework via the libmd(3LIB), the industry-standard libpkcs11(3LIB) library, and Solaris kernel module sha1. The optimized code is used automatically on systems with a x86 CPU supporting SSSE3 (Intel Supplemental SSSE3). Intel microprocessor architectures that support SSSE3 include Nehalem, Westmere, Sandy Bridge microprocessor families. Further optimizations are available for microprocessors that support AVX (such as Sandy Bridge). Although SHA-1 is considered obsolete because of weaknesses found in the SHA-1 algorithm—NIST recommends using at least SHA-256, SHA-1 is still widely used and will be with us for awhile more. Collisions (the same SHA-1 result for two different inputs) can be found with moderate effort. SHA-1 is used heavily though in SSL/TLS, for example. And SHA-1 is stronger than the older MD5 digest algorithm, another digest option defined in SSL/TLS. Optimizations Review SHA-1 operates by reading an arbitrary amount of data. The data is read in 512 bit (64 byte) blocks (the last block is padded in a specific way to ensure it's a full 64 bytes). Each 64 byte block has 80 "rounds" of calculations (consisting of a mixture of "ROTATE-LEFT", "AND", and "XOR") applied to the block. Each round produces a 32-bit intermediate result, called W[i]. Here's what each round operates: The first 16 rounds, rounds 0 to 15, read the 512 bit block 32 bits at-a-time. These 32 bits is used as input to the round. The remaining rounds, rounds 16 to 79, use the results from the previous rounds as input. Specifically for round i it XORs the results of rounds i-3, i-8, i-14, and i-16 and rotates the result left 1 bit. The remaining calculations for the round is a series of AND, XOR, and ROTATE-LEFT operators on the 32-bit input and some constants. The 32-bit result is saved as W[i] for round i. The 32-bit result of the final round, W[79], is the SHA-1 checksum. Optimization: Vectorization The first 16 rounds can be vectorized (computed in parallel) because they don't depend on the output of a previous round. As for the remaining rounds, because of step 2 above, computing round i depends on the results of round i-3, W[i-3], one can vectorize 3 rounds at-a-time. Max Locktyukhin found through simple factoring, explained in detail in his article referenced below, that the dependencies of round i on the results of rounds i-3, i-8, i-14, and i-16 can be replaced instead with dependencies on the results of rounds i-6, i-16, i-28, and i-32. That is, instead of initializing intermediate result W[i] with: W[i] = (W[i-3] XOR W[i-8] XOR W[i-14] XOR W[i-16]) ROTATE-LEFT 1 Initialize W[i] as follows: W[i] = (W[i-6] XOR W[i-16] XOR W[i-28] XOR W[i-32]) ROTATE-LEFT 2 That means that 6 rounds could be vectorized at once, with no additional calculations, instead of just 3! This optimization is independent of Intel or any other microprocessor architecture, although the microprocessor has to support vectorization to use it, and exploits one of the weaknesses of SHA-1. Optimization: SSSE3 Intel SSSE3 makes use of 16 %xmm registers, each 128 bits wide. The 4 32-bit inputs to a round, W[i-6], W[i-16], W[i-28], W[i-32], all fit in one %xmm register. The following code snippet, from Max Locktyukhin's article, converted to ATT assembly syntax, computes 4 rounds in parallel with just a dozen or so SSSE3 instructions: movdqa W_minus_04, W_TMP pxor W_minus_28, W // W equals W[i-32:i-29] before XOR // W = W[i-32:i-29] ^ W[i-28:i-25] palignr $8, W_minus_08, W_TMP // W_TMP = W[i-6:i-3], combined from // W[i-4:i-1] and W[i-8:i-5] vectors pxor W_minus_16, W // W = (W[i-32:i-29] ^ W[i-28:i-25]) ^ W[i-16:i-13] pxor W_TMP, W // W = (W[i-32:i-29] ^ W[i-28:i-25] ^ W[i-16:i-13]) ^ W[i-6:i-3]) movdqa W, W_TMP // 4 dwords in W are rotated left by 2 psrld $30, W // rotate left by 2 W = (W >> 30) | (W << 2) pslld $2, W_TMP por W, W_TMP movdqa W_TMP, W // four new W values W[i:i+3] are now calculated paddd (K_XMM), W_TMP // adding 4 current round's values of K movdqa W_TMP, (WK(i)) // storing for downstream GPR instructions to read A window of the 32 previous results, W[i-1] to W[i-32] is saved in memory on the stack. This is best illustrated with a chart. Without vectorization, computing the rounds is like this (each "R" represents 1 round of SHA-1 computation): RRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRR With vectorization, 4 rounds can be computed in parallel: RRRRRRRRRRRRRRRRRRRR RRRRRRRRRRRRRRRRRRRR RRRRRRRRRRRRRRRRRRRR RRRRRRRRRRRRRRRRRRRR Optimization: AVX The new "Sandy Bridge" microprocessor architecture, which supports AVX, allows another interesting optimization. SSSE3 instructions have two operands, a input and an output. AVX allows three operands, two inputs and an output. In many cases two SSSE3 instructions can be combined into one AVX instruction. The difference is best illustrated with an example. Consider these two instructions from the snippet above: pxor W_minus_16, W // W = (W[i-32:i-29] ^ W[i-28:i-25]) ^ W[i-16:i-13] pxor W_TMP, W // W = (W[i-32:i-29] ^ W[i-28:i-25] ^ W[i-16:i-13]) ^ W[i-6:i-3]) With AVX they can be combined in one instruction: vpxor W_minus_16, W, W_TMP // W = (W[i-32:i-29] ^ W[i-28:i-25] ^ W[i-16:i-13]) ^ W[i-6:i-3]) This optimization is also in Solaris, although Sandy Bridge-based systems aren't widely available yet. As an exercise for the reader, AVX also has 256-bit media registers, %ymm0 - %ymm15 (a superset of 128-bit %xmm0 - %xmm15). Can %ymm registers be used to parallelize the code even more? Optimization: Solaris-specific In addition to using the Intel code described above, I performed other minor optimizations to the Solaris SHA-1 code: Increased the digest(1) and mac(1) command's buffer size from 4K to 64K, as previously done for decrypt(1) and encrypt(1). This size is well suited for ZFS file systems, but helps for other file systems as well. Optimized encode functions, which byte swap the input and output data, to copy/byte-swap 4 or 8 bytes at-a-time instead of 1 byte-at-a-time. Enhanced the Solaris mdb(1) and kmdb(1) debuggers to display all 16 %xmm and %ymm registers (mdb "$x" command). Previously they only displayed the first 8 that are available in 32-bit mode. Can't optimize if you can't debug :-). Changed the SHA-1 code to allow processing in "chunks" greater than 2 Gigabytes (64-bits) Performance I measured performance on a Sun Ultra 27 (which has a Nehalem-class Xeon 5500 Intel W3570 microprocessor @3.2GHz). Turbo mode is disabled for consistent performance measurement. Graphs are better than words and numbers, so here they are: The first graph shows the Solaris digest(1) command before and after the optimizations discussed here, contained in libmd(3LIB). I ran the digest command on a half GByte file in swapfs (/tmp) and execution time decreased from 1.35 seconds to 0.98 seconds. The second graph shows the the results of an internal microbenchmark that uses the Solaris libpkcs11(3LIB) library. The operations are on a 128 byte buffer with 10,000 iterations. The results show operations increased from 320,000 to 416,000 operations per second. Finally the third graph shows the results of an internal kernel microbenchmark that uses the Solaris /kernel/crypto/amd64/sha1 module. The operations are on a 64Kbyte buffer with 100 iterations. third graph shows the results of an internal kernel microbenchmark that uses the Solaris /kernel/crypto/amd64/sha1 module. The operations are on a 64Kbyte buffer with 100 iterations. The results show for 1 kernel thread, operations increased from 410 to 600 MBytes/second. For 8 kernel threads, operations increase from 1540 to 1940 MBytes/second. Availability This code is in Solaris 11 FCS. It is available in the 64-bit libmd(3LIB) library for 64-bit programs and is in the Solaris kernel. You must be running hardware that supports Intel's SSSE3 instructions (for example, Intel Nehalem, Westmere, or Sandy Bridge microprocessor architectures). The easiest way to determine if SSSE3 is available is with the isainfo(1) command. For example, nehalem $ isainfo -v $ isainfo -v 64-bit amd64 applications sse4.2 sse4.1 ssse3 popcnt tscp ahf cx16 sse3 sse2 sse fxsr mmx cmov amd_sysc cx8 tsc fpu 32-bit i386 applications sse4.2 sse4.1 ssse3 popcnt tscp ahf cx16 sse3 sse2 sse fxsr mmx cmov sep cx8 tsc fpu If the output also shows "avx", the Solaris executes the even-more optimized 3-operand AVX instructions for SHA-1 mentioned above: sandybridge $ isainfo -v 64-bit amd64 applications avx xsave pclmulqdq aes sse4.2 sse4.1 ssse3 popcnt tscp ahf cx16 sse3 sse2 sse fxsr mmx cmov amd_sysc cx8 tsc fpu 32-bit i386 applications avx xsave pclmulqdq aes sse4.2 sse4.1 ssse3 popcnt tscp ahf cx16 sse3 sse2 sse fxsr mmx cmov sep cx8 tsc fpu No special configuration or setup is needed to take advantage of this code. Solaris libraries and kernel automatically determine if it's running on SSSE3 or AVX-capable machines and execute the correctly-tuned code for that microprocessor. Summary The Solaris 11 Crypto Framework, via the sha1 kernel module and libmd(3LIB) and libpkcs11(3LIB) libraries, incorporated a useful SHA-1 optimization from Intel for SSSE3-capable microprocessors. As with other Solaris optimizations, they come automatically "under the hood" with the current Solaris release. References "Improving the Performance of the Secure Hash Algorithm (SHA-1)" by Max Locktyukhin (Intel, March 2010). The source for these SHA-1 optimizations used in Solaris "SHA-1", Wikipedia Good overview of SHA-1 FIPS 180-1 SHA-1 standard (FIPS, 1995) NIST Comments on Cryptanalytic Attacks on SHA-1 (2005, revised 2006)

    Read the article

  • MooseX::Types declaration issue, tight test case :)

    - by TJ Thompson
    So after an embarrassing amount of time debugging, I've finally stripped this issue ([http://stackoverflow.com/questions/4621589/perl-moose-typedecorator-error-how-do-i-debug][1]) down to a simple test case. I would humbly request some help understanding why it's failing :) Here is the error message I'm getting: plxc16479 $h2/tmp/tmp18.pl This method [new] requires a single argument. at /nfs/pdx/disks/nehalem.pde.077/perl/5.12.2/lib64/site_perl/MooseX/Types/TypeDecorator.pm line 91 MooseX::Types::TypeDecorator::new('MooseX::Types::TypeDecorator=HASH(0x655b90)') called at /nfs/pdx/disks/nehalem.pde.077/projects/lib/Program-Plist-Pl/lib/Program/Plist/Pl.pm line 10 Program::Plist::Pl::BUILD('Program::Plist::Pl=HASH(0x63d478)', 'HASH(0x63d220)') called at generated method (unknown origin) line 29 Program::Plist::Pl::new('Program::Plist::Pl') called at /nfs/pdx/disks/nehalem.pde.077/tmp/tmp18.pl line 10 Wrapper test script: use strict; use warnings; BEGIN {push(@INC, split(':', $ENV{PERL_TEST_LIBS}))}; use Program::Plist::Pl; my $obj = Program::Plist::Pl->new(); Program::Plist::Pl file: package Program::Plist::Pl; use Moose; use namespace::autoclean; use Program::Types qw(Pattern); # <-- Removing this fixes error use Program::Plist::Pl::Pattern; sub BUILD { my $pattern_obj = Program::Plist::Pl::Pattern->new(); } __PACKAGE__->meta->make_immutable; 1; Program::Types file: package Program::Types; use MooseX::Types -declare => [qw(Pattern)]; class_type Pattern, {class => 'Program::Plist::Pl::Pattern'}; 1; And the Program::Plist::Pl::Pattern file: package Program::Plist::Pl::Pattern; use Moose; use namespace::autoclean; __PACKAGE__->meta->make_immutable; 1; Notes: While I don't need the Pattern type from Program::Types in the above code, I do in other code that is stripped out. The PERL_TEST_LIBS env var I'm pulling INC paths from only contains paths to the project modules. There are no other modules loaded from these paths. It appears the MooseX::Types definition for Pattern is causing problems, but I'm not sure why. Documentation shows the syntax I am using, but it's possible I'm misusing class_type as there isn't much said about it. Intent is to be able to use Pattern for type checking via MooseX::Params::Validate to verify the argument is a 'Program::Plist::Pl::Program' object. I've found that removing the intervening class Program::Plist::Pl from the equation by directly calling Pattern-new from the tmp18.pl wrapper results in no error, even when the Program::Types Pattern type is imported.

    Read the article

  • Is it too early to start designing for Task Parallel Library?

    - by Joe Erickson
    I have been following the development of the .NET Task Parallel Library (TPL) with great interest since Microsoft first announced it. There is no doubt in my mind that we will eventually take advantage of TPL. What I am questioning is whether it makes sense to start taking advantage of TPL when Visual Studio 2010 and .NET 4.0 are released, or whether it makes sense to wait a while longer. Why Start Now? The .NET 4.0 Task Parallel Library appears to be well designed and some relatively simple tests demonstrate that it works well on today's multi-core CPUs. I have been very interested in the potential advantages of using multiple lightweight threads to speed up our software since buying my first quad processor Dell Poweredge 6400 about seven years ago. Experiments at that time indicated that it was not worth the effort, which I attributed largely to the overhead of moving data between each CPU's cache (there was no shared cache back then) and RAM. Competitive advantage - some of our customers can never get enough performance and there is no doubt that we can build a faster product using TPL today. It sounds fun. Yes, I realize that some developers would rather poke themselves in the eye with a sharp stick, but we really enjoy maximizing performance. Why Wait? Are today's Intel Nehalem CPUs representative of where we are going as multi-core support matures? You can purchase a Nehalem CPU with 4 cores which share a single level 3 cache today, and most likely a 6 core CPU sharing a single level 3 cache by the time Visual Studio 2010 / .NET 4.0 are released. Obviously, the number of cores will go up over time, but what about the architecture? As the number of cores goes up, will they still share a cache? One issue with Nehalem is the fact that, even though there is a very fast interconnect between the cores, they have non-uniform memory access (NUMA) which can lead to lower performance and less predictable results. Will future multi-core architectures be able to do away with NUMA? Similarly, will the .NET Task Parallel Library change as it matures, requiring modifications to code to fully take advantage of it? Limitations Our core engine is 100% C# and has to run without full trust, so we are limited to using .NET APIs.

    Read the article

  • Intel fait durer le suspens sur Sandy Bridge en délivrant les informations au compte-gouttes sur la

    Intel fait durer le suspens sur Sandy Bridge En délivrant les informations au compte-gouttes sur la future micro-architecture de ses processeurs Sandy Bridge est le successeur de Nehalem, la micro-architecture utilisée par Intel pour ses processeurs. Bien qu'on en sache encore assez peu sur cette nouvelle technologie, elle fait d'ores et déjà couler beaucoup d'encre. La stratégie de communication d'Intel, qui délivre les informations au compte-gouttes, n'y est certainement pas pour rien. Lors des derniers Intel Developer Forum (IDF) qui vient de s'achever, David Perlmutter - Executive Vice President Intel Architecture Group - a néanmoins laissé filtrer quelques indices :

    Read the article

  • IBM System x Server Buyer's Guide

    IBM has taken to the road with the message that Intel's Nehalem EX processors coupled with Big Blue's system engineering talents has resulted in a platform well-suited for virtualization, consolidation and mission-critical applications. Does the server hardware live up to the praise?

    Read the article

  • IBM System x Server Buyer's Guide

    IBM has taken to the road with the message that Intel's Nehalem EX processors coupled with Big Blue's system engineering talents has resulted in a platform well-suited for virtualization, consolidation and mission-critical applications. Does the server hardware live up to the praise?

    Read the article

  • Why does Ubuntu 9.10 hang during boot at "Booting processor 1 APIC 0x1 ip 0x6000"?

    - by BraeburnDev
    I recently installed a new copy of Ubuntu 9.10 (Kernel 2.6.31-14) on to my Hp Pavilion dv6t, so I can setup a Linux development environment. The install went flawlessly and I proceeded with Ubuntu's udate manager's long list of updates (292 in all). I also setup a swap file and activated a Nvidia 185 driver for the Nvidia 260m GPU on the machine. After all this was done I restarted the computer and booted into Ubuntu this time with a newer 2.6.31-19 Kernel which was installed from the update manager. During booth the computer hung at this point: Feb 24 14:23:12 braeburn-laptop kernel: [ 0.013136] Performance Counters: Nehalem/Corei7 events, Intel PMU driver. Feb 24 14:23:12 braeburn-laptop kernel: [ 0.013141] ... version: 3 Feb 24 14:23:12 braeburn-laptop kernel: [ 0.013142] ... bit width: 48 Feb 24 14:23:12 braeburn-laptop kernel: [ 0.013144] ... generic counters: 4 Feb 24 14:23:12 braeburn-laptop kernel: [ 0.013146] ... value mask: 0000ffffffffffff Feb 24 14:23:12 braeburn-laptop kernel: [ 0.013147] ... max period: 000000007fffffff Feb 24 14:23:12 braeburn-laptop kernel: [ 0.013149] ... fixed-purpose counters: 3 Feb 24 14:23:12 braeburn-laptop kernel: [ 0.013151] ... counter mask: 000000070000000f Feb 24 14:23:12 braeburn-laptop kernel: [ 0.015539] ACPI: Core revision 20090521 Feb 24 14:23:12 braeburn-laptop kernel: [ 0.052264] Setting APIC routing to flat Feb 24 14:23:12 braeburn-laptop kernel: [ 0.052639] ..TIMER: vector=0x30 apic1=0 pin1=2 apic2=-1 pin2=-1 Feb 24 14:23:12 braeburn-laptop kernel: [ 0.152580] CPU0: Intel(R) Core(TM) i7 CPU Q 720 @ 1.60GHz stepping 05 Feb 24 14:23:12 braeburn-laptop kernel: [ 0.270845] Booting processor 1 APIC 0x1 ip 0x6000 I can post a full kern.log of this boot process if requested. Hopefully this is enough information to go on. I should add that I'm still new to configuring and running a Linux OS although I know enough basic command line usage to do software development. This is my attempt to become more familiar with Linux and manage my own system. I'd like to get some insight on the nature of this system hang, what the problem is and how to resolve it. At this point I can scrap the install if I broke something, but my intuition says this is an issue with the kernel recognizing the correct hardware configuration for my system, or perhaps this is an issue with the APIC drivers managing Nehalem's new power management capabilities? Thanks for looking at this issue and providing feed back.

    Read the article

  • Intel Xeon 5600 (Westmere-EP) and AMD Magny-Cours Performance Update

    - by jchang
    HP has just released TPC-C and TPC-E results for the ProLiant DL380G7 with 2 Xeon 5680 3.33GHz 6-core processor, allowing a direct comparison with their DL385G& with 2 Opteron 6176 2.3GHz 12-core processors. Last month I complained about the lack of performance results for the Intel Xeon 5600 6-core 32nm processor line for 2-way systems. This might have been deliberate to not complicate the message for the Xeon 7500 8-core 45nm (for 4-way+ systems) launch two weeks later. http://sqlblog.com/blogs/joe_chang/archive/2010/04/07/intel-xeon-5600-westmere-ep-and-7500-nehalem-ex.aspx...(read more)

    Read the article

  • Oracle Solaris Studio Express 6/10 and its Customer Feedback Program are now available

    - by pieter.humphrey
    Oracle Solaris Studio Express 6/10 and the Customer Feedback Program for it are now available. Oracle Solaris Studio Express 6/10 is available on Solaris 10 (SPARC, x86), OEL 5 (x86), RHEL 5 (x86), SuSE 11 (x86) today and will be available for OpenSolaris in the near future. New feature highlights since the last release include: C/C++/Fortran compiler optimizations for the latest UltraSPARC and SPARC64-based architectures such as UltraSPARC T2 and SPARC64 VII C/C++/Fortran compiler optimizations for the latest x86 architectures including the Intel Xeon 7500 processor series (Nehalem-EX) and the Intel Xeon 5600 processor series (Westmere-EP) Enhanced debugging and code coverage tooling Improved application profiling with the Performance Analyzer Updated IDE based on NetBeans 6.8 To find more information and download go to http://developers.sun.com/sunstudio/downloads/express/ To participate in the customer feedback program for Oracle Solaris Studio Express 6/10 go to http://developers.sun.com/sunstudio/customerfeedback/index.jsp Please get the word out, try out this new release and send us your feedback! Technorati Tags: developer,development,solaris,sparc,Oracle Solaris Studio,Solaris Studio,Sun Studio,oracle,otn del.icio.us Tags: developer,development,solaris,sparc,Oracle Solaris Studio,Solaris Studio,Sun Studio,oracle,otn

    Read the article

  • Is it possible to use a dual processor computer as your desktop?

    - by Ivo
    I've seen some people suggesting to get a motherboard that supports two processors and stick two Xeon Nehalem processors in it. Could you use this system as a desktop PC or is this useless or even impossible? It's more hypothetical question if Windows 7 would support such a set-up. I know you could just take an i7, but wouldn't two of those processors be a whole lot more awesome? Like the previous generation Skulltrails? The idea would be to have a motherboard like this ASUS Z8NA-D6C Dual LGA 1366 Intel 5500 ATX and two Xeons (since I don't think i7's could be used) Intel Xeon E5405 Harpertown to run something like Windows 7 Ultimate.

    Read the article

1 2  | Next Page >