Search Results

Search found 6 results on 1 pages for 'top500'.

Page 1/1 | 1

How to get the best LINPACK result and conquer the Top500?

- by knweiss

Given a large Linux HPC cluster with hundreds/thousands of nodes. What are your best practices to get the best possible LINPACK benchmark (HPL) result to submit for the Top500 supercomputer list? To give you an idea what kind of answers I would appreciate here are some sub-questions (with links): How to you tune the parameters (N, NB, P, Q, memory-alignment, etc) for the HPL.dat file (without spending too much time trying each possible permutation - esp with large problem sizes N)? Are there any Top500 submission rules to be aware of? What is allowed, what isn't? Which MPI product, which version? Does it make a difference? Any special host order in your MPI machine file? Do you use CPU pinning? How to you configure your interconnect? Which interconnect? Which BLAS package do you use for which CPU model? (Intel MKL, AMD ACML, GotoBLAS2, etc.) How do you prepare for the big run (on all nodes)? Start with small runs on a subset of nodes and then scale up? Is it really necessary to run LINPACK with a big run on all of the nodes (or is extrapolation allowed)? How do you optimize for the latest Intel/AMD CPUs? Hyperthreading? NUMA? Is it worth it to recompile the software stack or do you use precompiled binaries? Which settings? Which compiler optimizations, which compiler? (What about profile-based compilation?) How to get the best result given only a limited amount of time to do the benchmark run? (You can block a huge cluster forever) How do you prepare the individual nodes (stopping system daemons, freeing memory, etc)? How do you deal with hardware faults (ruining a huge run)? Are there any must-read documents or websites about this topic? E.g. I would love to hear about some background stories of some of the current Top500 systems and how they did their LINPACK benchmark. I deliberately don't want to mention concrete hardware details or discuss hardware recommendations because I don't want to limit the answers. However, feel free to mention hints e.g. for specific CPU models.

Read the article
A new number one

- by nospam(at)example.com (Joerg Moellenkamp)

The Top500 supercomputer list has a new number one: The K Computer, built by Fujitsu, currently combines 68544 SPARC64 VIIIfx CPUs, each with eight cores, for a total of 548,352 cores?almost twice as many as any other system in the TOP500. The K Computer is also more powerful than the next five systems on the list combined.Interestingly this system runs under Linux. And it uses tofu as its interconnect

Read the article
About the K computer

- by nospam(at)example.com (Joerg Moellenkamp)

Okay ? after getting yet another mail because of the new #1 on the Top500 list, I want to add some comments from my side: Yes, the system is using SPARC processor. And that is great news for a SPARC fan like me. It is using the SPARC VIIIfx processor from Fujitsu clocked at 2 GHz. No, it isn't the only one. Most people are saying there are two in the Top500 list using SPARC (#77 JAXA and #1 K) but in fact there are three. The Tianhe-1 (#2 on the Top500 list) super computer contains 2048 Galaxy "FT-1000" 1 GHz 8-core processors. Don't know it? The FeiTeng-1000 ? this proc is a 8 core, 8 threads per core, 1 ghz processor made in China. And it's SPARC based. By the way ? this sounds really familiar to me ? perhaps the people just took the opensourced UltraSPARC-T2 design, because some of the parameters sound just to similar. However it looks like that Tianhe-1 is using the SPARCs as input nodes and not as compute notes. No, I don't see it as the next M-series processor. Simple reason: You can't create SMP systems out of them ? it simply hasn't the functionality to do so. Even when there are multiple CPUs on a single board, they are not connected like an SMP/NUMA machine to a shared memory machine ? they are connected with the cluster interconnect (in this case the Tofu interconnect) and work like a large cluster. Yes, it has a lot of oomph in Linpack ? however I assume a lot came from the extensions to the SPARCv9 standard. No, Linpack has no relevance for any commercial workload ? Linpack is such a special load, that even some HPC people are arguing that it isn't really a good benchmark for HPC. It's embarrassingly parallel, it can work with relatively small interconnects compared to the interconnects in SMP systems (however we get in spheres SMP interconnects where a few years ago). Amdahl isn't hitting that hard when running Linpack. Yes, it's a good move to use SPARC. At some time in the last 10 years, there was an interesting twist in perception: SPARC was considered as proprietary architecture and x86 was the open architecture. However it's vice versa ? try to create a x86 clone and you have a lot of intellectual property problems, create a SPARC clone and you have to spend 100 bucks or so to get the specification from the SPARC Foundation and develop your own SPARC processor. Fujitsu is doing this for a long time now. So they had their own processor, their own know-how. So why was SPARC a good choice? Well ? essentially Fujitsu can do what they want with their core as it is their core, for example adding the extensions to the SPARCv9 chipset ? getting Intel to create extensions to x86 to help you with your product is a little bit harder. So Fujitsu could do they needed to do with their processor in order to create such a supercomputer. No, the K is really using no FPGA or GPU as accelerators. The K is really using the CPU at doing this job. Yes, it has a significantly enhanced FPU capable to execute 8 instructions in parallel. No, it doesn't run Solaris. Yes, it uses Linux. No, it doesn't hurt me ... as my colleague Roland Rambau (he knows a lot about HPC) said once to me ... it doesn't matter which OS is staying out of the way of the workload in HPC.

Read the article
La chine possède le deuxième supercalculateur le plus rapide au monde, qui vient juste derrière le s

La Chine possède le deuxième supercalculateur le plus rapide au monde, qui arrive juste derrière le supercalculateur américain Jaguar XT Selon le classement semestriel Top500 mis à jour à l'occasion de l'International Supercomputing Conference (ISC), la chine occupe la deuxième position des supercalculateurs les plus rapides au monde, grâce à son supercalculateur Nebulae qui vient juste derrière l'américain Jaguar XT. [IMG]http://djug.developpez.com/rsc/Top-10-supercalc-june2010.PNG[/IMG] Ce monstre chinois possède une puissance de calcule de 1,271 petaflops sachant que le petaflops correspond à 10^15 d'opérations à la seconde, et qui peut arriver théoriquement jusqu'à 2,98 petaflops ce qui le place loin d...

Read the article
????????

- by Tatsuya Sugi

??????????????????? ?????????????????????????????????????????????????????IT?????????????????????????????????????????????????????????????????????? Java ? .NET ????????????????????????????????...????????????????????????????????????????????????????????????????????????????????COBOL??????????????? Java ???????????????????? ??????????Java????????????????????????????? ??/?????????????????????????? ??????????????????????????????????????? ?????????????·?????????????????HW????????/??????? ????????????????????????? ?????????????????????????????????? ??????????????????????? Java ?????????????????? ???????????????????????????????????????????????????????????????????????????????????? ??????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????? ??????????????????????????? ??????????????????????????????????????????????????????????(HPC:?????????·?????????)??????????????????????????????????????????????Java??????????????????? ???????????????? ??????????? ?Java + ??????·????????????????????? ??????·?????????????????????????? ?????????·?????????????????????????CPU????????????????????????????????????????????????????????TOP500???????????????8???x86?CPU????????????????????????? ??????·????????·???????????????????????????????/?????????????????????????????????????????????????? ????????????????????? ???: ??????????????/???????????? ?????????????????????????????????????????????????????/???????????????????????????????????????????? ???????: ?????????·????????????????? ????CPU????????????????????????????????·??????????/??????????????????????????????????????????????????????????????????????????????????????????? ??????????????????Java????????????????? ???JVM???????????????? ???????????????????????JVM?????????·????????????? ??????????????????????????????????????????? ??????????????? Oracle Coherence ???????????????????????????????????(??/????????) ??/???????? ???????????? ?????????????????????????????? ????????

Read the article
Infiniband: a highperformance network fabric - Part I

- by Karoly Vegh

Introduction:At the OpenWorld this year I managed to chat with interesting people again - one of them answering Infiniband deepdive questions with ease by coffee turned out to be one of Oracle's IB engineers, Ted Kim, who actually actively participates in the Infiniband Trade Association and integrates Oracle solutions with this highspeed network. This is why I love attending OOW. He granted me an hour of his time to talk about IB. This post is mostly based on that tech interview.Start of the actual post: Traditionally datatransfer between servers and storage elements happens in networks with up to 10 gigabit/seconds or in SANs with up to 8 gbps fiberchannel connections. Happens. Well, data rather trickles through.But nowadays data amounts grow well over the TeraByte order of magnitude, and multisocket/multicore/multithread Servers hunger data that these transfer technologies just can't deliver fast enough, causing all CPUs of this world do one thing at the same speed - waiting for data. And once again, I/O is the bottleneck in computing. FC and Ethernet can't keep up. We have half-TB SSDs, dozens of TB RAM to store data to be modified in, but can't transfer it. Can't backup fast enough, can't replicate fast enough, can't synchronize fast enough, can't load fast enough. The bad news is, everyone is used to this, like back in the '80s everyone was used to start compile jobs and go for a coffee. Or on vacation. The good news is, there's an alternative. Not so-called "bleeding-edge" 8gbps, but (as of now) 56. Not layers of overhead, but low latency. And it is available now. It has been for a while, actually. Welcome to the world of Infiniband. Short history:Infiniband was born as a result of joint efforts of HPAQ, IBM, Intel, Sun and Microsoft. They planned to implement a next-generation I/O fabric, in the 90s. In the 2000s Infiniband (from now on: IB) was quite popular in the high-performance computing field, powering most of the top500 supercomputers. Then in the middle of the decade, Oracle realized its potential and used it as an interconnect backbone for the first Database Machine, the first Exadata. Since then, IB has been booming, Oracle utilizes and supports it in a large set of its HW products, it is the backbone of the famous Engineered Systems: Exadata, SPARC SuperCluster, Exalogic, OVCA and even the new DB backup/recovery box. You can also use it to make servers talk highspeed IP to eachother, or to a ZFS Storage Appliance. Following Oracle's lead, even IBM has jumped the wagon, and leverages IB in its PureFlex systems, their first InfiniBand Machines.IB Structural Overview: If you want to use IB in your servers, the first thing you will need is PCI cards, in IB terms Host Channel Adapters, or HCAs. Just like NICs for Ethernet, or HBAs for FC. In these you plug an IB cable, going to an IB switch providing connection to other IB HCAs. Of course you're going to need drivers for those in your OS. Yes, these are long-available for Solaris and Linux. Now, what protocols can you talk over IB? There's a range of choices. See, IB isn't accepting package loss like Ethernet does, and hence doesn't need to rely on TCP/IP as a workaround for resends. That is, you still can run IP over IB (IPoIB), and that is used in various cases for control functionality, but the datatransfer can run over more efficient protocols - like native IB. About PCI connectivity: IB cards, as you see are fast. They bring low latency, which is just as important as their bandwidth. Current IB cards run at 56 gbit/s. That is slightly more than double of the capacity of a PCI Gen2 slot (of ~25 gbit/s). And IB cards are equipped usually with two ports - that is, altogether you'd need 112 gbit/s PCI slots, to be able to utilize FDR IB cards in an active-active fashion. PCI Gen3 slots provide you with around ~50gbps. This is why the most IB cards are configured in an active-standby way if both ports are used. Once again the PCI slot is the bottleneck. Anyway, the new Oracle servers are equipped with Gen3 PCI slots, an the new IB HCAs support those too. Oracle utilizes the QDR HCAs, running at 40gbp/s brutto, which translates to a 32gbp/s net traffic due to the 10:8 signal-to-data information ratio. Consolidation techniques: Technology never stops to evolve. Mellanox is working on the 100 gbps (EDR) version already, which will be optical, since signal technology doesn't allow EDR to be copper. Also, I hear you say "100gbps? I will never use/need that much". Are you sure? Have you considered consolidation scenarios, where (for example with Oracle Virtual Network) you could consolidate your platform to a high densitiy virtualized solution providing many virtual 10gbps interfaces through that 100gbps? Technology never stops to evolve. I still remember when a 10mbps network was impressively fast. Back in those days, 16MB of RAM was a lot. Now we usually run servers with around 100.000 times more RAM. If network infrastrucure speends could grow as fast as main memory capacities, we'd have a different landscape now :) You can utilize SRIOV as well for consolidation. That is, if you run LDoms (aka Oracle VM Server for SPARC) you do not have to add physical IB cards to all your guest LDoms, and you do not need to run VIO devices through the hypervisor either (avoiding overhead). You can enable SRIOV on those IB cards, which practically virtualizes the PCI bus, and you can dedicate Physical- and Virtual Functions of the virtualized HCAs as native, physical HW devices to your guests. See Raghuram's excellent post explaining SRIOV. SRIOV for IB is supported since LDoms 3.1. This post is getting lengthier, so I will rename it to Part I, and continue it in a second post.

Read the article

Developer IT

top500 - Developer IT

Search Results

Search found 6 results on 1 pages for 'top500'.

Page 1/1 | 1

How to get the best LINPACK result and conquer the Top500?

- by knweiss

Read the article

A new number one

- by nospam(at)example.com (Joerg Moellenkamp)

Read the article

About the K computer

- by nospam(at)example.com (Joerg Moellenkamp)

Read the article

La chine possède le deuxième supercalculateur le plus rapide au monde, qui vient juste derrière le s

Read the article

????????

- by Tatsuya Sugi

Read the article

Infiniband: a highperformance network fabric - Part I

- by Karoly Vegh

Read the article

1