Search Results

Search found 57 results on 3 pages for 'parallelcomputing'.

Page 1/3 | 1 2 3 | Next Page >

C++ AMP Video Overview

- by Daniel Moth

I hope to be recording some C++ AMP screencasts for channel9 soon (you'll find them through my regular screencasts link on the left), and in all of them I will assume you have watched this short interview overview of C++ AMP. Note: I think there were some technical problems with streaming so best to download the "High Quality WMV" or switch to progressive format. Comments about this post by Daniel Moth welcome at the original blog.

Read the article
Screencasts introducing C++ AMP

- by Daniel Moth

It has been almost 2.5 years since I last recorded a screencast, and I had forgotten how time consuming they are to plan/record/edit/produce/publish, but at the same time so much fun to see the end result! So below are links to 4 screencasts to teach you C++ AMP basics from scratch (even if you class yourself as a .NET developer you'll be able to follow). Setup code - part 1 array_view, extent, index - part 2 parallel_for_each - part 3 accelerator - part 4 If you have comments/questions about what is shown in each video, please leave them at each video recoding. If you have generic questions about C++ AMP, please ask in the C++ AMP MSDN forum. Comments about this post by Daniel Moth welcome at the original blog.

Read the article
GPGPU

WhatGPU obviously stands for Graphics Processing Unit (the silicon powering the display you are using to read this blog post). The extra GP in front of that stands for General Purpose computing.So, altogether GPGPU refers to computing we can perform on GPU for purposes beyond just drawing on the screen. In effect, we can use a GPGPU a bit like we already use a CPU: to perform some calculation (that doesn’t have to have any visual element to it). The attraction is that a GPGPU can be orders of magnitude faster than a CPU.WhyWhen I was at the SuperComputing conference in Portland last November, GPGPUs were all the rage. A quick online search reveals many articles introducing the GPGPU topic. I'll just share 3 here: pcper (ignoring all pages except the first, it is a good consumer perspective), gizmodo (nice take using mostly layman terms) and vizworld (answering the question on "what's the big deal").The GPGPU programming paradigm (from a high level) is simple: in your CPU program you define functions (aka kernels) that take some input, can perform the costly operation and return the output. The kernels are the things that execute on the GPGPU leveraging its power (and hence execute faster than what they could on the CPU) while the host CPU program waits for the results or asynchronously performs other tasks.However, GPGPUs have different characteristics to CPUs which means they are suitable only for certain classes of problem (i.e. data parallel algorithms) and not for others (e.g. algorithms with branching or recursion or other complex flow control). You also pay a high cost for transferring the input data from the CPU to the GPU (and vice versa the results back to the CPU), so the computation itself has to be long enough to justify the overhead transfer costs. If your problem space fits the criteria then you probably want to check out this technology.HowSo where can you get a graphics card to start playing with all this? At the time of writing, the two main vendors ATI (owned by AMD) and NVIDIA are the obvious players in this industry. You can read about GPGPU on this AMD page and also on this NVIDIA page. NVIDIA's website also has a free chapter on the topic from the "GPU Gems" book: A Toolkit for Computation on GPUs.If you followed the links above, then you've already come across some of the choices of programming models that are available today. Essentially, AMD is offering their ATI Stream technology accessible via a language they call Brook+; NVIDIA offers their CUDA platform which is accessible from CUDA C. Choosing either of those locks you into the GPU vendor and hence your code cannot run on systems with cards from the other vendor (e.g. imagine if your CPU code would run on Intel chips but not AMD chips). Having said that, both vendors plan to support a new emerging standard called OpenCL, which theoretically means your kernels can execute on any GPU that supports it. To learn more about all of these there is a website: gpgpu.org. The caveat about that site is that (currently) it completely ignores the Microsoft approach, which I touch on next.On Windows, there is already a cross-GPU-vendor way of programming GPUs and that is the DirectX API. Specifically, on Windows Vista and Windows 7, the DirectX 11 API offers a dedicated subset of the API for GPGPU programming: DirectCompute. You use this API on the CPU side, to set up and execute the kernels that run on the GPU. The kernels are written in a language called HLSL (High Level Shader Language). You can use DirectCompute with HLSL to write a "compute shader", which is the term DirectX uses for what I've been referring to in this post as a "kernel". For a comprehensive collection of links about this (including tutorials, videos and samples) please see my blog post: DirectCompute.Note that there are many efforts to build even higher level languages on top of DirectX that aim to expose GPGPU programming to a wider audience by making it as easy as today's mainstream programming models. I'll mention here just two of those efforts: Accelerator from MSR and Brahma by Ananth. Comments about this post welcome at the original blog.

Read the article
GPU Debugging with VS 11

- by Daniel Moth

With VS 11 Developer Preview we have invested tremendously in parallel debugging for both CPU (managed and native) and GPU debugging. I'll be doing a whole bunch of blog posts on those topics, and in this post I just wanted to get people started with GPU debugging, i.e. with debugging C++ AMP code. First I invite you to watch 6 minutes of a glimpse of the C++ AMP debugging experience though this video (ffw to minute 51:54, up until minute 59:16). Don't read the rest of this post, just go watch that video, ideally download the High Quality WMV. Summary GPU debugging essentially means debugging the lambda that you pass to the parallel_for_each call (plus any functions you call from the lambda, of course). CPU debugging means debugging all the code above and below the parallel_for_each call, i.e. all the code except the restrict(direct3d) lambda and the functions that it calls. With VS 11 you have to choose what debugger you want to use for a particular debugging session, CPU or GPU. So you can place breakpoints all over your code, then choose what debugger you want (CPU or GPU), and you'll only be able to hit breakpoints for the code type that the debugger engine understands – the remaining breakpoints will appear as unbound. If you want to hit the unbound breakpoints, you'd have to stop debugging, and start again with the other debugger. Sorry. We suck. We know. But once you are past that limitation, I think you'll find the experience truly rewarding – seriously! Switching debugger engines With the Developer Preview bits, one way to switch the debugger engine is through the project properties – see the screenshots that follow. This one is showing the CPU option selected, which is basically the default that you are all familiar with: This screenshot is showing the GPU option selected, by changing the debugger launcher (notice that this applies for both the local and remote case): You actually do not have to open the project properties just for switching the debugger engine, you can switch the selection from the toolbar in VS 11 Developer Preview too – see following screenshot (the effect is the same as if you opened the project properties and switched there) Breakpoint behavior Here are two screenshots, one showing a debugging session for CPU and the other a debugging session for GPU (notice the unbound breakpoints in each case) …and here is the GPU case (where we cannot bind the CPU breakpoints but can the GPU breakpoint, which is actually hit) Give C++ AMP debugging a try So to debug your C++ AMP code, pull down the drop down under the 'play' button to select the 'GPU C++ Direct3D Compute Debugger' menu option, then hit F5 (or the 'play' button itself). Then you can explore debugging by exploring the menus under the Debug and under the Debug->Windows menus. One way to do that exploration is through the C++ AMP debugging walkthrough on MSDN. Another way to explore the C++ AMP debugging experience, you can use the moth.cpp code file, which is what I used in my BUILD session debugger demo. Note that for my demo I was using the latest internal VS11 bits, so your experience with the Developer Preview bits won't be identical to what you saw me demonstrate, but it shouldn't be far off. Stay tuned for a lot more content on the parallel debugger in VS 11, both CPU and GPU, both managed and native. Comments about this post by Daniel Moth welcome at the original blog.

Read the article
Microsoft Windows HPC Server R2 Beta2

- by Daniel Moth

Internally and unofficially we refer to this as "HPC Server v3" and its Beta2 became available last week. Read the full story on this blog post from Ryan and this one from Don. There has been a lot of excitement on the web for this release with coverage from last Wednesday here, here, here, here, here and here. Don't forget that Visual Studio 2010 makes it easy to develop for HPC Server including the MPI Cluster Debugger integration that I explained here and here. Comments about this post welcome at the original blog.

Read the article
C++ AMP open specification

- by Daniel Moth

Those of you interested in C++ AMP should know that I blog about that topic on our team blog. Just now I posted (and encourage you to go read) our much awaited announcement about the publication of the C++ AMP open specification. For those of you into compiling instead of reading, 3 days ago I posted a list of over a dozen C++ AMP samples. To follow what I and others on my team write about C++ AMP, stay tuned on our RSS feed. Comments about this post by Daniel Moth welcome at the original blog.

Read the article
My MSDN magazine articles are live

- by Daniel Moth

Five years ago I wrote my first MSDN magazine article, and 21 months later I wrote my second MSDN Magazine article (during the VS 2010 Beta). By my calculation, that makes it two and a half years without having written anything for my favorite developer magazine! So, I came back with a vengeance, and in this month's April issue of the MSDN magazine you can find two articles from yours truly - enjoy: A Code-Based Introduction to C++ AMP Introduction to Tiling in C++ AMP For more on C++ AMP, please remember that I blog about it on our team blog, and we take questions in our MSDN forum. Comments about this post by Daniel Moth welcome at the original blog.

Read the article
Get started with C++ AMP

- by Daniel Moth

With the imminent release of Visual Studio 2012, even if you do not classify yourself as a C++ developer, C++ AMP is something you should learn so you can understand how to speed up your loops by offloading to the GPU the computation performed in the loop (assuming you have large number of iterations/data). We have many C# customers who are using C++ AMP through pinvoke, and of course many more directly from C++. So regardless of your programming language, I hope you'll find helpful these short videos that help you get started with C++ AMP C++ AMP core API introduction... from scratch Tiling Introduction - C++ AMP Matrix Multiplication with C++ AMP GPU debugging in Visual Studio 2012 In particular the work we have done for parallel and GPU debugging in Visual Studio 2012 is market leading, so check it out! Comments about this post by Daniel Moth welcome at the original blog.

Read the article
Windows HPC Server links

I've already described how to setup a Windows HPC Server for development. Before you dive into developing for the cluster, if you are new to this it is probably a good idea to learn the basics by reading some overview material. Below is a list of links.Direct Links to Windows HPC content1. Windows HPC Server 2008 Overview Datasheet (4 page pdf).2. Windows HPC Server 2008 Technical Overview (32 page doc).3. Windows HPC Server 2008 Getting Started Guide (26 page doc) which actually is available online as part of the TechNet technical library section on Windows HPC Server 2008, which includes much more useful data.4. Windows HPC Server 2008 Job Scheduler (38 page doc).5. Windows HPC Server 2008 Job Templates (56 page doc).6. Developing for the Windows HPC Server 2008 Platform (16 page doc or pdf version).Windows HPC sites7. Windows HPC Forums.8. HPC Developer Resources.9. Windows HPC Server 2008 Resource Kit - Developer.10. Windows HPC Server 2008 - TechNet.11. The Windows HPC Team Blog.HPC Course12. High-Performance Computing Fundamentals Course (pluralisight)13. Classic HPC Development using Visual C++ (course slides and materials in a ZIP). Author's blog post.14. From sequential to parallel code (course slides and materials in a ZIP). Author's blog post. Next time I will post resources specific to the most popular programming models for the cluster today: MPI and Cluster SOA - until then, happy reading! Comments about this post welcome at the original blog.

Read the article
DirectCompute

In my previous blog post I introduced the concept of GPGPU ending with:On Windows, there is already a cross-GPU-vendor way of programming GPUs and that is the Direct X API. Specifically, on Windows Vista and Windows 7, the DirectX 11 API offers a dedicated subset of the API for GPGPU programming: DirectCompute. You use this API on the CPU side, to set up and execute the kernels on the GPU. The kernels are written in a language called HLSL (High Level Shader Language). You can use DirectCompute with HLSL to write a "compute shader", which is the term DirectX uses for what I've been referring to in this post as "kernel".In this post I want to share some links to get you started with DirectCompute and HLSL.1. Watch the recording of the PDC 09 session: DirectX11 DirectCompute.2. If session recordings is your thing there are two more on DirectCompute from nvidia's GTC09 conference 1015 (pdf, mp4) and 1411 (mp4 plus the presenter's written version of the session).3. Over at gamedev there is an old Compute Shader tutorial. At the same site, there is a 3-part blog post on Compute Shader: Introduction, Resources and Addressing.4. From PDC, you can also download the DirectCompute Hands On Lab.5. When you are ready to get your hands even dirtier, download the latest Windows DirectX SDK (at the time of writing the latest is dated Feb 2010).6. Within the SDK you'll find a Compute Shader Overview and samples such as: Basic, Sort, OIT, NBodyGravity, HDR Tone Mapping.7. Talking of DX11/DirectCompute samples, there are also a couple of good ones on this URL.8. The documentation of the various APIs is available online. Here are just some good (but far from complete) taster entry points into that: numthreads, SV_DispatchThreadID, SV_GroupThreadID, SV_GroupID, SV_GroupIndex, D3D11CreateDevice, D3DX11CompileFromFile, CreateComputeShader, Dispatch, D3D11_BIND_FLAG, GSSetShader. Comments about this post welcome at the original blog.

Read the article
Attend my Fusion sessions

- by Daniel Moth

The inaugural Fusion conference was 1 year ago in June 2011 and I was there doing a demo in the keynote, and also presenting a breakout session. If you look at the abstract and title for that session you won't see the term "C++ AMP" in there because the technology wasn't announced and we didn't want to spill the beans ahead of the keynote, where the technology was announced. It was only an announcement, we did not give any bits out, and in fact the first bits came three months later in September 2011 with the Beta following in February 2012. So it really feels great 1 year later, to be back at Fusion presenting two sessions on C++ AMP, demonstrating our progress from that announcement, to the Visual Studio 2012 Release Candidate that came out last week. If you are attending Fusion (in person or virtually later), be sure to watch my two-part session. Part 1 is PT-3601 on Tuesday 4pm and part 2 is PT-3602 on Wednesday 4pm. Here is the shared abstract for both parts: Harnessing GPU Compute with C++ AMP C++ AMP is an open specification for taking advantage of accelerators like the GPU. In this session we will explore the C++ AMP implementation in Microsoft Visual Studio 2012. After a quick overview of the technology understanding its goals and its differentiation compared with other approaches, we will dive into the programming model and its modern C++ API. This is a code heavy, interactive, two-part session, where every part of the library will be explained. Demos will include showing off the richest parallel and GPU debugging story on the market, in the upcoming Visual Studio release. See you there! Comments about this post by Daniel Moth welcome at the original blog.

Read the article
Visual Studio 2010 released!

- by Daniel Moth

Visual Studio 2010 releases to the word today. Get the full story from Soma's blog post (inc. links for buy, try etc). Our team is very proud of what we have contributed to this release and you can learn more about it through our content on the Parallel Computing MSDN home. Comments about this post welcome at the original blog.

Read the article
Parallel Debugging

Using Visual Studio 2010 parallel debugging is easy. Two new debugging windows provide a total view of the internals of your PPL and TPL applications with hints on where to start investigations. These are not mere extensions to VS, but tightly integrated with the rest of the debugger experience, so you don't need to learn many new techniques. Use them in your program to eclipse bugs from existence!One of the most FAQ I receive is links to VS2010 parallel debugging content and rather than keep sending many, I decided to gather them all under one permalink, hence this multi link blog post.- MSDN Magazine article on Parallel Debugging.- Screencast of sample code from the article.- MSDN Walkthrough: Debugging a Parallel Application (VB, C++, C#).- Screencast of walkthrough for Parallel Stacks.- Screencast of walkthrough for Parallel Tasks.- MSDN "How To" on Parallel Tasks.- MSDN "How To" on Parallel Stacks.- Detailed blog post on Parallel Tasks.- Detailed blog post on Parallel Stacks.- Detailed blog post on Parallel Stacks - Tasks View.- Detailed blog post on Parallel Stacks - Method View.- Download slides on Parallel Tasks and Parallel Stacks (pptx).If you have questions on these, please post to any of the parallel computing forums or the debugging forum (your question will be routed to me if nobody else can answer it). Comments about this post welcome at the original blog.

Read the article
BUILD apps that use C++ AMP

- by Daniel Moth

If you are a developer on the Microsoft platform, you are hopefully attending (live or virtually) the sessions of the BUILD conference, aka //build/ in Anaheim, CA. The conference sold out not long after it opened registration, and it achieved that without sharing *any* session details nor a meaningful agenda up until after the keynote today – impressive! I am speaking at BUILD and hope you'll catch my talk at 9am on Friday (the last day of the conference) at Marriott Elite 2 Ballroom. Session details follow. 802 - Taming GPU compute with C++ AMP Developers today inject parallelism into their compute-intensive applications in order to take advantage of multi-core CPU hardware. Beyond CPUs, however, compute accelerators such as general-purpose GPUs can provide orders of magnitude speed-ups for data parallel algorithms. How can you as a C++ developer fully utilize this heterogeneous hardware from your Visual Studio environment? How can you benefit from this tremendous performance boost in your Visual C++ solutions without sacrificing developer productivity? The answers will be presented in this session about C++ Accelerated Massive Parallelism. I'll be covering a lot of the material I've been recently blogging about on my blog that you are reading, which I have also indexed over on our team blog under the title: "C++ AMP in a nutshell". Comments about this post by Daniel Moth welcome at the original blog.

Read the article
Screencasts introducing C++ AMP

- by Daniel Moth

It has been almost 2.5 years since I last recorded a screencast, and I had forgotten how time consuming they are to plan/record/edit/produce/publish, but at the same time so much fun to see the end result! So below are links to 4 screencasts to teach you C++ AMP basics from scratch (even if you class yourself as a .NET developer you'll be able to follow). Setup code - part 1 array_view, extent, index - part 2 parallel_for_each - part 3 accelerator - part 4 If you have comments/questions about what is shown in each video, please leave them at each video recoding. If you have generic questions about C++ AMP, please ask in the C++ AMP MSDN forum. Comments about this post by Daniel Moth welcome at the original blog.

Read the article
Slides and code for MPI Cluster Debugger

I've blogged before about the MPI Cluster Debugger in VS2010 that facilitates launching the application on the cluster and attaching the debugger (btw, a shorter version of the screencast I link to there, is here).There have been requests for the code I use in the screencast, so please find a ZIP with that code.There have also been requests for a PowerPoint deck to use when showing this feature to others. Feel free to download some slides I threw together the other day. Comments about this post welcome at the original blog.

Read the article
Microsoft Technical Computing

- by Daniel Moth

In the past I have described the team I belong to here at Microsoft (Parallel Computing Platform) in terms of contributing to Visual Studio and related products, e.g. .NET Framework. To be more precise, our team is part of the Technical Computing group, which is still part of the Developer Division. This was officially announced externally earlier this month in an exec email (from Bob Muglia, the president of STB, to which DevDiv belongs). Here is an extract: "… As we build the Technical Computing initiative, we will invest in three core areas: 1. Technical computing to the cloud: Microsoft will play a leading role in bringing technical computing power to scientists, engineers and analysts through the cloud. Existing high- performance computing users will benefit from the ability to augment their on-premises systems with cloud resources that enable ‘just-in-time’ processing. This platform will help ensure processing resources are available whenever they are needed—reliably, consistently and quickly. 2. Simplify parallel development: Today, computers are shipping with more processing power than ever, including multiple cores, but most modern software only uses a small amount of the available processing power. Parallel programs are extremely difficult to write, test and trouble shoot. However, a consistent model for parallel programming can help more developers unlock the tremendous power in today’s modern computers and enable a new generation of technical computing. We are delivering new tools to automate and simplify writing software through parallel processing from the desktop… to the cluster… to the cloud. 3. Develop powerful new technical computing tools and applications: We know scientists, engineers and analysts are pushing common tools (i.e., spreadsheets and databases) to the limits with complex, data-intensive models. They need easy access to more computing power and simplified tools to increase the speed of their work. We are building a platform to do this. Our development efforts will yield new, easy-to-use tools and applications that automate data acquisition, modeling, simulation, visualization, workflow and collaboration. This will allow them to spend more time on their work and less time wrestling with complicated technology. …" Our Parallel Computing Platform team is directly responsible for item #2, and we work very closely with the teams delivering items #1 and #3. At the same time as the exec email, our marketing team unveiled a website with interviews that I invite you to check out: Modeling the World. Comments about this post welcome at the original blog.

Read the article
concurrency::accelerator_view

- by Daniel Moth

Overview We saw previously that accelerator represents a target for our C++ AMP computation or memory allocation and that there is a notion of a default accelerator. We ended that post by introducing how one can obtain accelerator_view objects from an accelerator object through the accelerator class's default_view property and the create_view method. The accelerator_view objects can be thought of as handles to an accelerator. You can also construct an accelerator_view given another accelerator_view (through the copy constructor or the assignment operator overload). Speaking of operator overloading, you can also compare (for equality and inequality) two accelerator_view objects between them to determine if they refer to the same underlying accelerator. We'll see later that when we use concurrency::array objects, the allocation of data takes place on an accelerator at array construction time, so there is a constructor overload that accepts an accelerator_view object. We'll also see later that a new concurrency::parallel_for_each function overload can take an accelerator_view object, so it knows on what target to execute the computation (represented by a lambda that the parallel_for_each also accepts). Beyond normal usage, accelerator_view is a quality of service concept that offers isolation to multiple "consumers" of an accelerator. If in your code you are accessing the accelerator from multiple threads (or, in general, from different parts of your app), then you'll want to create separate accelerator_view objects for each thread. flush, wait, and queuing_mode When you create an accelerator_view via the create_view method of the accelerator, you pass in an option of immediate or deferred, which are the two members of the queuing_mode enum. At any point you can access this value from the queuing_mode property of the accelerator_view. When the queuing_mode value is immediate (which is the default), any commands sent to the device such as kernel invocations and data transfers (e.g. parallel_for_each and copy, as we'll see in future posts), will get submitted as soon as the runtime sees fit (that is the definition of immediate). When the value of queuing_mode is deferred, the commands will be batched up. To send all buffered commands to the device for execution, there is a non-blocking flush method that you can call. If you wish to block until all the commands have been sent, there is a wait method you can call. Deferring is a more advanced scenario aimed at performance gains when you are submitting many device commands and you want to avoid the tiny overhead of flushing/submitting each command separately. Querying information Just like accelerator, accelerator_view exposes the is_debug and version properties. In fact, you can always access the accelerator object from the accelerator property on the accelerator_view class to access the accelerator interface we looked at previously. Interop with D3D (aka DX) In a later post I'll show an example of an app that uses C++ AMP to compute data that is used in pixel shaders. In those scenarios, you can benefit by integrating C++ AMP into your graphics pipeline and one of the building blocks for that is being able to use the same device context from both the compute kernel and the other shaders. You can do that by going from accelerator_view to device context (and vice versa), through part of our interop API in amp.h: *get_device, create_accelerator_view. More on those in a later post. Comments about this post by Daniel Moth welcome at the original blog.

Read the article
Dryad and DryadLINQ from MSR

- by Daniel Moth

Microsoft Research (MSR) researches technologies, incubates projects which many times result in technology that looks like a ready-to-use product (but it is important to understand that these are not the same as products built by the various… actual product teams here at Microsoft). A very popular MSR project has been DryadLINQ, which itself builds on Dryad. To learn more follow the project pages I just linked to and I also recommend this 1-hour channel 9 video. If you only have 3 minutes, watch this great elevator pitch instead. You can also stay tuned on the official blog, which includes a post that refers to internal adoption e.g by Bing, a quick DryadLINQ code example, and some history on how DryadLINQ generalizes the MapReduce pattern and makes it accessible to regular programmers (see this post and that post). Essentially, the DryadLINQ framework (building on the Dryad runtime) allows developers to re-use their LINQ skills for creating/generating programs that process large multi-gigabyte/terabyte datasets across 100s-1000s of machines. One way to think about it is that just as Parallel LINQ allows LINQ developers to seamlessly use multiple cores from a single process on a single machine, DryadLINQ allows LINQ developers to seamlessly use multiple machines for their data parallel algorithms. In the former scenario the motivation was speed of execution, in the latter it is speed of execution AND processing large datasets that simply don't fit on a single machine. Whenever I hear about execution of parallel code on multiple machines on the Microsoft platform, I immediately think of Windows HPC Server. Indeed Dryad and DryadLINQ were made available for Windows HPC Server and I encourage you to watch the PDC session on this topic: Data-Intensive Computing on Windows HPC Server with the DryadLINQ Framework. Watch this space… Comments about this post welcome at the original blog.

Read the article
MPI Project Template for VS2010

If you are developing MS MPI applications with Visual Studio 2010, you are probably tired of following some tedious steps for every new C++ project that you create, similar to the following:1. In Solution Explorer, right-click YourProjectName, then click Properties to open the Property Pages dialog box.2. Expand Configuration Properties and then under VC++ Directories place the cursor at the beginning of the list that appears in the Include Directories text box and then specify the location of the MS MPI C header files, followed by a semicolon, e.g.C:\Program Files\Microsoft HPC Pack 2008 SDK\Include;3. Still under Configuration Properties and under VC++ Directories place the cursor at the beginning of the list that appears in the Library Directories text box and then specify the location of the Microsoft HPC Pack 2008 SDK library file, followed by a semicolon, e.g.if you want to build/debug 32bit application:C:\Program Files\Microsoft HPC Pack 2008 SDK\Lib\i386;if you want to build/debug 64bit application:C:\Program Files\Microsoft HPC Pack 2008 SDK\Lib\amd64;4. Under Configuration Properties and then under Linker, select Input and place the cursor at the beginning of the list that appears in the Additional Dependencies text box and then type the name of the MS MPI library, i.e.msmpi.lib;5. In the code file#include "mpi.h"6. To debug the MPI project you have just setup, under Configuration Properties select Debugging and then switch the Debugger to launch combo value from Local Windows Debugger to MPI Cluster Debugger.Wouldn't it be great if at C++ project creation time you could choose an MPI Project Template that included the steps/configurations above? If you answered "yes", I have good news for you courtesy of a developer on our team (Qing). Feel free to download from Visual Studio gallery the MPI Project Template. Comments about this post welcome at the original blog.

Read the article
Attend my GTC sessions

- by Daniel Moth

The last GTC conference in the US was 2 years ago and I was there as an attendee. You may recall from that blog post that we were running UX studies at the time. It really feels great 2 years later, to be back at GTC presenting two sessions on C++ AMP, demonstrating our progress that includes input from those early studies. If you are attending GTC (in person or virtually later), be sure to watch my two-part session. Part 1 is S0242 on Wednesday 5pm and part 2 is S0244 on Thursday 10am. Here is the shared abstract for both parts: Harnessing GPU Compute with C++ AMP C++ AMP is an open specification for taking advantage of accelerators like the GPU. In this session we will explore the C++ AMP implementation in Microsoft Visual Studio 11 Beta. After a quick overview of the technology understanding its goals and its differentiation compared with other approaches, we will dive into the programming model and its modern C++ API. This is a code heavy, interactive, two-part session, where every part of the library will be explained. Demos will include showing off the richest parallel and GPU debugging story on the market, in the upcoming Visual Studio release. See you there! Comments about this post by Daniel Moth welcome at the original blog.

Read the article
Message Passing Interface (MPI)

So you have installed your cluster and you are done with introductory material on Windows HPC. Now you want to develop an application with the most common programming model: Message Passing Interface.The MPI programming model is a standard with implementations from many vendors. For newbies (like myself!), I have aggregated below links for getting started.Non-Microsoft MPI resources (useful even if you are not on the Windows platform)1. Message Passing Interface on wikipedia. 2. The MPI standard.3. MPICH2 - an MPI implementation.4. Tutorial on MPI by William Gropp.5. MPI patterns presented as a tutorial with sample code. 6. THE official MPI Forum (maintains the standard) including the wiki discussing the MPI future.7. Great MPI tutorial including at the end the MPI Exercise.8. C++ MPI Exercises by John Burkardt.9. Book online: MPI The Complete Reference.MS-MPI10. Windows HPC Server 2008 - Using MS-MPI whitepaper (15 page doc).11. Tracing MPI applications (27 page doc).12. Using Microsoft MPI (TechNet section).13. Windows HPC Server MPI forum (for posting questions). MPI.NET14. MPI.NET Home Page (not owned by Microsoft).15. MPI.NET Tutorial.16. HPC Development using F# using MPI.NET (38 page doc).Next time I'll post resources for the Microsoft Cluster SOA programming model - happy coding... Comments about this post welcome at the original blog.

Read the article
concurrency::accelerator

- by Daniel Moth

Overview An accelerator represents a "target" on which C++ AMP code can execute and where data can reside. Typically (but not necessarily) an accelerator is a GPU device. Accelerators are represented in C++ AMP as objects of the accelerator class. For many scenarios, you do not need to obtain an accelerator object, since the runtime has a notion of a default accelerator, which is what it thinks is the best one in the system. Examples where you need to deal with accelerator objects are if you need to pick your own accelerator (based on your specific criteria), or if you need to use more than one accelerators from your app. Construction and operator usage You can query and obtain a std::vector of all the accelerators on your system, which the runtime discovers on startup. Beyond enumerating accelerators, you can also create one directly by passing to the constructor a system-wide unique path to a device if you know it (i.e. the “Device Instance Path” property for the device in Device Manager), e.g. accelerator acc(L"PCI\\VEN_1002&DEV_6898&SUBSYS_0B001002etc"); There are some predefined strings (for predefined accelerators) that you can pass to the accelerator constructor (and there are corresponding constants for those on the accelerator class itself, so you don’t have to hardcode them every time). Examples are the following: accelerator::default_accelerator represents the default accelerator that the C++ AMP runtime picks for you if you don’t pick one (the heuristics of how it picks one will be covered in a future post). Example: accelerator acc; accelerator::direct3d_ref represents the reference rasterizer emulator that simulates a direct3d device on the CPU (in a very slow manner). This emulator is available on systems with Visual Studio installed and is useful for debugging. More on debugging in general in future posts. Example: accelerator acc(accelerator::direct3d_ref); accelerator::direct3d_warp represents a target that I will cover in future blog posts. Example: accelerator acc(accelerator::direct3d_warp); accelerator::cpu_accelerator represents the CPU. In this first release the only use of this accelerator is for using the staging arrays technique that I'll cover separately. Example: accelerator acc(accelerator::cpu_accelerator); You can also create an accelerator by shallow copying another accelerator instance (via the corresponding constructor) or simply assigning it to another accelerator instance (via the operator overloading of =). Speaking of operator overloading, you can also compare (for equality and inequality) two accelerator objects between them to determine if they refer to the same underlying device. Querying accelerator characteristics Given an accelerator object, you can access its description, version, device path, size of dedicated memory in KB, whether it is some kind of emulator, whether it has a display attached, whether it supports double precision, and whether it was created with the debugging layer enabled for extensive error reporting. Below is example code that accesses some of the properties; in your real code you'd probably be checking one or more of them in order to pick an accelerator (or check that the default one is good enough for your specific workload): void inspect_accelerator(concurrency::accelerator acc) { std::wcout << "New accelerator: " << acc.description << std::endl; std::wcout << "is_debug = " << acc.is_debug << std::endl; std::wcout << "is_emulated = " << acc.is_emulated << std::endl; std::wcout << "dedicated_memory = " << acc.dedicated_memory << std::endl; std::wcout << "device_path = " << acc.device_path << std::endl; std::wcout << "has_display = " << acc.has_display << std::endl; std::wcout << "version = " << (acc.version >> 16) << '.' << (acc.version & 0xFFFF) << std::endl; } accelerator_view In my next blog post I'll cover a related class: accelerator_view. Suffice to say here that each accelerator may have from 1..n related accelerator_view objects. You can get the accelerator_view from an accelerator via the default_view property, or create new ones by invoking the create_view method that creates an accelerator_view object for you (by also accepting a queuing_mode enum value of deferred or immediate that we'll also explore in the next blog post). Comments about this post by Daniel Moth welcome at the original blog.

Read the article
PPL and TPL sessions on channel9

- by Daniel Moth

Back in June there was an internal conference in Redmond ("Engineering Forum") aimed at Microsoft engineers, and delivered by Microsoft engineers. I was asked to put together a track on Multi-Core development, so I picked 6 parallelism experts and we created 6 awesome sessions (we won the top spot in the Top 10 :-)). Two of the speakers kept the content fairly external-friendly, so we received permission to publish their recordings publicly. Enjoy (best to download the High Quality WMV): Don McCrady - Parallelism in C++ Using the Concurrency Runtime Stephen Toub - Implementing Parallel Patterns using .NET 4 To get notified on future videos on parallelism (or to browse the archive) stay tuned on this channel9 parallel computing feed. Comments about this post welcome at the original blog.

Read the article
DirectCompute Lectures

- by Daniel Moth

Previously I shared resources to get you started with DirectCompute, for taking advantage of GPGPUs in an a way that doesn't tie you to a hardware vendor (e.g. nvidia, amd). I just stumbled upon and had to share a lecture series on channel9 on DirectCompute! Here are direct links to the episodes that are up there now: DirectCompute Expert Roundtable Discussion DirectCompute Lecture Series 101- Introduction to DirectCompute DirectCompute Lecture Series 110- Memory Patterns DirectCompute Lecture Series 120- Basics of DirectCompute Application Development DirectCompute Lecture Series 210- GPU Optimizations and Performance DirectCompute Lecture Series 230- GPU Accelerated Physics DirectCompute Lecture Series 250- Integration with the Graphics Pipeline Having watched these I recommend them all, but if you only want to watch a few, I suggest #2, #3, #4 and #5. Also, you should download the "WMV (High)" so you can see the code clearly and be able to Ctrl+Shift+G for fast playback… TIP: To subscribe to channel9 GPU content, use this RSS feed. Comments about this post welcome at the original blog.

Read the article

Search Results

Search found 57 results on 3 pages for 'parallelcomputing'.

Page 1/3 | 1 2 3 | Next Page >

- by Daniel Moth

- by Daniel Moth

- by Daniel Moth

- by Daniel Moth

- by Daniel Moth

- by Daniel Moth

- by Daniel Moth

- by Daniel Moth

- by Daniel Moth

- by Daniel Moth

- by Daniel Moth

- by Daniel Moth

- by Daniel Moth

- by Daniel Moth

- by Daniel Moth

- by Daniel Moth

- by Daniel Moth

- by Daniel Moth

1 2 3 | Next Page >