Some say that the equivalent of "hello world" code in the data parallel world is matrix multiplication :) Below is the code before and after C++ AMP. For more on what it all means, watch the recording of my C++ AMP introduction (the example below is part of the session).

#include <vector>
using std::vector;

// Serial version: vA is M x W, vB is W x N, vC is M x N, all stored row-major.
void MatrixMultiply(vector<float>& vC,
                    const vector<float>& vA,
                    const vector<float>& vB,
                    int M, int N, int W)
{
    for (int y = 0; y < M; y++)
    {
        for (int x = 0; x < N; x++)
        {
            // Dot product of row y of A with column x of B.
            float sum = 0;
            for (int i = 0; i < W; i++)
            {
                sum += vA[y * W + i] * vB[i * N + x];
            }
            vC[y * N + x] = sum;
        }
    }
}
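To make the row-major indexing concrete, here is a minimal sketch that runs the serial function on a 2 x 3 by 3 x 2 product. The helper name `MultiplySmallExample` and the input values are made up for illustration and are not part of the session code; the function body is repeated only so the sketch compiles on its own.

```cpp
#include <vector>

using std::vector;

// Serial version from the post, repeated so this sketch is self-contained.
void MatrixMultiply(vector<float>& vC,
                    const vector<float>& vA,
                    const vector<float>& vB,
                    int M, int N, int W)
{
    for (int y = 0; y < M; y++)
    {
        for (int x = 0; x < N; x++)
        {
            float sum = 0;
            for (int i = 0; i < W; i++)
            {
                sum += vA[y * W + i] * vB[i * N + x];
            }
            vC[y * N + x] = sum;
        }
    }
}

// Hypothetical helper (not in the original post): multiplies
//   A = | 1 2 3 |      B = |  7  8 |
//       | 4 5 6 |          |  9 10 |
//                          | 11 12 |
vector<float> MultiplySmallExample()
{
    const int M = 2, N = 2, W = 3;
    const vector<float> vA = {1, 2, 3, 4, 5, 6};       // M x W, row-major
    const vector<float> vB = {7, 8, 9, 10, 11, 12};    // W x N, row-major
    vector<float> vC(M * N);                           // M x N result
    MatrixMultiply(vC, vA, vB, M, N, W);
    return vC;                                         // {58, 64, 139, 154}
}
```

Note that the caller sizes `vC` to M * N up front; the function only writes into it.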
Change the function to use C++ AMP, and hence offload the computation to the GPU. The calling code (which I am not showing) needs no changes, and the overall operation gets a really nice speed-up for large datasets…

#include <amp.h>
#include <vector>
using namespace concurrency;
using std::vector;

void MatrixMultiply(vector<float>& vC,
                    const vector<float>& vA,
                    const vector<float>& vB,
                    int M, int N, int W)
{
    // 2D views over the flat vectors: a is M x W, b is W x N, c is M x N.
    array_view<const float,2> a(M, W, vA);
    array_view<const float,2> b(W, N, vB);
    array_view<writeonly<float>,2> c(M, N, vC);

    // The lambda runs once per element of c; idx identifies that element.
    parallel_for_each(
        c.grid,
        [=](index<2> idx) mutable restrict(direct3d)
        {
            // Same dot product as the inner loop of the serial version.
            float sum = 0;
            for (int i = 0; i < a.x; i++)
            {
                sum += a(idx.y, i) * b(i, idx.x);
            }
            c[idx] = sum;
        }
    );
}
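If the shape of the change is not obvious: the two outer loops of the serial version disappear, and the body of the inner-most loop nest becomes a function that is called once per output cell. The sketch below shows that same restructuring in plain C++ with no GPU involved. `Index2`, `ForEachCell` and `MatrixMultiplyPerCell` are hypothetical stand-ins invented for this illustration, not C++ AMP types or functions, and `ForEachCell` runs sequentially, whereas on an accelerator the invocations may run concurrently.

```cpp
#include <vector>

using std::vector;

// Hypothetical stand-in for a 2D index: row (y) and column (x) of one output cell.
struct Index2
{
    int y;
    int x;
};

// Hypothetical CPU stand-in for a parallel loop over a 2D grid.
// It calls the kernel once per cell, one cell after another.
template <typename Kernel>
void ForEachCell(int rows, int cols, Kernel kernel)
{
    for (int y = 0; y < rows; y++)
    {
        for (int x = 0; x < cols; x++)
        {
            kernel(Index2{y, x});
        }
    }
}

// Matrix multiply written as "one kernel call per output cell".
// Each call reads row idx.y of A and column idx.x of B and writes exactly
// one element of C, so no call depends on the result of another.
void MatrixMultiplyPerCell(vector<float>& vC,
                           const vector<float>& vA,
                           const vector<float>& vB,
                           int M, int N, int W)
{
    ForEachCell(M, N, [&](Index2 idx)
    {
        float sum = 0;
        for (int i = 0; i < W; i++)
        {
            sum += vA[idx.y * W + i] * vB[i * N + idx.x];
        }
        vC[idx.y * N + idx.x] = sum;
    });
}
```

Because every cell is written by exactly one kernel call and only reads the inputs, the calls can be executed in any order, which is what makes the computation a good fit for data parallel hardware.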
Again, you can understand the elements above by using my C++ AMP presentation slides and recording…
Stay tuned for more…
Comments about this post welcome at the original blog.