What's the most trivial function that would benfit from being computed on a GPU?

Posted by hanDerPeder on Stack Overflow See other posts from Stack Overflow or by hanDerPeder
Published on 2010-03-14T19:19:20Z Indexed on 2010/03/14 19:25 UTC
Read the original article Hit count: 226

Filed under:

opencl

|

cuda

|

GPGPU

Hi.

I'm just starting out learning OpenCL. I'm trying to get a feel for what performance gains to expect when moving functions/algorithms to the GPU.

The most basic kernel given in most tutorials is a kernel that takes two arrays of numbers and sums the value at the corresponding indexes and adds them to a third array, like so:

__kernel void 
add(__global float *a,
    __global float *b,
    __global float *answer)
{
    int gid = get_global_id(0);
    answer[gid] = a[gid] + b[gid];
}

__kernel void
sub(__global float* n,
    __global float* answer)
{
    int gid = get_global_id(0);
    answer[gid] = n[gid] - 2;
}

__kernel void
ranksort(__global const float *a,
         __global float *answer)
{
  int gid = get_global_id(0);
  int gSize = get_global_size(0);
  int x = 0;
  for(int i = 0; i < gSize; i++){
    if(a[gid] > a[i]) x++;
  }
  answer[x] = a[gid];
}

I am assuming that you could never justify computing this on the GPU, the memory transfer would out weight the time it would take computing this on the CPU by magnitudes (I might be wrong about this, hence this question).

What I am wondering is what would be the most trivial example where you would expect significant speedup when using a OpenCL kernel instead of the CPU?

© Stack Overflow or respective owner

Related posts about opencl

QtOpenCl make errors. Please help.

as seen on Stack Overflow - Search for 'Stack Overflow'
So I downloaded the ATI Stream SDK. I don't have a gpu now so I use the '-device cpu' and got the programs/examples in the OpenCl directory working by adding the directory to LD_LIBRARY_PATH etc. Now the problem is when installing QtOpenCl. configure script gives me: skkard@skkard-desktop:~/Applications/qt-labs-opencl$… >>> More
How do I use local memory in OpenCL?

as seen on Stack Overflow - Search for 'Stack Overflow'
I've been playing with OpenCL recently, and I'm able to write simple kernels that use only global memory. Now I'd like to start using local memory, but I can't seem to figure out how to use get_local_size() and get_local_id() to compute one "chunk" of output at a time. For example, let's say I wanted… >>> More
Linux QT OpenCL basic setup

as seen on Stack Overflow - Search for 'Stack Overflow'
Hi, what's the basic setup for Linux to compilie a C/C++ examples from OpenCL SDK? >>> More
solve a classic map-reduce problem with opencl?

as seen on Stack Overflow - Search for 'Stack Overflow'
I am trying to parallel a classic map-reduce problem (which can parallel well with MPI) with OpenCL, namely, the AMD implementation. But the result bothers me. Let me brief about the problem first. There are two type of data that flow into the system: the feature set (30 parameters for each) and… >>> More
Custom types in OpenCL kernel

as seen on Stack Overflow - Search for 'Stack Overflow'
Is it possible to use custom types in OpenCL kernel like gmp types (mpz_t, mpq_t, …) ? To have something like that (this kernel doesn't build just because of #include <gmp.h>) : #include <gmp.h> __kernel square( __global mpz_t* input, __global mpz_t number, __global int* output… >>> More

Related posts about cuda

CUDA Driver API vs. CUDA runtime

as seen on Stack Overflow - Search for 'Stack Overflow'
When writing CUDA applications, you can either work at the driver level or at the runtime level as illustrated on this image (The libraries are CUFFT and CUBLAS for advanced math): I assume the tradeoff between the two are increased performance for the low-evel API but at the cost of increased… >>> More
Updating a Cuda 4.0 project to Cuda 4.2

as seen on Stack Overflow - Search for 'Stack Overflow'
I have a VS2010 project that was tested with CUDA 4.0, today I installed CUDA 4.2 and I want to update this project, the problem is that when I try to run the project it asks me for cudart32_40_17.dll, but since this is CUDA 4.2 I only have on my folders (C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v4… >>> More
How to solve CUDA crash when run CUDA example fluidsGL?

as seen on Ask Ubuntu - Search for 'Ask Ubuntu'
I use ubuntu 12.04 64 bits with GTX560Ti. I install CUDA by following instruction: wget http: //developer.download.nvidia.com/compute/cuda/4_2/rel/toolkit/cudatoolkit_4.2.9_lin ux_64_ubuntu11.04.run wget http: //developer.download.nvidia.com/compute/cuda/4_2/rel/drivers/devdriver_4… >>> More
Context migration in CUDA.NET

as seen on Stack Overflow - Search for 'Stack Overflow'
I'm currently using CUDA.NET library by GASS. I need to initialize cuda arrays (actually cublas vectors, but it doesn't matters) in one CPU thread and use them in other CPU thread. But CUDA context which holding all initialized arrays and loaded functions, can be attached to only one CPU thread. There… >>> More
CUDA on GeForce 8600GT

as seen on Super User - Search for 'Super User'
I have got the cuda driver, toolkit and sdk installed in Ubuntu 10.04. I'm using nVidia Geforce 8600 GT card. Official website says my card is CUDA supported. But on running the deviceQuery that comes with the cuda sdk, I'm getting the following output. ./deviceQuery Starting... CUDA Device Query… >>> More