OpenCL: I just need to convert the code to use two work items in the for loop. Currently it uses one

Posted by user1660282 on Stack Overflow
Published on 2012-10-21T10:38:31Z Indexed on 2012/10/21 11:00 UTC

Filed under: opencl

    __kernel void spmv_csr_scalar_kernel(const int num_rows,
                                         __global const int *ptr,
                                         __global const int *indices,
                                         __global const float *data,
                                         __global const float *x,
                                         __global float *y)
    {
        // One work item per matrix row.
        int row = get_global_id(0);
        if (row < num_rows)
        {
            float dot = 0;
            int row_start = ptr[row];
            int row_end = ptr[row + 1];
            // Accumulate the dot product of this row with x.
            for (int jj = row_start; jj < row_end; jj++)
            {
                dot += data[jj] * x[indices[jj]];
            }
            y[row] += dot;
        }
    }

Above is the OpenCL code for multiplying a sparse matrix in CSR format by a column vector. It uses one global work item per for loop, that is, one work item per row. Can anybody help me use two work items for each for loop? I am new to OpenCL and run into a lot of issues if I modify even the smallest thing. This is part of my project. I have made it this parallel, but I want to make it more parallel. Please help me if you can.
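For checking a kernel's output, a plain C reference of the same row-wise product can help. The following is a minimal sketch, not part of the original post: the function name `spmv_csr_reference` is hypothetical, and it assumes the same CSR arrays (`ptr`, `indices`, `data`) and the same `y += A*x` update as the kernel above. Each iteration of the outer loop does the work that one work item does in the kernel.

```c
/* Sequential reference for y += A*x with A stored in CSR form.
   Hypothetical helper for checking kernel output on the host. */
void spmv_csr_reference(int num_rows, const int *ptr, const int *indices,
                        const float *data, const float *x, float *y)
{
    for (int row = 0; row < num_rows; row++) {
        float dot = 0.0f;
        /* ptr[row] .. ptr[row + 1] - 1 are the stored entries of this row. */
        for (int jj = ptr[row]; jj < ptr[row + 1]; jj++)
            dot += data[jj] * x[indices[jj]];
        y[row] += dot;
    }
}
```

Running this on the same input as the kernel and comparing the two `y` arrays is one way to see which rows come out wrong.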

A single work item executes the for loop from row_start to row_end. I want this row (that is, this for loop) to be divided further into two parts, each executed by a single work item. How do I go about accomplishing that?

This is what I could come up with, but it returns the wrong output. Please help.

    __kernel void mykernel(__global int* colvector, __global int* val, __global int* result,
                           __global int* index, __global int* rowptr, __global int* sync)
    {
        __global int vals[8] = {0, 0, 0, 0, 0, 0, 0, 0};
        for (int i = 0; i < 4; i++)
        {
            result[i] = 0;
        }
        barrier(CLK_GLOBAL_MEM_FENCE);

        int thread_id = get_global_id(0);
        int warp_id = thread_id / 2;
        int lane = (thread_id) & 1;
        int row = warp_id;

        if (row < 4)
        {
            int row_start = rowptr[row];
            int row_end = rowptr[row + 1];
            vals[thread_id] = 0;

            for (int i = row_start + lane; i < row_end; i += 2)
            {
                vals[thread_id] += val[i] * colvector[index[i]];
            }

            vals[thread_id] += vals[thread_id + 1];

            if (lane == 0) {
                result[row] += vals[thread_id];
            }
        }
    }
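Two things in the attempt above look like likely causes of the wrong output. First, `vals` is declared `__global` inside the kernel; scratch space shared by the work items of one work-group is normally declared `__local`. Second, `vals[thread_id + 1]` is read with no barrier after the partner work item writes it, so the partner's partial sum may not be there yet. The sketch below emulates the intended scheme sequentially in plain C so the logic can be checked on the host. It is an illustration under stated assumptions, not a drop-in kernel: the names `lane_partial_sum`, `spmv_two_lanes_emulated`, and `MAX_ITEMS` are hypothetical, and it assumes all work items belong to a single work-group.

```c
#include <assert.h>

#define LANES 2        /* work items cooperating on one row */
#define MAX_ITEMS 64   /* hypothetical cap on emulated work items */

/* Phase 1: what one work item computes. Lane 0 takes elements
   row_start, row_start + 2, ...; lane 1 takes row_start + 1, row_start + 3, ... */
static int lane_partial_sum(int thread_id, const int *rowptr, const int *index,
                            const int *val, const int *colvector)
{
    int row = thread_id / LANES;
    int lane = thread_id % LANES;
    int sum = 0;
    for (int i = rowptr[row] + lane; i < rowptr[row + 1]; i += LANES)
        sum += val[i] * colvector[index[i]];
    return sum;
}

/* Host-side emulation of the whole launch, one "work item" at a time. */
void spmv_two_lanes_emulated(int num_rows, const int *rowptr, const int *index,
                             const int *val, const int *colvector, int *result)
{
    int vals[MAX_ITEMS];   /* would be __local scratch in the kernel */
    int num_items = LANES * num_rows;
    assert(num_items <= MAX_ITEMS);

    /* Phase 1: every work item stores its own partial sum. */
    for (int t = 0; t < num_items; t++)
        vals[t] = lane_partial_sum(t, rowptr, index, val, colvector);

    /* In the kernel, barrier(CLK_LOCAL_MEM_FENCE) would go here, outside any
       if, so that all partial sums are written before any of them is read. */

    /* Phase 2: only lane 0 combines the pair and writes the row result.
       Plain assignment is used, so result needs no zeroing beforehand. */
    for (int t = 0; t < num_items; t += LANES)
        result[t / LANES] = vals[t] + vals[t + 1];
}
```

In an actual kernel, phase 1 and phase 2 would be the code before and after the barrier, with `get_global_id(0)` in place of `t`. Since a barrier only synchronizes work items within one work-group, both work items of a pair have to be in the same work-group.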

