Solve a classic map-reduce problem with OpenCL?
Posted by liuliu on Stack Overflow
Published on 2010-04-01T04:27:20Z
I am trying to parallelize a classic map-reduce problem (which parallelizes well with MPI) with OpenCL, namely, the AMD implementation. But the results bother me.
Let me briefly describe the problem first. Two types of data flow into the system: the feature set (30 parameters each) and the sample set (9000+ dimensions each). It is a classic map-reduce problem in the sense that I need to calculate the score of every feature on every sample (Map), and then sum up the overall score for every feature (Reduce). There are around 10k features and 30k samples.
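For concreteness, here is a minimal sketch of how I picture the Map and Reduce steps as a single kernel, decomposed by features (one work-item per feature, which reduces over all samples privately). The buffer names, the idea that each of the 30 parameters is an index into a sample's dimensions, and the add-only scoring are placeholders for my real scoring code:

    // Sketch only: feature_dims is assumed to hold, for each of the 10k
    // features, 30 indices into a sample's 9000+ dimensions; the real
    // scoring logic is more involved.
    __kernel void map_reduce_by_feature(__global const int   *feature_dims, // n_features x 30
                                        __global const float *samples,      // n_samples x sample_dim
                                        __global float       *scores,       // n_features
                                        const int n_samples,
                                        const int sample_dim)
    {
        int f = get_global_id(0);                 // one work-item per feature
        float acc = 0.0f;
        for (int s = 0; s < n_samples; ++s) {
            __global const float *sample = samples + (size_t)s * sample_dim;
            // Map: score feature f against sample s (random access into the sample).
            float sc = 0.0f;
            for (int p = 0; p < 30; ++p)
                sc += sample[feature_dims[f * 30 + p]];
            // Reduce: the running sum stays private to this work-item, so there is no race.
            acc += sc;
        }
        scores[f] = acc;
    }

This is essentially my first attempt below; the uncoalesced sample[...] reads are what kill it.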
I tried different ways to solve the problem. First, I tried to decompose it by features. The trouble is that the score calculation consists of random memory accesses (it picks some of the 9000+ dimensions and does additions/subtractions on them). Since I cannot coalesce the memory accesses, it is costly. Then I tried to decompose the problem by samples. The trouble there is that when summing up the overall score, all threads compete for a few score variables. They keep overwriting the scores, so the results come out incorrect. (I cannot compute the individual scores first and sum them up afterwards, because that would require 10k * 30k * 4 bytes, about 1.2 GB.)
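What I have been considering for the by-samples decomposition is the standard two-stage reduction: each work-group scores one feature against a chunk of samples, tree-reduces those scores in local memory, and only work-item 0 writes a per-group partial sum; a cheap second pass then adds up n_samples / local_size partials per feature instead of storing all 30k individual scores. A sketch, with the same placeholder scoring as above and the local size assumed to be a power of two:

    __kernel void score_one_feature(__global const int   *dims,          // 30 indices for this feature
                                    __global const float *samples,
                                    __global float       *partial_sums,  // one slot per work-group
                                    const int n_samples,
                                    const int sample_dim,
                                    __local float        *scratch)       // one float per work-item
    {
        int s   = get_global_id(0);
        int lid = get_local_id(0);

        // Map: each work-item scores one sample.
        float sc = 0.0f;
        if (s < n_samples) {
            __global const float *sample = samples + (size_t)s * sample_dim;
            for (int p = 0; p < 30; ++p)
                sc += sample[dims[p]];
        }
        scratch[lid] = sc;
        barrier(CLK_LOCAL_MEM_FENCE);

        // Reduce: tree reduction in local memory -- no two work-items
        // ever write the same slot, so nothing gets overwritten.
        for (int stride = (int)get_local_size(0) / 2; stride > 0; stride /= 2) {
            if (lid < stride)
                scratch[lid] += scratch[lid + stride];
            barrier(CLK_LOCAL_MEM_FENCE);
        }
        if (lid == 0)
            partial_sums[get_group_id(0)] = scratch[0];
    }

With a local size of 256, the intermediate buffer is 30k / 256 ≈ 118 floats per feature rather than 30k, which sidesteps the 1.2 GB problem.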
The first method gives me only the same performance as an i7 860 CPU running 8 threads. However, I don't think the problem is unsolvable: it is remarkably similar to ray tracing (where you carry out calculations for millions of rays against millions of triangles). Any ideas?
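If the ray-tracing analogy holds, a tiled variant might be the way to go: stage a tile of features in local memory (the way a ray tracer stages triangles), let each work-item stream its share of samples against the whole tile, and emit one partial sum per (feature, group) pair. Again only a sketch under the same assumed layout, with a fixed power-of-two local size:

    #define TILE_F     16     // features staged per work-group (assumed)
    #define LOCAL_SIZE 256    // assumed work-group size, power of two

    __kernel void score_tiled(__global const int   *feature_dims, // n_features x 30
                              __global const float *samples,      // n_samples x sample_dim
                              __global float       *partials,     // n_features x n_groups
                              const int n_samples,
                              const int sample_dim,
                              const int n_groups,
                              const int tile_base)                // first feature in this tile
    {
        __local int   dims[TILE_F * 30];
        __local float scratch[LOCAL_SIZE];
        int lid = get_local_id(0);
        int grp = get_group_id(0);

        // Stage the tile's feature parameters in local memory, ray-tracer style.
        for (int i = lid; i < TILE_F * 30; i += LOCAL_SIZE)
            dims[i] = feature_dims[tile_base * 30 + i];
        barrier(CLK_LOCAL_MEM_FENCE);

        // Map: each work-item strides over this group's chunk of samples and
        // accumulates a private score for every feature in the tile.
        float acc[TILE_F];
        for (int t = 0; t < TILE_F; ++t)
            acc[t] = 0.0f;

        int chunk = (n_samples + n_groups - 1) / n_groups;
        int s_end = min(grp * chunk + chunk, n_samples);
        for (int s = grp * chunk + lid; s < s_end; s += LOCAL_SIZE) {
            __global const float *sample = samples + (size_t)s * sample_dim;
            for (int t = 0; t < TILE_F; ++t) {
                float sc = 0.0f;
                for (int p = 0; p < 30; ++p)
                    sc += sample[dims[t * 30 + p]];
                acc[t] += sc;
            }
        }

        // Reduce: one tree reduction per staged feature; each (feature, group)
        // pair has its own output slot, so there is still no write conflict.
        for (int t = 0; t < TILE_F; ++t) {
            scratch[lid] = acc[t];
            barrier(CLK_LOCAL_MEM_FENCE);
            for (int stride = LOCAL_SIZE / 2; stride > 0; stride /= 2) {
                if (lid < stride)
                    scratch[lid] += scratch[lid + stride];
                barrier(CLK_LOCAL_MEM_FENCE);
            }
            if (lid == 0)
                partials[(size_t)(tile_base + t) * n_groups + grp] = scratch[0];
            barrier(CLK_LOCAL_MEM_FENCE);   // scratch is reused next iteration
        }
    }

One thing I suspect would help here: if the sample matrix were stored dimension-major (transposed), consecutive work-items would read consecutive entries of the same dimension, so even the "random" dimension picks could coalesce across the work-group.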