How to use Cassandra's MapReduce with or without Pig?

Posted by UltimateBrent on Stack Overflow
Published on 2010-04-29T00:17:18Z Indexed on 2010/04/29 0:27 UTC

Filed under: cassandra | mapreduce | pig

Can someone explain how MapReduce works with Cassandra 0.6? I've read through the word count example, but I don't quite follow what happens on the Cassandra end vs. the "client" end.

https://svn.apache.org/repos/asf/cassandra/trunk/contrib/word_count/

For instance, let's say I'm using Python and Pycassa. How would I load a new MapReduce function and then call it? Does my MapReduce function have to be Java that's installed on the Cassandra server? If so, how do I call it from Pycassa?

There's also mention of Pig making this all easier, but I'm a complete Hadoop noob, so that didn't really help.

Your answer can use Thrift or whatever; I only mentioned Pycassa to denote the client side. I'm just trying to understand the difference between what runs in the Cassandra cluster and what runs on the server making the requests.

© Stack Overflow or respective owner
