Getting started with massive data

Posted by Max on Stack Overflow
Published on 2010-05-30T04:04:28Z; indexed on 2010/05/30 4:12 UTC

Filed under: data | nosql | hadoop | mapreduce | large-data-volumes

I'm a math guy and occasionally do statistics/machine learning consulting projects on the side. The data I have access to are usually on the smaller side, at most a couple hundred megabytes (and almost always far less), but I want to learn more about handling and analyzing data at the gigabyte/terabyte scale. What do I need to know, and what are some good resources to learn from?

Hadoop/MapReduce is one obvious start.
Is there a particular programming language I should pick up? (I currently work mostly in Python, Ruby, and R, and occasionally Java, but it seems like C and Clojure are often used for large-scale data analysis.)
I'm not really familiar with the whole NoSQL movement, except that it's associated with big data. What's a good place to learn about it, and is there a particular implementation (Cassandra, CouchDB, etc.) I should get familiar with?
Where can I learn about applying machine learning algorithms to huge amounts of data? My math background is mostly on the theory side, definitely not on the numerical or approximation side, and I'm guessing most of the standard ML algorithms don't really scale.
Any other suggestions on things to learn would be great!
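
To make the Hadoop/MapReduce item above concrete, here is a minimal sketch of the MapReduce programming model, using a word count as the example. It runs in memory on a single machine and does not use Hadoop at all. The function names (map_phase, shuffle, reduce_phase, run_job) are illustrative assumptions, not part of any library's API. A real framework would spread the map and reduce calls across many machines and handle the grouping step itself.

```python
from collections import defaultdict
from itertools import chain


def map_phase(record):
    """Emit (key, value) pairs for one input record (here, one line of text)."""
    return [(word.lower(), 1) for word in record.split()]


def shuffle(pairs):
    """Group values by key, standing in for the framework's shuffle step."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(key, values):
    """Combine all values collected for one key into a single result."""
    return key, sum(values)


def run_job(records):
    """Run map, shuffle, and reduce in sequence over an iterable of records."""
    mapped = chain.from_iterable(map_phase(r) for r in records)
    grouped = shuffle(mapped)
    return dict(reduce_phase(k, v) for k, v in grouped.items())


if __name__ == "__main__":
    lines = ["big data is big", "data about data"]
    print(run_job(lines))
```

The point of the split is that map_phase looks at one record at a time and reduce_phase looks at one key at a time, so neither step needs the whole dataset in memory on one machine.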

© Stack Overflow or respective owner
