Very basic question about Hadoop and compressed input files

Posted by Luis Sisamon on Stack Overflow, 2010-01-16.
I have started to look into Hadoop. If my understanding is right, I could process a very big file and it would get split over different nodes. However, if the file is compressed, it cannot be split and would need to be processed by a single node (effectively destroying the advantage of running MapReduce over a cluster of parallel machines).
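
For illustration only, here is a minimal sketch (not from the original post) of how one might check whether Hadoop considers an input file's codec splittable. It assumes the SplittableCompressionCodec interface is available, which is only true in newer Hadoop releases; older versions simply treat gzip and similar codecs as unsplittable.

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.Path;
    import org.apache.hadoop.io.compress.CompressionCodec;
    import org.apache.hadoop.io.compress.CompressionCodecFactory;
    import org.apache.hadoop.io.compress.SplittableCompressionCodec;

    public class SplitCheck {
        public static void main(String[] args) {
            Configuration conf = new Configuration();
            CompressionCodecFactory factory = new CompressionCodecFactory(conf);
            Path input = new Path(args[0]);               // e.g. hdfs:///data/log.gz
            CompressionCodec codec = factory.getCodec(input);
            if (codec == null) {
                System.out.println("Uncompressed: Hadoop splits it into blocks as usual.");
            } else if (codec instanceof SplittableCompressionCodec) {
                System.out.println(codec.getClass().getSimpleName() + " is splittable.");
            } else {
                // e.g. GzipCodec: the whole file becomes a single split / single map task
                System.out.println(codec.getClass().getSimpleName() + " is NOT splittable.");
            }
        }
    }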

My question is, assuming the above is correct: is it possible to split a large file manually into fixed-size chunks (or daily chunks), compress each of them, and then pass the list of compressed input files to a MapReduce job?
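
As a rough sketch of that approach (again, not from the original post, and using hypothetical paths and a made-up line-counting job for illustration): each compressed chunk is still read whole by a single mapper, but because every file becomes its own input split, many chunks can be processed in parallel. This assumes the newer org.apache.hadoop.mapreduce API and the default TextInputFormat, which decompresses gzip input transparently.

    import java.io.IOException;
    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.Path;
    import org.apache.hadoop.io.LongWritable;
    import org.apache.hadoop.io.Text;
    import org.apache.hadoop.mapreduce.Job;
    import org.apache.hadoop.mapreduce.Mapper;
    import org.apache.hadoop.mapreduce.Reducer;
    import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
    import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

    public class ChunkedJob {

        // Emits a count of 1 per input line under a constant key.
        public static class LineCountMapper
                extends Mapper<LongWritable, Text, Text, LongWritable> {
            private static final Text KEY = new Text("lines");
            private static final LongWritable ONE = new LongWritable(1);
            @Override
            protected void map(LongWritable offset, Text line, Context ctx)
                    throws IOException, InterruptedException {
                ctx.write(KEY, ONE);
            }
        }

        // Sums the per-line counts into one total.
        public static class SumReducer
                extends Reducer<Text, LongWritable, Text, LongWritable> {
            @Override
            protected void reduce(Text key, Iterable<LongWritable> counts, Context ctx)
                    throws IOException, InterruptedException {
                long total = 0;
                for (LongWritable c : counts) total += c.get();
                ctx.write(key, new LongWritable(total));
            }
        }

        public static void main(String[] args) throws Exception {
            Configuration conf = new Configuration();
            Job job = new Job(conf, "count-lines-in-compressed-chunks");
            job.setJarByClass(ChunkedJob.class);
            job.setMapperClass(LineCountMapper.class);
            job.setReducerClass(SumReducer.class);
            job.setOutputKeyClass(Text.class);
            job.setOutputValueClass(LongWritable.class);

            // Each gzip chunk is unsplittable, but listing many of them still
            // yields one map task per file, so the chunks run in parallel.
            FileInputFormat.addInputPaths(job, "/data/2010-01-14.gz,/data/2010-01-15.gz");
            // A glob covers a whole directory of daily chunks:
            // FileInputFormat.addInputPath(job, new Path("/data/daily/*.gz"));
            FileOutputFormat.setOutputPath(job, new Path("/output/line-counts"));

            System.exit(job.waitForCompletion(true) ? 0 : 1);
        }
    }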
