Problem with copying local data onto HDFS on a Hadoop cluster using Amazon EC2/S3.

Posted by Deepak Konidena on Stack Overflow
Published on 2010-06-09

Hi,

I have set up a Hadoop cluster of 5 nodes on Amazon EC2. Now, when I log into the Master node and submit the following command

bin/hadoop jar <program>.jar <arg1> <arg2> <path/to/input/file/on/S3>

it throws one of the following errors (not both at the same time). The first is thrown when I don't replace the slashes with '%2F', and the second when I do:

1) java.lang.IllegalArgumentException: Invalid hostname in URI S3://<ID>:<SECRETKEY>@<BUCKET>/<path-to-inputfile>
2) org.apache.hadoop.fs.s3.S3Exception: org.jets3t.service.S3ServiceException: S3 PUT failed for '/' XML Error Message: The request signature we calculated does not match the signature you provided. Check your key and signing method.
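
A side note, based on an assumption about the setup rather than anything stated above: the access and secret key do not have to live in the URI at all. If the credentials are supplied as Hadoop configuration properties, nothing needs to be percent-encoded, which sidesteps the '%2F' problem entirely. A minimal sketch using the standard fs.s3 credential properties and the fs shell (everything in angle brackets is a placeholder):

# Generic -D options are parsed before the command, so the key never appears in the URI.
# The same two properties can also be set once in conf/core-site.xml on every node.
bin/hadoop fs \
    -Dfs.s3.awsAccessKeyId=<ID> \
    -Dfs.s3.awsSecretAccessKey=<SECRETKEY> \
    -ls s3://<BUCKET>/

For the job itself, -D flags like these only take effect if the driver goes through ToolRunner/GenericOptionsParser; otherwise setting the properties in the config files is the safer route.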

Note:

1) When I run jps to see what processes are running on the Master, it shows only

1116 NameNode
1699 Jps
1180 JobTracker

with no DataNode or TaskTracker listed (see the check sketched after this note).

2) My secret key contains two '/' (forward slashes), and I replace them with '%2F' in the S3 URI.
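
On the first note: a quick way to confirm whether the slave daemons ever registered with the master is to ask the NameNode for a report and to run jps on a slave directly. A minimal sketch, assuming the stock bin/ scripts and SSH access to a slave (the hostname is a placeholder):

# From the Master: lists the DataNodes the NameNode currently knows about (live and dead)
bin/hadoop dfsadmin -report

# On a slave: DataNode and TaskTracker should show up here if they started
ssh <slave-hostname> jps

On a multi-node cluster the Master normally runs only NameNode and JobTracker, so the jps output above is not by itself a sign of trouble; the report is what tells whether the slaves actually joined.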

PS: The program runs fine on EC2 when run on a single node. It's only when I launch a cluster that I run into issues with copying data between S3 and HDFS. Also, what does distcp do? Do I need to distribute the data even after I copy it from S3 to HDFS? (I thought HDFS took care of that internally.)
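
For concreteness, a typical distcp invocation for staging S3 data into HDFS looks roughly like the following (the bucket, key, and HDFS target path are placeholders, the credentials could again be passed as -D properties instead of living in the URI, and if the input was uploaded as ordinary S3 objects the s3n:// native scheme rather than s3:// may be the appropriate one). distcp itself runs as a MapReduce job, and once the files land in HDFS their blocks are split and replicated across the DataNodes automatically, so no separate distribution step should be needed:

# Distributed copy from S3 into HDFS, executed as a MapReduce job across the cluster
bin/hadoop distcp \
    s3://<ID>:<SECRETKEY>@<BUCKET>/<path-to-inputfile> \
    hdfs:///user/<username>/input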

If you could direct me to a link that explains running MapReduce programs on a Hadoop cluster using Amazon EC2/S3, that would be great.

Regards,

Deepak.

© Stack Overflow or respective owner
