Problem copying local data onto HDFS on a Hadoop cluster using Amazon EC2/S3.
Posted by Deepak Konidena on Stack Overflow, 2010-06-09.
Hi,
I have set up a Hadoop cluster with 5 nodes on Amazon EC2. When I log in to the master node and submit the following command
bin/hadoop jar <program>.jar <arg1> <arg2> <path/to/input/file/on/S3>
it throws one of the following errors (not both at the same time). The first error is thrown when I don't replace the slashes in my secret key with '%2F', and the second when I do:
1) java.lang.IllegalArgumentException: Invalid hostname in URI S3://<ID>:<SECRETKEY>@<BUCKET>/<path-to-inputfile>
2) org.apache.hadoop.fs.s3.S3Exception: org.jets3t.service.S3ServiceException: S3 PUT failed for '/' XML Error Message: The request signature we calculated does not match the signature you provided. Check your key and signing method.
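One thing I have been considering, based on the Hadoop S3 documentation, is putting the AWS credentials into conf/core-site.xml instead of embedding them in the URI, so that no escaping of the slashes is needed at all. A rough sketch of what I mean (the fs.s3.* property names are the ones from the docs for the s3:// filesystem; there are fs.s3n.* equivalents for s3n://; the key values, bucket, and path below are placeholders, not my real ones):

<property>
  <name>fs.s3.awsAccessKeyId</name>
  <value>YOUR_ACCESS_KEY_ID</value>
</property>
<property>
  <name>fs.s3.awsSecretAccessKey</name>
  <value>YOUR_SECRET_ACCESS_KEY</value>
</property>

bin/hadoop jar <program>.jar <arg1> <arg2> s3://<BUCKET>/<path-to-inputfile>

Is that the recommended approach, or should escaping the slashes with '%2F' also work?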
Note:
1) When I submit jps to see what tasks are running on the master, it shows only
1116 NameNode
1699 Jps
1180 JobTracker
with no DataNode or TaskTracker listed.
2) My secret key contains two '/' (forward slashes), and I replace them with '%2F' in the S3 URI.
PS: The program runs fine on EC2 when run on a single node. It's only when I launch a cluster that I run into these issues with copying data between S3 and HDFS. Also, what does distcp do? Do I need to distribute the data even after I copy it from S3 to HDFS? (I thought HDFS took care of that internally.)
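For reference, the distcp usage I have seen in examples looks roughly like this (the target path is a placeholder, not my actual directory):

bin/hadoop distcp s3://<BUCKET>/<path-to-inputfile> hdfs:///user/hadoop/input

but I am not sure whether something like this is needed on top of HDFS's own block replication, or whether copying into HDFS once is enough.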
If you could direct me to a link that explains running MapReduce programs on a Hadoop cluster using Amazon EC2/S3, that would be great.
Regards,
Deepak.