Stream tar.gz file from FTP server
- by linker
Here is the situation: I have a tar.gz file on an FTP server which can contain an arbitrary number of files.
Now what I'm trying to accomplish is to have this file streamed and uploaded to HDFS through a Hadoop job. The fact that it's Hadoop isn't important; in the end, what I need is a shell script that fetches the file from FTP with wget and writes the output to a stream.
The reason I really need to use streams is that there will be a large number of these files, and each one will be huge.
It's fairly easy to do if I have a gzipped file and I'm doing something like this:
wget -O - "ftp://${user}:${pass}@${host}/$file" | zcat
But I'm not even sure this is possible for a tar.gz file, especially since the archive contains multiple files. I'm a bit confused about which direction to take; any help would be greatly appreciated.
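What I've been experimenting with so far (just a sketch, and the file names are made up for testing): GNU tar can read an archive from stdin with `-` and write every member's contents to stdout with `-O` (`--to-stdout`), so in principle the gzip pipeline above could become `wget -O - "ftp://..." | tar -xzO`. Here I'm faking the FTP download with a locally created archive instead of wget, just to see whether the extraction side of the pipe behaves:

```shell
#!/bin/sh
set -e

# Build a small throwaway tar.gz to stand in for the file that would
# normally arrive from "wget -O -" (a.txt and b.txt are hypothetical).
demo=$(mktemp -d)
printf 'hello\n' > "$demo/a.txt"
printf 'world\n' > "$demo/b.txt"
tar -czf "$demo/demo.tar.gz" -C "$demo" a.txt b.txt

# Stream the archive through the pipe: -x extract, -z gunzip,
# -O write each member's contents to stdout instead of to disk.
cat "$demo/demo.tar.gz" | tar -xzO
```

This prints the contents of a.txt followed by b.txt, concatenated. What I'm unsure about is whether a single concatenated stream like this is even the right shape for the HDFS upload, since the per-file boundaries are lost.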