
There is a program called "cufflinks" which is run as follows:

cufflinks -o <output-dir>  <input-file>

This program takes one file as input and generates four output files in the given "output-dir".

I am trying to run the same program on a Hadoop cluster using Runtime.exec() in a mapper class. I am setting

output-dir=/some/path/on/HDFS

I was expecting the four files to be generated on HDFS as output. However, that is not the case: the output directory on HDFS does not contain any of the four files.
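For reference, here is a minimal sketch of what my mapper-side invocation looks like (the class name, record format, and paths are simplified for illustration; in my job each input record is the path of a file to process):

import java.io.IOException;

import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.NullWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Mapper;

// Simplified mapper: each input record is the path of a file to run cufflinks on.
public class CufflinksMapper extends Mapper<LongWritable, Text, Text, NullWritable> {
    @Override
    protected void map(LongWritable key, Text value, Context context)
            throws IOException, InterruptedException {
        // The -o argument is the HDFS directory where I want the four output files.
        Process p = Runtime.getRuntime().exec(new String[] {
                "cufflinks", "-o", "/some/path/on/HDFS", value.toString() });
        int exit = p.waitFor();
        context.write(new Text(value.toString() + " exit=" + exit), NullWritable.get());
    }
}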

I then tried setting

output-dir=/tmp/output/

and it worked.

Can anyone please suggest why it does not work on HDFS? What do I need to do to make it work on HDFS?

Thanks.

1 Answer


The problem is that to create files in HDFS, the cufflinks program would have to use the HDFS API internally; it cannot do so with regular file operations. HDFS is not mounted as a local filesystem, so the path you pass with -o is interpreted as an ordinary local path on whichever node the task runs, which is why a local path like /tmp/output works.
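A common workaround is to let the external program write to a local scratch directory and then copy the results into HDFS through the Hadoop FileSystem client API. A rough sketch (the class name and all paths are placeholders):

import java.io.IOException;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class RunCufflinksAndCopy {
    public static void main(String[] args) throws IOException, InterruptedException {
        // 1. Let cufflinks write its four output files to the node's local filesystem.
        Process p = Runtime.getRuntime().exec(new String[] {
                "cufflinks", "-o", "/tmp/output", "/tmp/input.sam" });
        if (p.waitFor() != 0) {
            throw new IOException("cufflinks failed");
        }
        // 2. Copy the local results into HDFS using the HDFS client API.
        FileSystem fs = FileSystem.get(new Configuration());
        fs.copyFromLocalFile(new Path("/tmp/output"),
                             new Path("/some/path/on/HDFS"));
    }
}

The same pattern works inside a mapper: run the process against a task-local directory, then copy the outputs up with the FileSystem handle from the job configuration.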


2 Comments

Hmm, and that is the problem. However, my impression was that once you run a program using Runtime.exec() on Hadoop, the Hadoop framework would treat HDFS just like a regular file system and create the files on HDFS as it does on a regular file system. So there is no option to do it?
HDFS is a distributed file system, while NTFS/FAT/EXT* are not, and each has a different API for interacting with it.
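To illustrate the difference (a hypothetical snippet; the HDFS path is a placeholder): a local write goes through java.io, while an HDFS write has to go through the Hadoop FileSystem client. An external binary like cufflinks only ever performs the first kind of write.

import java.io.FileOutputStream;
import java.io.IOException;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FSDataOutputStream;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class LocalVsHdfsWrite {
    public static void main(String[] args) throws IOException {
        // A regular file operation: only ever touches the local filesystem.
        try (FileOutputStream local = new FileOutputStream("/tmp/local.txt")) {
            local.write("written with java.io".getBytes());
        }
        // An HDFS file operation: goes through the HDFS client API.
        FileSystem fs = FileSystem.get(new Configuration());
        try (FSDataOutputStream out = fs.create(new Path("/some/path/on/HDFS/out.txt"))) {
            out.writeBytes("written with the Hadoop FileSystem API");
        }
    }
}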
