
There is a program called "cufflinks" which is run as follows:

cufflinks -o <output-dir>  <input-file>

This program takes one file as input and generates four output files in the given "output-dir".

I am trying to run the same program on a Hadoop cluster using Runtime.exec() in a mapper class. I am setting

output-dir=/some/path/on/HDFS

I was expecting the four files to be generated on HDFS as output. However, that is not the case: the output directory on HDFS does not contain any of the four files.
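For reference, here is a minimal sketch of what my mapper-side invocation looks like (the class name, record format, and paths are simplified for illustration; in my job each input record is the path of a file to process):

import java.io.IOException;

import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.NullWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Mapper;

// Simplified mapper: each input record is the path of a file to run cufflinks on.
public class CufflinksMapper extends Mapper<LongWritable, Text, Text, NullWritable> {
    @Override
    protected void map(LongWritable key, Text value, Context context)
            throws IOException, InterruptedException {
        // The -o argument is the HDFS directory where I want the four output files.
        Process p = Runtime.getRuntime().exec(new String[] {
                "cufflinks", "-o", "/some/path/on/HDFS", value.toString() });
        int exit = p.waitFor();
        context.write(new Text(value.toString() + " exit=" + exit), NullWritable.get());
    }
}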

I then tried setting

output-dir=/tmp/output/

and it worked.

Can anyone please suggest why it does not work on HDFS? What do I need to do to make it work on HDFS?

Thanks.

1 Answer


The problem is that to create files in HDFS, the cufflinks program would have to use the HDFS API internally; it cannot do so with regular file operations. HDFS is not mounted as a local filesystem, so the path you pass with -o is interpreted as an ordinary local path on whichever node the task runs, which is why a local path like /tmp/output works.
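A common workaround is to let the external program write to a local scratch directory and then copy the results into HDFS through the Hadoop FileSystem client API. A rough sketch (the class name and all paths are placeholders):

import java.io.IOException;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class RunCufflinksAndCopy {
    public static void main(String[] args) throws IOException, InterruptedException {
        // 1. Let cufflinks write its four output files to the node's local filesystem.
        Process p = Runtime.getRuntime().exec(new String[] {
                "cufflinks", "-o", "/tmp/output", "/tmp/input.sam" });
        if (p.waitFor() != 0) {
            throw new IOException("cufflinks failed");
        }
        // 2. Copy the local results into HDFS using the HDFS client API.
        FileSystem fs = FileSystem.get(new Configuration());
        fs.copyFromLocalFile(new Path("/tmp/output"),
                             new Path("/some/path/on/HDFS"));
    }
}

The same pattern works inside a mapper: run the process against a task-local directory, then copy the outputs up with the FileSystem handle from the job configuration.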


2 Comments

Hmm, and that is the problem. However, my impression was that once you run a program using Runtime.exec() on Hadoop, the Hadoop framework would treat HDFS just like a regular file system and create the files on HDFS as it does on a regular file system. So there is no option to do it?
HDFS is a distributed file system, while NTFS/FAT/EXT* are not, and each has a different API for interacting with it.
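To illustrate the difference (a hypothetical snippet; the HDFS path is a placeholder): a local write goes through java.io, while an HDFS write has to go through the Hadoop FileSystem client. An external binary like cufflinks only ever performs the first kind of write.

import java.io.FileOutputStream;
import java.io.IOException;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FSDataOutputStream;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class LocalVsHdfsWrite {
    public static void main(String[] args) throws IOException {
        // A regular file operation: only ever touches the local filesystem.
        try (FileOutputStream local = new FileOutputStream("/tmp/local.txt")) {
            local.write("written with java.io".getBytes());
        }
        // An HDFS file operation: goes through the HDFS client API.
        FileSystem fs = FileSystem.get(new Configuration());
        try (FSDataOutputStream out = fs.create(new Path("/some/path/on/HDFS/out.txt"))) {
            out.writeBytes("written with the Hadoop FileSystem API");
        }
    }
}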
