hadoop - Hadoop jar input path issue

Question

The issue I'm having is that the `hadoop jar` command requires an input path, but my MapReduce job reads its input from a database and therefore has no input directory. I've set the JobConf input format to DBInputFormat, but how do I indicate this when running the job with `hadoop jar`?

Here is the command:
hadoop jar <my-jar> <hdfs input> <hdfs output>

I have an output folder, but don't need an input folder. Is there a way to circumvent this? Do I need to write a second program that pulls the DB data into a folder and then use that in the MapReduce job?

jeff · Accepted Answer · 2013-10-07 22:14:15Z

The `hadoop jar` command itself requires no arguments beyond the JAR file and, if the JAR's manifest does not name one, the main class. Every remaining argument is passed unchanged to your program's `main` method, so your program alone decides what those arguments mean. If your job no longer needs an HDFS input path, change the code so it stops expecting one.
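As a minimal sketch, assuming a hypothetical driver class `DbJobArgs` (the name is not from the question), the argument check for a database-fed job might expect exactly one argument, the HDFS output path, so the command shrinks to `hadoop jar <my-jar> <hdfs output>`. It uses only plain Java so it can be tried without a cluster. If the real driver is launched through `ToolRunner`, generic options such as `-D key=value` are removed before `run()` receives the array.

```java
import java.util.Arrays;

// Hypothetical driver-side argument check for a job whose input comes from a database:
// the only expected argument is the HDFS output path.
public class DbJobArgs {

    // Returns the single output-path argument, or throws with a usage message otherwise.
    static String outputPath(String[] args) {
        if (args.length != 1) {
            throw new IllegalArgumentException(
                "Usage: hadoop jar <my-jar> [main-class] <hdfs output>, got " + Arrays.toString(args));
        }
        return args[0];
    }

    public static void main(String[] args) {
        // Whatever follows the JAR (and the main class, if given) arrives here unchanged.
        System.out.println("Writing job output to " + outputPath(args));
    }
}
```

Passing an input path as well would now fail fast with a usage message instead of silently treating the input path as the output.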

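public class MyJob extends Configured implements Tool
{
   public int run(String[] args) throws Exception {   // Tool.run returns an int exit code
     Job job = Job.getInstance(getConf(), "my job");  // org.apache.hadoop.mapreduce.Job
     // ...
     TextInputFormat.setInputPaths(job, new Path(args[0])); // or some other file input format
     TextOutputFormat.setOutputPath(job, new Path(args[1]));
     return job.waitForCompletion(true) ? 0 : 1;
   }
}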

So you would remove the input path statement, leaving only the output path (which then becomes `args[0]`). There is nothing special about packaging the job as a JAR: just change the InputFormat (which you said you did) and you should be set.
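To see why no input directory is involved, it may help to look at how a database input format can create its own splits. The following is a simplified, hypothetical model: the class, methods, and example query are invented for illustration, and it does not call Hadoop. It loosely follows the approach DBInputFormat takes of counting the rows, dividing them into one range per map task, and having each task read its range with a LIMIT/OFFSET query. Because the splits come from the row count rather than from files, there is no input path to supply.

```java
import java.util.ArrayList;
import java.util.List;

// Simplified, hypothetical model of database-driven splits (not Hadoop's actual code):
// count the rows, give each map task a contiguous range, and read it with LIMIT/OFFSET.
public class DbSplitSketch {

    // A half-open row range [start, end) handled by one map task.
    static final class Split {
        final long start;
        final long end;

        Split(long start, long end) {
            this.start = start;
            this.end = end;
        }

        long length() {
            return end - start;
        }
    }

    // Divides rowCount rows into numMaps ranges; the last range absorbs any remainder.
    static List<Split> splits(long rowCount, int numMaps) {
        if (numMaps < 1) {
            throw new IllegalArgumentException("numMaps must be at least 1");
        }
        long chunk = rowCount / numMaps;
        List<Split> result = new ArrayList<>();
        for (int i = 0; i < numMaps; i++) {
            long start = i * chunk;
            long end = (i == numMaps - 1) ? rowCount : start + chunk;
            result.add(new Split(start, end));
        }
        return result;
    }

    // Appends the range to a base query; the base query should ORDER BY a stable key.
    static String splitQuery(String baseQuery, Split split) {
        return baseQuery + " LIMIT " + split.length() + " OFFSET " + split.start;
    }

    public static void main(String[] args) {
        String base = "SELECT id, name FROM users ORDER BY id"; // hypothetical table and columns
        for (Split split : splits(10, 3)) {
            System.out.println(splitQuery(base, split));
        }
    }
}
```

With 10 rows and 3 map tasks, the ranges cover rows 0-2, 3-5, and 6-9, with the last task taking the remainder. The ORDER BY on a stable key matters: without a fixed order, separate LIMIT/OFFSET queries are not guaranteed to partition the table cleanly.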