How can I trigger a jar that runs on Hadoop from a plain Java program, so that it uses HDFS?
Currently I run this command manually:
bin/hadoop jar ~/wordcount_classes/word.jar org.myorg.WordCount ~/hadoop-0.20.203.0/input1 ~/hadoop-0.20.203/output2
Here I supply the input and output directories in HDFS and use word.jar. I want the job to be triggered automatically from a Java project instead.
Can you explain better what it is that you're trying to achieve? – inquire, Feb 16, 2012 at 12:56
I am trying to run a MapReduce job so that it is triggered by a Java class on some action and begins after the input set has been dumped into the HDFS input directory. – Vardan Gupta, Feb 16, 2012 at 15:56
2 Answers
I'm working on the same problem. I have a program (let's call it Driver) that must implement the following method:
public void runJar(File jar, String mainClass, File inputDir, File outputDir);
To do this, I was calling org.apache.hadoop.util.RunJar.main(String[]), which is what your command line ends up calling. This works well, but only if you're running Driver from the command line.
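As a rough illustration of the command-line case, the call comes down to passing RunJar the same arguments that bin/hadoop jar receives: the jar, the main class, then the job's own arguments. The sketch below only builds that array. The class RunJarArgs and its build method are hypothetical names, the paths are placeholders, and the actual RunJar call is left as a comment because it needs the Hadoop jars on the classpath.

```java
import java.io.File;
import java.util.Arrays;

class RunJarArgs {
    // Builds the argument array in the order RunJar expects:
    // jar file, main class, then the job's own arguments.
    public static String[] build(File jar, String mainClass, String inputDir, String outputDir) {
        return new String[] { jar.getPath(), mainClass, inputDir, outputDir };
    }

    public static void main(String[] unused) {
        // Placeholder values modelled on the command in the question.
        String[] args = build(new File("word.jar"), "org.myorg.WordCount", "input1", "output2");
        System.out.println(Arrays.toString(args));
        // With Hadoop on the classpath, the job would then be started with:
        // org.apache.hadoop.util.RunJar.main(args);   // declared to throw Throwable
    }
}
```

The input and output directories are kept as strings here, since they are paths that the job itself resolves rather than local files.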
If Driver is running inside a container (like Tomcat or Jetty), you're going to have a problem. You'll get errors like
java.lang.ClassNotFoundException: org.apache.hadoop.fs.Path
This is because of how RunJar messes with classloaders. You need to manually create a classloader like so:
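// Needs: java.lang.reflect.Method, java.net.URL, java.net.URLClassLoader
final ClassLoader original = Thread.currentThread().getContextClassLoader();
try {
    // Child loader: sees the job jar, delegates everything else to the container's loader
    URL[] urls = new URL[] { jar.toURI().toURL() };
    ClassLoader loader = new URLClassLoader(urls, original);
    Thread.currentThread().setContextClassLoader(loader);
    // mainClass is the class-name String passed to runJar
    Class<?> driver = Class.forName(mainClass, true, loader);
    Method main = driver.getMethod("main", String[].class);
    // Assumption: the job takes the input and output paths as its only arguments
    String[] args = { inputDir.getPath(), outputDir.getPath() };
    main.invoke(null, new Object[] { args });
} finally {
    // Always restore the loader the container gave this thread
    Thread.currentThread().setContextClassLoader(original);
}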
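The same save, swap, invoke, restore pattern can be tried without Hadoop. The following is a minimal sketch under stated assumptions: ContextLoaderDemo, runMain and FakeJob are hypothetical names, and FakeJob only stands in for a real job's driver class. Unlike the snippet above, the parent loader is passed in as a parameter so the demo can hand over the loader that defines FakeJob; in a container you would pass the thread's current context classloader.

```java
import java.lang.reflect.Method;
import java.net.URL;
import java.net.URLClassLoader;

class ContextLoaderDemo {
    // Stand-in for a job's driver class; records the arguments it was started with.
    public static class FakeJob {
        public static String[] received;
        public static void main(String[] args) { received = args; }
    }

    // Invokes className.main(args) with a URLClassLoader (urls on top of parent)
    // installed as the context classloader, then restores the previous one.
    public static void runMain(ClassLoader parent, URL[] urls, String className, String[] args)
            throws Exception {
        Thread thread = Thread.currentThread();
        ClassLoader original = thread.getContextClassLoader();
        try {
            ClassLoader loader = new URLClassLoader(urls, parent);
            thread.setContextClassLoader(loader);
            Class<?> driver = Class.forName(className, true, loader);
            Method main = driver.getMethod("main", String[].class);
            // Wrap args so the String[] is passed as one parameter, not spread as varargs.
            main.invoke(null, new Object[] { args });
        } finally {
            thread.setContextClassLoader(original);
        }
    }

    public static void main(String[] unused) throws Exception {
        // No extra URLs are needed here: FakeJob is found through the parent loader.
        runMain(FakeJob.class.getClassLoader(), new URL[0],
                FakeJob.class.getName(), new String[] { "input1", "output2" });
        System.out.println(String.join(",", FakeJob.received));
    }
}
```

Restoring the loader in finally matters in a container, where worker threads are pooled and a leftover context classloader would leak into unrelated requests.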