I have written a mapper and a reducer in Python for a word count program, and they work fine locally. Here is a sample run:

echo "hello hello world here hello here world here hello" | wordmapper.py | sort -k1,1 | wordreducer.py 
hello   4
here    3
world   2
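
The two scripts themselves are not shown here. As a rough sketch only, a streaming mapper and reducer of this kind could be built around functions like the ones below. The names map_words and reduce_counts are hypothetical and are not taken from the actual wordmapper.py and wordreducer.py.

```python
from itertools import groupby


def map_words(lines):
    """Yield a (word, 1) pair for every whitespace-separated word."""
    for line in lines:
        for word in line.split():
            yield word, 1


def reduce_counts(pairs):
    """Sum the counts for each word.

    The pairs must already be sorted by word, which is what
    `sort -k1,1` (or the Hadoop shuffle) provides between the two steps.
    """
    for word, group in groupby(pairs, key=lambda pair: pair[0]):
        yield word, sum(count for _, count in group)
```

In a real streaming script, the mapper would feed sys.stdin to map_words and print each pair as a tab-separated line, and the reducer would parse those lines back into pairs before calling reduce_counts.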

Now, when I try to submit a Hadoop job for a large file, I get errors:

hadoop jar share/hadoop/tools/sources/hadoop-*streaming*.jar -file wordmapper.py -mapper wordmapper.py  -file wordreducer.py -reducer wordreducer.py -input /data/1jrl.pdb -output /output/py_jrl
Exception in thread "main" java.lang.ClassNotFoundException: share.hadoop.tools.sources.hadoop-streaming-2.2.0-test-sources.jar
    at java.net.URLClassLoader$1.run(URLClassLoader.java:202)
    at java.security.AccessController.doPrivileged(Native Method)
    at java.net.URLClassLoader.findClass(URLClassLoader.java:190)
    at java.lang.ClassLoader.loadClass(ClassLoader.java:306)
    at java.lang.ClassLoader.loadClass(ClassLoader.java:247)
    at java.lang.Class.forName0(Native Method)
    at java.lang.Class.forName(Class.java:249)
    at org.apache.hadoop.util.RunJar.main(RunJar.java:205)

I then changed the command line to the following (removing the wildcard from the command above):

hadoop jar share/hadoop/tools/sources/hadoop-streaming-2.2.0-sources.jar -file wordmapper.py -mapper wordmapper.py  -file wordreducer.py -reducer wordreducer.py -input /data/1jrl.pdb -output /output/py_jrl
Exception in thread "main" java.lang.ClassNotFoundException: -file
    at java.net.URLClassLoader$1.run(URLClassLoader.java:202)
    at java.security.AccessController.doPrivileged(Native Method)
    at java.net.URLClassLoader.findClass(URLClassLoader.java:190)
    at java.lang.ClassLoader.loadClass(ClassLoader.java:306)
    at java.lang.ClassLoader.loadClass(ClassLoader.java:247)
    at java.lang.Class.forName0(Native Method)
    at java.lang.Class.forName(Class.java:249)
    at org.apache.hadoop.util.RunJar.main(RunJar.java:205)

Why do I get these errors, and how can I fix them? I am using Hadoop 2. Thanks!

1 Answer

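At least one of your issues is that you are using the -sources.jar, which contains only .java source files and can't be executed. Because that jar's manifest names no main class, hadoop jar treats the next argument as the class to run. That is why the second error names -file, and why the first one names the second jar that the shell expanded the wildcard to.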

Try using this instead...

share/hadoop/tools/lib/hadoop-streaming-2.2.0.jar

And if that doesn't exist, look for a hadoop-streaming*.jar that doesn't have -sources in the file name.
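
If you'd rather not hunt through the directories by hand, something like the sketch below could do the search for you. It is only an illustration: find_streaming_jars and manifest_main_class are made-up helper names, the directory layout is assumed from the paths in the question, and the manifest check is just an extra sanity test that the jar declares a Main-Class.

```python
import zipfile
from pathlib import Path


def find_streaming_jars(hadoop_home):
    """Return hadoop-streaming jars under hadoop_home, skipping source and test jars."""
    skip = ("-sources.jar", "-tests.jar")  # "-test-sources.jar" also ends with "-sources.jar"
    jars = Path(hadoop_home).rglob("hadoop-streaming*.jar")
    return sorted(p for p in jars if not p.name.endswith(skip))


def manifest_main_class(jar_path):
    """Return the Main-Class named in the jar's manifest, or None if there is none.

    Simplification: wrapped (continuation) manifest lines are not rejoined.
    """
    with zipfile.ZipFile(jar_path) as jar:
        try:
            manifest = jar.read("META-INF/MANIFEST.MF").decode("utf-8")
        except KeyError:
            return None  # the jar has no manifest at all
    for line in manifest.splitlines():
        key, _, value = line.partition(":")
        if key.strip() == "Main-Class":
            return value.strip()
    return None
```

Calling find_streaming_jars on your Hadoop install directory and then manifest_main_class on each result should show which jar is safe to pass to hadoop jar without naming a class explicitly.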

4 Comments

This worked great. Do you know where I can find the Java source files for the classes in share/hadoop/mapreduce/hadoop-mapreduce-examples-2.2.0.jar?
Go to the Hadoop 2.2 Apache download page and download hadoop-2.2.0-src.tar.gz.
Sorry, this may be a stupid question: do I have to download the entire hadoop-src tar to get the source code for the examples?
You may have luck finding the 2.2 source for mapreduce examples, but I've always just downloaded the entire source so I can see everything.
