6

I am trying to run NLTK in a Hadoop environment. This is the command I used:

bin/hadoop jar $HADOOP_HOME/contrib/streaming/hadoop-streaming-1.0.4.jar -input /user/nltk/input/ -output /user/nltk/output1/ -file /home/hduser/softwares/NLTK/unsupervised_sentiment-master.zip -mapper /home/hduser/softwares/NLTK/unsupervised_sentiment-master/sentiment.py

unsupervised_sentiment-master.zip contains all the dependent files required by sentiment.py.

I am getting the following error:

java.lang.RuntimeException: PipeMapRed.waitOutputThreads(): subprocess failed with code 2
    at org.apache.hadoop.streaming.PipeMapRed.waitOutputThreads(PipeMapRed.java:362)
    at org.apache.hadoop.streaming.PipeMapRed.mapRedFinished(PipeMapRed.java:576)
    at org.apache.hadoop.streaming.PipeMapper.close(PipeMapper.java:135)
    at org.apache.hadoop.mapred.MapRunner.run(MapRunner.java:57)
    at org.apache.hadoop.streaming.PipeMapRunner.run(PipeMapRunner.java:36)
    at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:436)
    at org.apache.hadoop.mapred.MapTask.run(MapTask.java:372)
    at org.apache.hadoop.mapred.Child$4.run(Child.java:255)
    at java.security.AccessController.doPrivileged(Native Method)
    at javax.security.auth.Subject.doAs(Subject.java:415)
    at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1121)
    at org.apache.hadoop.mapred.Child.main(Child.java:249)

Any help would be greatly appreciated!

1
  • You should check the task logs through the JobTracker page in the Hadoop web UI; they will help you out. Commented Jul 15, 2013 at 16:44
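Alongside the logs, it can help to reproduce the failure outside Hadoop. The sketch below is only an illustration: it executes a script directly (as an executable, with no interpreter named on the command line), feeds it sample input on stdin, and reports the exit code and stderr. The function name `run_like_streaming` and the file name `run_local.py` are made up for this example, and running the script directly is an assumption about how closely a local run mirrors the streaming task.

```python
import subprocess
import sys


def run_like_streaming(script_path, sample_input):
    """Execute script_path directly, feeding sample_input (bytes) on stdin.

    Returns (exit_code, stderr_text). exit_code is None when the OS could
    not start the script at all (missing file, no execute bit, no shebang).
    """
    try:
        result = subprocess.run(
            [script_path],
            input=sample_input,
            stdout=subprocess.PIPE,
            stderr=subprocess.PIPE,
        )
    except OSError as exc:
        # The script never started, so there is no exit code to report.
        return None, str(exc)
    return result.returncode, result.stderr.decode("utf-8", "replace")


if __name__ == "__main__":
    # Hypothetical usage: python run_local.py ./sentiment.py < sample.txt
    if len(sys.argv) > 1:
        code, err = run_like_streaming(sys.argv[1], sys.stdin.buffer.read())
        sys.stderr.write("exit code: %r\n%s\n" % (code, err))
```

If the local run already fails, the stderr text usually points at the cause faster than the Java stack trace does.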

4 Answers

10

Could you please post the Python files? My guess is that you need to add #!/usr/bin/python as the first line of your .py file. That was the fix in my case when I was streaming with Python.
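A quick pre-flight check along these lines could catch the problem before submitting the job. This is a minimal sketch, not part of Hadoop or NLTK: the function name `streaming_script_problems` is invented here, and it only checks that the file exists, starts with a `#!` line, and has the execute bit set.

```python
import os
import sys


def streaming_script_problems(path):
    """Return a list of human-readable problems found with a streaming script."""
    if not os.path.isfile(path):
        return ["%s does not exist" % path]
    problems = []
    with open(path, "rb") as handle:
        first_line = handle.readline()
    # Without a shebang the OS does not know which interpreter to launch.
    if not first_line.startswith(b"#!"):
        problems.append("first line is not a shebang such as #!/usr/bin/env python")
    # The shebang is only honoured when the file is executable.
    if not os.access(path, os.X_OK):
        problems.append("file is not executable (try chmod +x)")
    return problems


if __name__ == "__main__":
    # Hypothetical usage: python check_script.py sentiment.py
    for script in sys.argv[1:]:
        for problem in streaming_script_problems(script):
            sys.stderr.write("%s: %s\n" % (script, problem))
```

An empty result only means these basic checks passed; the interpreter path named in the shebang still has to exist on every node that runs the task.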

5

Adding the line below to the top of the Python script made the code work for me.

#!/usr/bin/python

1 Comment

In my case #!/usr/local/bin/python was the correct path, so it is worth checking where Python is installed on your nodes.
2

In your sentiment.py file, add the following line to the top:

```
#!/usr/bin/env python
```

This worked for me.

1

I can't say exactly what your error is, but mine was caused by an unresolved dependency in my Python script, namely statsmodels.
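One way to surface this kind of problem is to try importing each required module on the machine that will run the task and list the ones that fail. The following is a rough sketch: `missing_modules` is a made-up helper, and the module list under `__main__` (nltk, statsmodels) is just an example taken from this thread, not a statement about what sentiment.py actually imports.

```python
import importlib
import sys


def missing_modules(names):
    """Return the subset of module names that cannot be imported here."""
    missing = []
    for name in names:
        try:
            importlib.import_module(name)
        except ImportError:
            missing.append(name)
    return missing


if __name__ == "__main__":
    # Example list only; replace it with the imports your own script uses.
    required = ["nltk", "statsmodels"]
    for name in missing_modules(required):
        sys.stderr.write("cannot import %s\n" % name)
```

Run it with the same interpreter the shebang line points to, since a module installed for one Python may be missing from another.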
