
I have searched for this error on every forum I could find, but with no luck. I get the error below:

18/08/29 00:24:53 INFO mapreduce.Job:  map 0% reduce 0%
18/08/29 00:24:59 INFO mapreduce.Job: Task Id : attempt_1535105716146_0226_m_000000_0, Status : FAILED
Error: java.lang.RuntimeException: PipeMapRed.waitOutputThreads(): subprocess failed with code 1
        at org.apache.hadoop.streaming.PipeMapRed.waitOutputThreads(PipeMapRed.java:325)
        at org.apache.hadoop.streaming.PipeMapRed.mapRedFinished(PipeMapRed.java:538)
        at org.apache.hadoop.streaming.PipeMapper.close(PipeMapper.java:130)
        at org.apache.hadoop.mapred.MapRunner.run(MapRunner.java:61)
        at org.apache.hadoop.streaming.PipeMapRunner.run(PipeMapRunner.java:34)
        at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:465)
        at org.apache.hadoop.mapred.MapTask.run(MapTask.java:349)
        at org.apache.hadoop.mapred.YarnChild$2.run(YarnChild.java:174)
        at java.security.AccessController.doPrivileged(Native Method)
        at javax.security.auth.Subject.doAs(Subject.java:422)
        at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1688)
        at org.apache.hadoop.mapred.YarnChild.main(YarnChild.java:168)


18/08/29 00:25:45 INFO mapreduce.Job: Task Id : attempt_1535105716146_0226_r_000000_2, Status : FAILED
Error: java.lang.RuntimeException: PipeMapRed.waitOutputThreads(): subprocess failed with code 1
        at org.apache.hadoop.streaming.PipeMapRed.waitOutputThreads(PipeMapRed.java:325)
        at org.apache.hadoop.streaming.PipeMapRed.mapRedFinished(PipeMapRed.java:538)
        at org.apache.hadoop.streaming.PipeReducer.close(PipeReducer.java:134)
        at org.apache.hadoop.mapred.ReduceTask.runOldReducer(ReduceTask.java:454)
        at org.apache.hadoop.mapred.ReduceTask.run(ReduceTask.java:393)
        at org.apache.hadoop.mapred.YarnChild$2.run(YarnChild.java:174)
        at java.security.AccessController.doPrivileged(Native Method)
        at javax.security.auth.Subject.doAs(Subject.java:422)
        at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1688)
        at org.apache.hadoop.mapred.YarnChild.main(YarnChild.java:168)

18/08/29 00:25:52 INFO mapreduce.Job:  map 100% reduce 100%
18/08/29 00:25:53 INFO mapreduce.Job: Job job_1535105716146_0226 failed with state FAILED due to: Task failed task_1535105716146_0226_r_000000
Job failed as tasks failed. failedMaps:0 failedReduces:1 killedMaps:0 killedReduces: 0


18/08/29 00:25:53 ERROR streaming.StreamJob: Job not successful!
Streaming Command Failed!

I have also tested my map-reduce code locally with a standalone Python command:

cat student1.txt | python mapper.py | python reducer.py

The code works perfectly fine locally, but when I run it through Hadoop Streaming it repeatedly throws the above error. My input file is only 3 KB. I have also tried running the Hadoop Streaming command after changing the Python version, with no luck, and I have added the #!/usr/bin/python shebang at the top of each script. The output directory ends up with nothing inside. I have tried different versions of the command:
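One frequent cause of a script that works locally but fails under Streaming with exit code 1 is Windows-style CRLF line endings, which corrupt the shebang when Hadoop executes the file directly. A quick local check (a sketch; the helper name is my own):

```python
def has_crlf(path):
    # CRLF after "#!/usr/bin/python" makes the kernel look for an
    # interpreter literally named "python\r", which does not exist,
    # so the task subprocess dies with a non-zero exit code.
    with open(path, "rb") as f:
        return b"\r\n" in f.read()

# e.g. has_crlf("/home/mapper.py") -- if True, fix with dos2unix
```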

Version 1:

hadoop jar /usr/hdp/3.0.0.0-1634/hadoop-mapreduce/hadoop-streaming-3.1.0.3.0.0.0-1634.jar -Dmapred.reduce.tasks=1 -file /home/mapper.py -mapper mapper.py -file /home/reducer.py -reducer reducer.py -input /data/studentMapReduce/student1.txt -output outputMapReduceFile.txt

Version 2: invoking Python explicitly (I tried both single and double quotes):

hadoop jar /usr/hdp/3.0.0.0-1634/hadoop-mapreduce/hadoop-streaming-3.1.0.3.0.0.0-1634.jar -Dmapred.reduce.tasks=1 -file /home/mapper.py -mapper "python mapper.py" -file /home/reducer.py -reducer "python reducer.py" -input /data/studentMapReduce/student1.txt -output outputMapReduceFile.txt

A simple word-count program runs successfully in this environment and generates correct output, but as soon as I add mysql.connector to the Python script, Hadoop Streaming reports this error. I have also studied the job logs but found no further information there.
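Since the job only fails once mysql.connector enters the picture, a likely cause is that the library is missing (or a different Python is used) on one or more worker nodes; the subprocess then dies during import and Streaming only reports the generic exit code. A hedged sketch that surfaces the real ImportError in the task's stderr, which YARN captures in the task logs (the helper name is my own):

```python
import importlib
import sys

def require_module(name):
    """Import a module, or report the failure on stderr and exit non-zero.

    Anything a streaming task writes to stderr ends up in the YARN task
    logs, so the real cause shows up there instead of only
    "subprocess failed with code 1".
    """
    try:
        return importlib.import_module(name)
    except ImportError as exc:
        sys.stderr.write("cannot import %s: %s\n" % (name, exc))
        sys.exit(2)

# Hypothetical usage at the top of mapper.py / reducer.py:
# mysql = require_module("mysql.connector")
```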


3 Answers


If your issue is not caused by Python libraries or a code problem, it might be caused by the first lines of your Python files (the shebang and encoding comments) together with your OS.

For me, on macOS, after installing Hadoop locally by following a tutorial, the Python mapper/reducer did not execute properly. The errors were java.lang.RuntimeException: PipeMapRed.waitOutputThreads(): subprocess failed with code 1 or java.lang.RuntimeException: PipeMapRed.waitOutputThreads(): subprocess failed with code 127.

My configuration:

  • Hadoop 3.2.1_1
  • Python 3.7.6
  • macOS Mojave 10.14.6
  • the Java version from the tutorial (adoptopenjdk8): "1.8.0_252"

To launch the job with Python, I use the newer mapred streaming command instead of the hadoop jar /xxx/hadoop-mapreduce/hadoop-streaming-xxx.jar form from the Hadoop documentation. (Be careful: I think the documentation's examples are outdated regarding generic options; -file is deprecated in favor of -files.)

I found two possibilities :

  1. Keep the Python files as plain, non-executable files, with the encoding comment `# -*- coding: utf-8 -*-` in the header.

Only this command works for me:

mapred streaming -files WordCountMapper.py,WordCountReducer.py \
-input /data/input/README.TXT \
-output /data/output \
-mapper "python WordCountMapper.py" \
-reducer "python WordCountReducer.py"

This assumes I want to count the words of /data/input/README.TXT, already copied into my HDFS volume (hadoop fs -copyFromLocal /absolute-local-folder/data/input/README.TXT /data/input), using the local Python files WordCountMapper.py and WordCountReducer.py.

Code for WordCountMapper.py :

#!/usr/bin/python
# -*- coding: utf-8 -*-
import sys

for line in sys.stdin:
    # strip leading/trailing whitespace
    line = line.strip()
    # split the line into words
    words = line.split()

    # map step: for each word, emit the pair (word, 1)
    for word in words:
        print("%s\t%d" % (word, 1))

Code for WordCountReducer.py :

#!/usr/bin/python
# -*- coding: utf-8 -*-

import sys
total = 0
lastword = None

for line in sys.stdin:
    line = line.strip()

    # split into key and value, converting the value to int
    word, count = line.split()
    count = int(count)

    # move on to the next word (a single run can receive several keys)
    if lastword is None:
        lastword = word
    if word == lastword:
        total += count
    else:
        print("%s\t%d occurrences" % (lastword, total))
        total = count
        lastword = word

if lastword is not None:
    print("%s\t%d occurrences" % (lastword, total))
  2. Edit the Python files so they can be executed directly:

2.1. Add execution mode to python files :

chmod +x WordCountMapper.py

chmod +x WordCountReducer.py

2.2. Make sure the first two lines are:

first line: `#!/usr/bin/python`

second line: `# -*- coding: utf-8 -*-`

Use this command:

mapred streaming -files WordCountMapper.py,WordCountReducer.py \
-input /data/input/README.TXT \
-output /data/output \
-mapper ./WordCountMapper.py \
-reducer ./WordCountReducer.py
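The WordCountMapper.py/WordCountReducer.py pair can also be sanity-checked without Hadoop by simulating the map → sort → reduce pipeline in-process (a sketch; the function names are my own):

```python
from itertools import groupby
from operator import itemgetter

def map_words(lines):
    # same logic as WordCountMapper.py: emit (word, 1) per word
    for line in lines:
        for word in line.strip().split():
            yield (word, 1)

def reduce_counts(sorted_pairs):
    # same logic as WordCountReducer.py; the input must be sorted by key,
    # which Hadoop guarantees between the map and reduce phases
    for word, group in groupby(sorted_pairs, key=itemgetter(0)):
        yield (word, sum(count for _, count in group))

pairs = sorted(map_words(["the cat sat", "the mat"]))
print(dict(reduce_counts(pairs)))  # {'cat': 1, 'mat': 1, 'sat': 1, 'the': 2}
```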

I checked the job error logs and found that some required Python files (ones that are not predefined libraries) needed to be shipped with the job. I then ran the Hadoop Streaming command passing those Python files as well:

hadoop jar /usr/hdp/3.0.0.0-1634/hadoop-mapreduce/hadoop-streaming-3.1.0.3.0.0.0-1634.jar -Dmapred.reduce.tasks=0 -file /home/mapper3.py -mapper mapper3.py -file /home/reducer3.py -reducer reducer3.py -file /home/ErrorHandle.py -file /home/ExceptionUtil.py -input /data/studentMapReduce/student1.txt -output outputMapReduceFile.txt


For me, the fix was changing `#!/usr/bin/env python` to `#!/usr/bin/env python3` in the shebang line.
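To see which interpreter the cluster nodes actually resolve, it can help to log the version to stderr at the top of the mapper; the line ends up in the YARN task logs (a small diagnostic sketch):

```python
import sys

# Log the interpreter version to stderr; it appears in the YARN task logs
# and shows whether "#!/usr/bin/env python" resolved to Python 2 or Python 3.
sys.stderr.write("running under Python %s\n" % sys.version.split()[0])
```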
