
I have searched for this error on every forum I could find, but with no luck. I get the error below:

18/08/29 00:24:53 INFO mapreduce.Job:  map 0% reduce 0%
18/08/29 00:24:59 INFO mapreduce.Job: Task Id : attempt_1535105716146_0226_m_000000_0, Status : FAILED
Error: java.lang.RuntimeException: PipeMapRed.waitOutputThreads(): subprocess failed with code 1
        at org.apache.hadoop.streaming.PipeMapRed.waitOutputThreads(PipeMapRed.java:325)
        at org.apache.hadoop.streaming.PipeMapRed.mapRedFinished(PipeMapRed.java:538)
        at org.apache.hadoop.streaming.PipeMapper.close(PipeMapper.java:130)
        at org.apache.hadoop.mapred.MapRunner.run(MapRunner.java:61)
        at org.apache.hadoop.streaming.PipeMapRunner.run(PipeMapRunner.java:34)
        at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:465)
        at org.apache.hadoop.mapred.MapTask.run(MapTask.java:349)
        at org.apache.hadoop.mapred.YarnChild$2.run(YarnChild.java:174)
        at java.security.AccessController.doPrivileged(Native Method)
        at javax.security.auth.Subject.doAs(Subject.java:422)
        at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1688)
        at org.apache.hadoop.mapred.YarnChild.main(YarnChild.java:168)


18/08/29 00:25:45 INFO mapreduce.Job: Task Id : attempt_1535105716146_0226_r_000000_2, Status : FAILED
Error: java.lang.RuntimeException: PipeMapRed.waitOutputThreads(): subprocess failed with code 1
        at org.apache.hadoop.streaming.PipeMapRed.waitOutputThreads(PipeMapRed.java:325)
        at org.apache.hadoop.streaming.PipeMapRed.mapRedFinished(PipeMapRed.java:538)
        at org.apache.hadoop.streaming.PipeReducer.close(PipeReducer.java:134)
        at org.apache.hadoop.mapred.ReduceTask.runOldReducer(ReduceTask.java:454)
        at org.apache.hadoop.mapred.ReduceTask.run(ReduceTask.java:393)
        at org.apache.hadoop.mapred.YarnChild$2.run(YarnChild.java:174)
        at java.security.AccessController.doPrivileged(Native Method)
        at javax.security.auth.Subject.doAs(Subject.java:422)
        at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1688)
        at org.apache.hadoop.mapred.YarnChild.main(YarnChild.java:168)

18/08/29 00:25:52 INFO mapreduce.Job:  map 100% reduce 100%
18/08/29 00:25:53 INFO mapreduce.Job: Job job_1535105716146_0226 failed with state FAILED due to: Task failed task_1535105716146_0226_r_000000
Job failed as tasks failed. failedMaps:0 failedReduces:1 killedMaps:0 killedReduces: 0


18/08/29 00:25:53 ERROR streaming.StreamJob: Job not successful!
Streaming Command Failed!

I have also tested my map-reduce code locally with a standalone Python command:

cat student1.txt | python mapper.py | python reducer.py

The code works perfectly fine locally, but when I run it through Hadoop Streaming it repeatedly throws the above error. My input file is only 3 KB. I have also tried running the Hadoop Streaming command after changing the Python version, with no luck, and I have added the #!/usr/bin/python shebang at the top of each script. The output directory ends up with nothing inside. I have tried different versions of the command:
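One frequent cause of a script that works locally but fails under Streaming with exit code 1 is Windows-style CRLF line endings, which corrupt the shebang when Hadoop executes the file directly. A quick local check (a sketch; the helper name is my own):

```python
def has_crlf(path):
    # CRLF after "#!/usr/bin/python" makes the kernel look for an
    # interpreter literally named "python\r", which does not exist,
    # so the task subprocess dies with a non-zero exit code.
    with open(path, "rb") as f:
        return b"\r\n" in f.read()

# e.g. has_crlf("/home/mapper.py") -- if True, fix with dos2unix
```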

Version 1:

hadoop jar /usr/hdp/3.0.0.0-1634/hadoop-mapreduce/hadoop-streaming-3.1.0.3.0.0.0-1634.jar -Dmapred.reduce.tasks=1 -file /home/mapper.py -mapper mapper.py -file /home/reducer.py -reducer reducer.py -input /data/studentMapReduce/student1.txt -output outputMapReduceFile.txt

Version 2: invoking Python explicitly (I tried both single and double quotes):

hadoop jar /usr/hdp/3.0.0.0-1634/hadoop-mapreduce/hadoop-streaming-3.1.0.3.0.0.0-1634.jar -Dmapred.reduce.tasks=1 -file /home/mapper.py -mapper "python mapper.py" -file /home/reducer.py -reducer "python reducer.py" -input /data/studentMapReduce/student1.txt -output outputMapReduceFile.txt

A simple word-count program runs successfully in this environment and generates correct output, but as soon as I add mysql.connector to the Python script, Hadoop Streaming reports this error. I have also studied the job logs but found no further information there.
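Since the job only fails once mysql.connector enters the picture, a likely cause is that the library is missing (or a different Python is used) on one or more worker nodes; the subprocess then dies during import and Streaming only reports the generic exit code. A hedged sketch that surfaces the real ImportError in the task's stderr, which YARN captures in the task logs (the helper name is my own):

```python
import importlib
import sys

def require_module(name):
    """Import a module, or report the failure on stderr and exit non-zero.

    Anything a streaming task writes to stderr ends up in the YARN task
    logs, so the real cause shows up there instead of only
    "subprocess failed with code 1".
    """
    try:
        return importlib.import_module(name)
    except ImportError as exc:
        sys.stderr.write("cannot import %s: %s\n" % (name, exc))
        sys.exit(2)

# Hypothetical usage at the top of mapper.py / reducer.py:
# mysql = require_module("mysql.connector")
```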


3 Answers


If your issue is not caused by Python libraries or a code problem, it might be caused by the first lines of your Python files (the shebang and encoding comments) together with your OS.

For me, on macOS, after installing Hadoop locally by following a tutorial, the Python mapper/reducer did not execute properly. The errors were java.lang.RuntimeException: PipeMapRed.waitOutputThreads(): subprocess failed with code 1 or java.lang.RuntimeException: PipeMapRed.waitOutputThreads(): subprocess failed with code 127.

My configuration:

  • Hadoop 3.2.1_1
  • Python 3.7.6
  • macOS Mojave 10.14.6
  • the Java version from the tutorial (adoptopenjdk8): "1.8.0_252"

To launch the job with Python, I use the newer mapred streaming command instead of the hadoop jar /xxx/hadoop-mapreduce/hadoop-streaming-xxx.jar form from the Hadoop documentation. (Be careful: I think the documentation's examples are outdated regarding generic options; -file is deprecated in favor of -files.)

I found two possibilities :

  1. Keep the Python files as plain, non-executable files, with the encoding comment `# -*- coding: utf-8 -*-` in the header.

Only this command works for me:

mapred streaming -files WordCountMapper.py,WordCountReducer.py \
-input /data/input/README.TXT \
-output /data/output \
-mapper "python WordCountMapper.py" \
-reducer "python WordCountReducer.py"

This assumes I want to count the words of /data/input/README.TXT, already copied into my HDFS volume (hadoop fs -copyFromLocal /absolute-local-folder/data/input/README.TXT /data/input), using the local Python files WordCountMapper.py and WordCountReducer.py.

Code for WordCountMapper.py :

#!/usr/bin/python
# -*- coding: utf-8 -*-
import sys

for line in sys.stdin:
    # strip leading/trailing whitespace
    line = line.strip()
    # split the line into words
    words = line.split()

    # map step: for each word, emit the pair (word, 1)
    for word in words:
        print("%s\t%d" % (word, 1))

Code for WordCountReducer.py :

#!/usr/bin/python
# -*- coding: utf-8 -*-

import sys
total = 0
lastword = None

for line in sys.stdin:
    line = line.strip()

    # split into key and value, converting the value to int
    word, count = line.split()
    count = int(count)

    # move on to the next word (a single run can receive several keys)
    if lastword is None:
        lastword = word
    if word == lastword:
        total += count
    else:
        print("%s\t%d occurrences" % (lastword, total))
        total = count
        lastword = word

if lastword is not None:
    print("%s\t%d occurrences" % (lastword, total))
  2. Edit the Python files so they can be executed directly:

2.1. Add execution mode to python files :

chmod +x WordCountMapper.py

chmod +x WordCountReducer.py

2.2. Make sure the first two lines are:

first line: `#!/usr/bin/python`

second line: `# -*- coding: utf-8 -*-`

Use this command:

mapred streaming -files WordCountMapper.py,WordCountReducer.py \
-input /data/input/README.TXT \
-output /data/output \
-mapper ./WordCountMapper.py \
-reducer ./WordCountReducer.py
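The WordCountMapper.py/WordCountReducer.py pair can also be sanity-checked without Hadoop by simulating the map → sort → reduce pipeline in-process (a sketch; the function names are my own):

```python
from itertools import groupby
from operator import itemgetter

def map_words(lines):
    # same logic as WordCountMapper.py: emit (word, 1) per word
    for line in lines:
        for word in line.strip().split():
            yield (word, 1)

def reduce_counts(sorted_pairs):
    # same logic as WordCountReducer.py; the input must be sorted by key,
    # which Hadoop guarantees between the map and reduce phases
    for word, group in groupby(sorted_pairs, key=itemgetter(0)):
        yield (word, sum(count for _, count in group))

pairs = sorted(map_words(["the cat sat", "the mat"]))
print(dict(reduce_counts(pairs)))  # {'cat': 1, 'mat': 1, 'sat': 1, 'the': 2}
```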

I checked the job error logs and found that some required Python files (ones that are not predefined libraries) needed to be shipped with the job. I then ran the Hadoop Streaming command passing those Python files as well:

hadoop jar /usr/hdp/3.0.0.0-1634/hadoop-mapreduce/hadoop-streaming-3.1.0.3.0.0.0-1634.jar -Dmapred.reduce.tasks=0 -file /home/mapper3.py -mapper mapper3.py -file /home/reducer3.py -reducer reducer3.py -file /home/ErrorHandle.py -file /home/ExceptionUtil.py -input /data/studentMapReduce/student1.txt -output outputMapReduceFile.txt


For me, the fix was changing `#!/usr/bin/env python` to `#!/usr/bin/env python3` in the shebang line.
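To see which interpreter the cluster nodes actually resolve, it can help to log the version to stderr at the top of the mapper; the line ends up in the YARN task logs (a small diagnostic sketch):

```python
import sys

# Log the interpreter version to stderr; it appears in the YARN task logs
# and shows whether "#!/usr/bin/env python" resolved to Python 2 or Python 3.
sys.stderr.write("running under Python %s\n" % sys.version.split()[0])
```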
