
I have a Python script that submits Spark jobs using the spark-submit tool. I want to execute the command and write its output both to STDOUT and to a logfile in real time. I'm using Python 2.7 on an Ubuntu server.

This is what I have so far in my SubmitJob.py script:

#!/usr/bin/python
import subprocess

# Submit the command and stream its output to the screen and a log file
def submitJob(cmd, log_file):
    with open(log_file, 'w') as fh:
        process = subprocess.Popen(cmd, stdout=subprocess.PIPE, stderr=subprocess.STDOUT)
        while True:
            output = process.stdout.readline()
            if output == '' and process.poll() is not None:
                break
            if output:
                print output.strip()
                fh.write(output)
        rc = process.poll()
        return rc

if __name__ == "__main__":
    cmdList = ["dse", "spark-submit", "--spark-master", "spark://127.0.0.1:7077", "--class", "com.spark.myapp", "./myapp.jar"]
    log_file = "/tmp/out.log"
    exit_status = submitJob(cmdList, log_file)
    print "job finished with status ", exit_status

The strange thing is, when I execute the same command directly in the shell, it works fine and produces output on screen as the program proceeds.

So it looks like something is wrong with the way I'm using subprocess.PIPE for stdout and writing to the file.

What's the currently recommended way to use the subprocess module to write to stdout and a log file in real time, line by line? I see a bunch of options on the internet, but I'm not sure which is correct or current.

Thanks.

  • Your loop could be a bit thinner, but otherwise this should do it. I don't know Spark or what it does with stdout, but that may be the better place to look. I think you should add a spark tag, and probably remove the bash tag. Commented Oct 13, 2016 at 17:50

2 Answers


Figured out what the problem was. I was trying to redirect both stdout and stderr to the pipe so they'd display on screen. This seems to block stdout when stderr is present. If I remove the stderr=subprocess.STDOUT argument from Popen, it works fine. So for spark-submit it looks like you don't need to redirect stderr explicitly, as it already does this implicitly.
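
For reference, a minimal sketch of the working version (the only change from the code in the question is dropping stderr=subprocess.STDOUT, so stderr goes straight to the terminal; the fh.flush() call is an extra assumption to keep the log file current, not something spark-submit requires):

import subprocess

def submitJob(cmd, log_file):
    with open(log_file, 'w') as fh:
        # Only stdout is piped; stderr is inherited and reaches the
        # terminal on its own, which avoids the blocking described above.
        process = subprocess.Popen(cmd, stdout=subprocess.PIPE)
        while True:
            output = process.stdout.readline()
            if output == '' and process.poll() is not None:
                break
            if output:
                print output.strip()
                fh.write(output)
                fh.flush()  # write through so the log keeps pace with the job
        return process.poll()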


2 Comments

Does anyone have an idea whether this is a bug in spark-submit or in the Python subprocess module?
I believe this is because spark-submit redirects a lot of its output to stderr, so printing stdout will not get you the script's actual output.

To print the Spark log, one can call the cmdList given by user330612:

  cmdList = ["spark-submit", "--spark-master", "spark://127.0.0.1:7077", "--class", "com.spark.myapp", "./myapp.jar"]

Then it can be printed using subprocess. Remember to use communicate() to prevent deadlocks; the docs (https://docs.python.org/2/library/subprocess.html) warn that a deadlock can occur "when using stdout=PIPE and/or stderr=PIPE and the child process generates enough output to a pipe such that it blocks waiting for the OS pipe buffer to accept more data. Use communicate() to avoid that." Note that communicate() waits for the process to exit, so the output appears only after the job finishes. Here below is the code to print the log.

import subprocess

p = subprocess.Popen(cmdList, stdout=subprocess.PIPE, stderr=subprocess.PIPE)
stdout, stderr = p.communicate()  # blocks until the job exits; avoids pipe deadlocks
stderr = stderr.splitlines()
stdout = stdout.splitlines()
for line in stderr:
    print line  # the Spark log, which spark-submit sends to stderr; write it to a file or elsewhere
for line in stdout:
    print line  # the application's actual output

More information about subprocess and printing lines can be found at https://pymotw.com/2/subprocess/.
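
If you need the log in real time rather than after the job exits, a variation worth sketching (untested, and it assumes, per the comments above, that spark-submit writes its log to stderr) is to read stderr line by line and leave stdout alone; reading a single pipe this way cannot deadlock:

import subprocess

p = subprocess.Popen(cmdList, stderr=subprocess.PIPE)
with open("/tmp/out.log", "w") as fh:
    # spark-submit's log arrives on stderr; the application's own stdout
    # still goes straight to the terminal.
    for line in iter(p.stderr.readline, ''):
        print line.rstrip()  # echo each log line as it arrives
        fh.write(line)       # and mirror it into the log file
rc = p.wait()
print "job finished with status", rc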

