
Here is my demo code. It contains two scripts.

The first is main.py; it calls print_line.py using the subprocess module.

The second is print_line.py; it prints some lines to stdout.

main.py

import subprocess

p = subprocess.Popen('python2 print_line.py',
                     stdout=subprocess.PIPE,
                     stderr=subprocess.PIPE,
                     close_fds=True,
                     shell=True,
                     universal_newlines=True)

while True:
    line = p.stdout.readline()
    if line:
        print(line)
    else:
        break

print_line.py

from multiprocessing import Process, JoinableQueue, current_process


if __name__ == '__main__':
    task_q = JoinableQueue()

    def do_task():
        while True:
            task = task_q.get()
            pid = current_process().pid
            print 'pid: {}, task: {}'.format(pid, task)
            task_q.task_done()

    for _ in range(10):
        p = Process(target=do_task)
        p.daemon = True
        p.start()

    for i in range(100):
        task_q.put(i)

    task_q.join()

Originally, print_line.py was written with the threading and Queue modules, and everything worked fine. But now, after switching to the multiprocessing module, main.py cannot get any output from print_line.py. I tried using Popen.communicate() to get the output, and setting preexec_fn=os.setsid in Popen(). Neither of them works.

So, here is my question:

  1. Why can't subprocess get the output when print_line.py uses multiprocessing? Why is it OK with threading?

  2. If I comment out stdout=subprocess.PIPE and stderr=subprocess.PIPE, the output is printed to my console. Why does this happen?

  3. Is there any way to get the output from print_line.py?

2 Answers


Curious.

In theory this should work as it is, but it does not. The reason lies somewhere in the deep, murky waters of buffered I/O: the output of a subprocess of a subprocess can get lost if it is not flushed.

You have two workarounds:

One is to call sys.stdout.flush() in your print_line.py (this also needs import sys at the top of the file):

import sys

def do_task():
    while True:
        task = task_q.get()
        pid = current_process().pid
        print 'pid: {}, task: {}'.format(pid, task)
        sys.stdout.flush()  # push the line to the pipe immediately
        task_q.task_done()

This fixes the issue because stdout is flushed as soon as something is written to it.

Another option is to pass the -u flag to Python in your main.py:

p = subprocess.Popen('python2 -u print_line.py',
                     stdout=subprocess.PIPE,
                     stderr=subprocess.PIPE,
                     close_fds=True,
                     shell=True,
                     universal_newlines=True)

-u forces stdin and stdout to be completely unbuffered in print_line.py, and children of print_line.py then inherit this behaviour.

These are workarounds to the problem. If you are interested in the theory of why this happens, it definitely has something to do with unflushed stdout being lost when a subprocess terminates, but I am not an expert in this.


1 Comment

Thank you, your solution works. I never thought that buffer would cause this problem.

It's not really a subprocess issue; more precisely, it has to do with standard I/O and buffering, as in Hannu's answer. The trick is that, by default, the output of any process, whether written in Python or not, is line buffered if the output device is a "terminal device", as determined by os.isatty(stream.fileno()):

>>> import sys
>>> sys.stdout.fileno()
1
>>> import os
>>> os.isatty(1)
True

There is a shortcut available to you once the stream is open:

>>> sys.stdout.isatty()
True

but the os.isatty() operation is the more fundamental one. That is, internally, Python inspects the file descriptor first using os.isatty(fd), then chooses the stream's buffering based on the result (and/or arguments and/or the function used to open the stream). The sys.stdout stream is opened early on during Python's startup, before you generally have much control.1

When you call open or codecs.open or otherwise do your own operation to open a file, you can specify the buffering via one of the optional arguments. The default for open is the system default, which is line buffering if isatty(), otherwise fully buffered. Curiously, the default for codecs.open is line buffered.

A line buffered stream gets an automatic flush() applied when you write a newline to it.
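This flush-on-newline behaviour can be observed with a plain file opened with buffering=1 (a small standalone sketch; the temp-file path is just for demonstration):

```python
import os
import tempfile

# Open a text stream with buffering=1, i.e. line buffered.
path = os.path.join(tempfile.mkdtemp(), "demo.txt")
f = open(path, "w", buffering=1)

f.write("partial")            # no newline yet: stays in the buffer
with open(path) as r:
    before = r.read()         # nothing visible on disk yet

f.write(" line\n")            # the newline triggers an automatic flush
with open(path) as r:
    after = r.read()          # now the whole line is visible
f.close()

print(repr(before))           # ''
print(repr(after))            # 'partial line\n'
```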

An unbuffered stream writes each byte to its output immediately. This is very inefficient in general. A fully buffered stream writes its output when the buffer gets sufficiently full—the definition of "sufficient" here tends to be pretty variable, anything from 1024 (1k) to 1048576 (1 MB)—or when explicitly directed.
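A fully buffered stream behaves differently: a newline does not flush it, only filling the buffer, flush(), or close() does. A small sketch, using an oversized buffer so nothing fills it:

```python
import os
import tempfile

# Open a text stream with a large buffer: fully buffered, not line buffered.
path = os.path.join(tempfile.mkdtemp(), "full.txt")
f = open(path, "w", buffering=65536)

f.write("line one\n")         # newline does NOT flush a fully buffered stream
with open(path) as r:
    before = r.read()         # still empty: data sits in the buffer

f.flush()                     # explicit flush pushes the data out
with open(path) as r:
    after = r.read()
f.close()

print(repr(before))           # ''
print(repr(after))            # 'line one\n'
```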

When you run something as a process, it's the process itself that decides how to do any buffering. Your own Python code, reading from the process, cannot control it. But if you know something—or a lot—about the processes that you will run, you can set up their environment so that they run line-buffered, or even unbuffered. (Or, as in your case, since you write that code, you can write it to do what you want.)
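One way to set up a child's environment like this, without touching the child's code, is the PYTHONUNBUFFERED environment variable, which for a Python child is equivalent to passing -u. A minimal sketch (the inline child_code string stands in for a real script such as print_line.py):

```python
import os
import subprocess
import sys

# Force the child Python process to run unbuffered via its environment.
env = dict(os.environ, PYTHONUNBUFFERED="1")

child_code = "print('hello from child')"   # stand-in for print_line.py
p = subprocess.Popen([sys.executable, "-c", child_code],
                     stdout=subprocess.PIPE,
                     universal_newlines=True,
                     env=env)
out, _ = p.communicate()
print(out.strip())
```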


1There are hooks that fire very early, where you can fuss with this sort of thing, but they are tricky to work with.

