
In shell script, we have the following command:

/script1.pl < input_file | /script2.pl > output_file

I would like to replicate the above pipeline in Python using the subprocess module. input_file is a large file, so I can't read the whole file into memory at once. Instead, I would like to pass each line (input_string) into the pipeline and get the result back as a string variable, output_string, repeating until the whole file has been streamed through.

The following is a first attempt:

import subprocess

process = subprocess.Popen("/script1.pl | /script2.pl", stdin=subprocess.PIPE, stdout=subprocess.PIPE, shell=True)
process.stdin.write(input_string)
output_string = process.communicate()[0]

However, process.communicate()[0] closes the stream: it sends EOF to the child's stdin and waits for the process to exit. I would like to keep the stream open for the remaining lines. I have tried using process.stdout.readline() instead, but the program hangs.
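A minimal sketch of why the first attempt cannot be continued. It assumes a stand-in child (a one-line Python program that upper-cases its input) in place of /script1.pl | /script2.pl. After communicate() returns, the child's stdin has been closed and the process has exited, so any further write fails.

```python
import subprocess
import sys

# Stand-in child (assumption): upper-cases everything it reads on stdin.
# It replaces the hypothetical /script1.pl | /script2.pl pipeline.
CHILD = [sys.executable, "-c",
         "import sys; sys.stdout.write(sys.stdin.read().upper())"]

process = subprocess.Popen(CHILD, stdin=subprocess.PIPE, stdout=subprocess.PIPE)
output_string = process.communicate(b"first line\n")[0]
print(output_string)  # b'FIRST LINE\n'

# communicate() has closed the child's stdin and waited for it to exit,
# so this process cannot be fed any more input.
try:
    process.stdin.write(b"second line\n")
except ValueError as exc:
    print("cannot write after communicate():", exc)
```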

  • /script1.pl < input_string reads the file named input_string, it does not feed the literal string input_string as input. Commented Dec 26, 2013 at 18:38
  • Ah I see. I would like to feed an actual string to my python implementation though. I will iterate through strings using a generator, and I want to pass the generated strings through the pipe on the fly. Commented Dec 26, 2013 at 18:42
  • your shell command is not compatible with "keep the stream open". What do you want to put into output_string (the first byte, the first line, the first n bytes, the first bytes that arrive in 10 seconds)? btw, output_string = process.communicate(input_string)[0] reproduces your shell command (if we use strings instead of files). Commented Dec 26, 2013 at 19:07
  • My apologies for the confusion. My shell command reads from a large file with a lot of lines, and writes to another file. I can't open and read the whole file in python. Rather, I have to read line by line, and pass each line into the pipe stream. I would like to keep the pipe stream open until all lines are passed through it. Commented Dec 26, 2013 at 19:23
  • Edited my question to clarify the problem. Thanks. Commented Dec 26, 2013 at 19:25
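For data that fits in memory, the communicate(input_string) approach from the comments can be sketched as follows. The pipeline tr a-z A-Z | sort is an assumed stand-in for /script1.pl | /script2.pl and requires a POSIX shell. The whole input is sent in one communicate() call, which is exactly what does not scale to a file too large to read at once.

```python
import subprocess

# Stand-in pipeline (assumption, POSIX shell with tr and sort available):
# plays the role of /script1.pl | /script2.pl.
PIPELINE = "tr a-z A-Z | sort"

input_string = b"banana\napple\ncherry\n"

process = subprocess.Popen(PIPELINE, shell=True,
                           stdin=subprocess.PIPE, stdout=subprocess.PIPE)
# communicate() writes all of input_string, closes stdin (EOF for the first
# command), reads stdout until EOF and waits for the pipeline to finish.
output_string = process.communicate(input_string)[0]
print(output_string.decode())
```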

1 Answer


To emulate the /script1.pl < input_file | /script2.pl > output_file shell command using the subprocess module in Python:

#!/usr/bin/env python
from subprocess import check_call

with open('input_file', 'rb') as input_file:
    with open('output_file', 'wb') as output_file:
        check_call("/script1.pl | /script2.pl", shell=True,
                   stdin=input_file, stdout=output_file)
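On Python 3.5 and later, the check_call version above can also be written with subprocess.run(..., check=True). This is a minimal sketch: run_pipeline is a hypothetical helper name, and the tr a-z A-Z | sort pipeline plus temporary files are assumed stand-ins for the Perl scripts and the real file names.

```python
import subprocess
import tempfile
from pathlib import Path

def run_pipeline(command, input_path, output_path):
    """Hypothetical helper: run `command < input_path > output_path`.

    Raises subprocess.CalledProcessError if the shell reports a non-zero status.
    """
    with open(input_path, "rb") as input_file, open(output_path, "wb") as output_file:
        subprocess.run(command, shell=True, stdin=input_file,
                       stdout=output_file, check=True)

# Usage with stand-ins (assumption): `tr a-z A-Z | sort` instead of
# /script1.pl | /script2.pl, and temporary files instead of input_file/output_file.
with tempfile.TemporaryDirectory() as tmp:
    src = Path(tmp, "input_file")
    dst = Path(tmp, "output_file")
    src.write_bytes(b"pear\nfig\n")
    run_pipeline("tr a-z A-Z | sort", src, dst)
    result = dst.read_bytes()

print(result)
```

With shell=True the status that check=True sees is that of the last command in the pipeline (standard /bin/sh behavior without set -o pipefail), so a failure in the first script can go unnoticed.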

You could write it without shell=True (though I don't see a reason to here), based on the "Replacing shell pipeline" example (section 17.1.4.2) in the docs:

#!/usr/bin/env python
from subprocess import Popen, PIPE

with open('input_file', 'rb') as input_file:
    script1 = Popen("/script1.pl", stdin=input_file, stdout=PIPE)
with open("output_file", "wb") as output_file:
    script2 = Popen("/script2.pl", stdin=script1.stdout, stdout=output_file)
script1.stdout.close() # allow script1 to receive SIGPIPE if script2 exits
script2.wait()
script1.wait()
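The same Popen chaining generalizes to more than two stages. Below is a sketch with a hypothetical helper, pipeline(). Two small Python programs stand in (as an assumption) for /script1.pl and /script2.pl so the example can run anywhere.

```python
import os
import subprocess
import sys
import tempfile

def pipeline(commands, stdin, stdout):
    """Hypothetical helper: run commands[0] | ... | commands[-1] without a shell.

    stdin/stdout are what the first/last process read from/write to
    (e.g. open binary files). Returns the exit codes, first command first.
    """
    procs = []
    upstream = stdin
    for i, cmd in enumerate(commands):
        is_last = i == len(commands) - 1
        proc = subprocess.Popen(cmd, stdin=upstream,
                                stdout=stdout if is_last else subprocess.PIPE)
        if procs:
            # Drop the parent's copy of the previous pipe so the upstream
            # process can get SIGPIPE if this one exits early.
            procs[-1].stdout.close()
        procs.append(proc)
        upstream = proc.stdout
    # Wait for the last stage first, mirroring the two-process version above.
    codes = [proc.wait() for proc in reversed(procs)]
    return codes[::-1]

# Stand-ins (assumption) for /script1.pl and /script2.pl: small Python
# programs that upper-case and sort their input, respectively.
UPPER = [sys.executable, "-c",
         "import sys; sys.stdout.write(sys.stdin.read().upper())"]
SORT = [sys.executable, "-c",
        "import sys; sys.stdout.writelines(sorted(sys.stdin))"]

with tempfile.TemporaryDirectory() as tmp:
    in_path = os.path.join(tmp, "input_file")
    out_path = os.path.join(tmp, "output_file")
    with open(in_path, "wb") as f:
        f.write(b"banana\napple\n")
    with open(in_path, "rb") as input_file, open(out_path, "wb") as output_file:
        codes = pipeline([UPPER, SORT], input_file, output_file)
    with open(out_path, "rb") as f:
        result = f.read()

print(codes, result)
```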

You could also use the plumbum module to get shell-like syntax in Python:

#!/usr/bin/env python
from plumbum import local

script1, script2 = local["/script1.pl"], local["/script2.pl"]
((script1 < "input_file") | script2 > "output_file")()

See also How do I use subprocess.Popen to connect multiple processes by pipes?


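If you want to read/write line by line, then the answer depends on the concrete scripts that you want to run. In general, it is easy to deadlock when sending input and receiving output if you are not careful, e.g., due to buffering: if a child block-buffers its stdout, a readline() in the parent can wait forever for data still sitting in the child's buffer.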
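To make the buffering point concrete, here is a minimal lockstep sketch: the parent writes one line, flushes, then reads one reply line. This only works because the assumed stand-in child flushes after every line. A script that block-buffers its output when stdout is a pipe (Perl's default, unless it sets $| = 1) would leave the reply in its buffer, and the parent's readline() would block, which matches the hang described in the question.

```python
import subprocess
import sys

# Stand-in child (assumption): answers each input line with the upper-cased
# line and flushes immediately. Without the flush, the reply could stay in the
# child's stdout buffer and the parent's readline() below would block.
CHILD_CODE = (
    "import sys\n"
    "for line in iter(sys.stdin.readline, ''):\n"
    "    sys.stdout.write(line.upper())\n"
    "    sys.stdout.flush()\n"
)

p = subprocess.Popen([sys.executable, "-c", CHILD_CODE],
                     stdin=subprocess.PIPE, stdout=subprocess.PIPE,
                     universal_newlines=True)

replies = []
for word in ["alpha", "beta", "gamma"]:
    p.stdin.write(word + "\n")
    p.stdin.flush()                       # push the request through the pipe now
    replies.append(p.stdout.readline())   # blocks until the child answers
p.stdin.close()                           # EOF ends the child's loop
p.stdout.close()
p.wait()
print(replies)  # ['ALPHA\n', 'BETA\n', 'GAMMA\n']
```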

If input doesn't depend on output in your case then a reliable cross-platform approach is to use a separate thread for each stream:

#!/usr/bin/env python3
from subprocess import Popen, PIPE
from threading import Thread

def pump_input(pipe):
    try:
        for i in range(1000000000):  # generate large input
            print(i, file=pipe)
    finally:
        pipe.close()

p = Popen("/script1.pl | /script2.pl", shell=True, stdin=PIPE, stdout=PIPE,
          bufsize=1, universal_newlines=True)
Thread(target=pump_input, args=[p.stdin]).start()
try:  # read output line by line as soon as the child flushes its stdout buffer
    for line in iter(p.stdout.readline, ''):
        print(line.strip()[::-1])  # print reversed lines
finally:
    p.stdout.close()
    p.wait()
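Since the question mentions producing input strings with a generator, the threaded approach above can be wrapped in a generator function. This is a sketch under assumptions: stream_through is a hypothetical name, the child is a stand-in Python program rather than the Perl pipeline, and input and output are handled as text lines.

```python
import subprocess
import sys
from threading import Thread

def stream_through(args, lines, **popen_kwargs):
    """Hypothetical helper: feed text `lines` to a child, yield its output lines.

    A writer thread pumps `lines` into stdin while this generator reads
    stdout, so neither side can fill a pipe buffer and block the other.
    Extra keyword arguments (e.g. shell=True) are passed to Popen.
    """
    p = subprocess.Popen(args, stdin=subprocess.PIPE, stdout=subprocess.PIPE,
                         universal_newlines=True, **popen_kwargs)

    def pump():
        try:
            for line in lines:
                p.stdin.write(line)
        except BrokenPipeError:
            pass  # child exited early; stop feeding it
        finally:
            try:
                p.stdin.close()  # EOF for the child
            except BrokenPipeError:
                pass

    writer = Thread(target=pump, daemon=True)
    writer.start()
    try:
        for out_line in p.stdout:
            yield out_line
    finally:
        p.stdout.close()
        writer.join()
        p.wait()

def numbered_lines(n):
    """Hypothetical input generator standing in for the asker's line source."""
    for i in range(n):
        yield "line %d\n" % i

# Stand-in child (assumption): upper-cases each line it reads.
UPPER = [sys.executable, "-c",
         "import sys\nfor line in sys.stdin: sys.stdout.write(line.upper())"]

output = list(stream_through(UPPER, numbered_lines(3)))
print(output)  # ['LINE 0\n', 'LINE 1\n', 'LINE 2\n']
```

For the original case it would be called along the lines of stream_through("/script1.pl | /script2.pl", open("input_file"), shell=True), iterating over the result to get each output_string. Whether output lines line up one-to-one with input lines depends on the scripts themselves.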