I am trying to use Python in a Unix-style pipe. For example, in Unix I can use a pipe such as:
$ samtools view -h somefile.bam | python modifyStdout.py | samtools view -bh - > processed.bam
I can do this by using a for line in sys.stdin: loop in the Python script, and that appears to work without problems.
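For reference, a minimal sketch of what such a stdin filter looks like. The names modify_line and stream are hypothetical, and the pass-through transformation is only a placeholder for whatever the real script does to each line:

```python
import sys


def modify_line(line):
    # Placeholder transformation (an assumption): return the line unchanged.
    return line


def stream(in_stream, out_stream):
    # Iterating over a file object yields one line at a time, so the whole
    # input is never held in memory.
    for line in in_stream:
        out_stream.write(modify_line(line))
```

In the script itself this would be called as stream(sys.stdin, sys.stdout).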
However, I would like to internalise this Unix command into a Python script. The files involved will be large, so I would like to avoid blocking behaviour and stream the data between processes.
At the moment I am trying to use Popen to manage each command, and pass the stdout of the first process to the stdin of the next process, and so on.
In a separate Python script (sep_process.py) I have:
import sys

# Log every line received on stdin so I can check what arrives
f = open("sentlines.txt", 'w')
f.write("hi")
for line in sys.stdin:
    print line
    f.write(line)
f.close()
And in my main Python script I have this:
import sys
from subprocess import Popen, PIPE

# Generate an example file to use
f = open('sees.txt', 'w')
f.write('somewhere over the\nrainbow')
f.close()

if __name__ == "__main__":
    # Use grep as an example command
    p1 = Popen("grep over sees.txt".split(), stdout=PIPE)
    # Send to sep_process.py
    p2 = Popen("python ~/Documents/Pythonstuff/Bam_count_tags/sep_process.py".split(), stdin=p1.stdout, stdout=PIPE)
    # Send to final command
    p3 = Popen("wc", stdin=p2.stdout, stdout=PIPE)
    # Read output from wc
    result = p3.stdout.read()
    print result
However, the p2 process fails with [Errno 2] No such file or directory, even though the file exists.
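One thing I have not ruled out: ~ is expanded by the shell, and Popen given a list of arguments does not go through a shell, so the literal path starting with ~ may be what cannot be found. Below is a minimal sketch of the same pipeline with the path expanded explicitly through os.path.expanduser. The helper names build_command and run_pipeline are hypothetical, and using sys.executable instead of the plain python command is an assumption:

```python
import os
import sys
from subprocess import Popen, PIPE


def build_command(script_path):
    # Hypothetical helper: expand '~' ourselves, because no shell is involved
    # when Popen receives a list of arguments.
    return [sys.executable, os.path.expanduser(script_path)]


def run_pipeline(commands):
    # Hypothetical helper: start every command, wiring each stdout to the
    # next stdin, and return whatever the last command writes.
    procs = []
    prev_stdout = None
    for argv in commands:
        proc = Popen(argv, stdin=prev_stdout, stdout=PIPE)
        if prev_stdout is not None:
            # Close our copy so the upstream process sees a broken pipe
            # if the downstream one exits early.
            prev_stdout.close()
        prev_stdout = proc.stdout
        procs.append(proc)
    output = procs[-1].communicate()[0]
    for proc in procs[:-1]:
        proc.wait()
    return output


# Usage with the commands from my main script would look like:
# run_pipeline([
#     ["grep", "over", "sees.txt"],
#     build_command("~/Documents/Pythonstuff/Bam_count_tags/sep_process.py"),
#     ["wc"],
# ])
```

All processes are started before any output is read, so the data should stream through the pipes rather than being collected between stages.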
Do I need to implement a Queue of some kind and/or run the Python function using the multiprocessing module?