I am trying to use Python in a Unix-style pipe. For example, on the command line I can write a pipe such as:

$ samtools view -h somefile.bam | python modifyStdout.py | samtools view -bh - > processed.bam

This works if the Python script reads its input with a for line in sys.stdin: loop, and it appears to run without problems.

However, I would like to internalise this Unix command into a Python script. The files involved will be large, so I would like to avoid blocking behaviour and stream the data between processes.

At the moment I am trying to use Popen to manage each command, and pass the stdout of the first process to the stdin of the next process, and so on.

In a separate Python script (sep_process.py) I have:

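import sys

# Echo each line from stdin to stdout and keep a copy in sentlines.txt
with open("sentlines.txt", "w") as f:
    f.write("hi")
    for line in sys.stdin:
        # line already ends with a newline; print would add a second one
        sys.stdout.write(line)
        f.write(line)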

And in my main python script I have this:

import sys
from subprocess import Popen, PIPE

# Generate an example file to use
f = open('sees.txt', 'w')
f.write('somewhere over the\nrainbow')
f.close()

if __name__ == "__main__":
    # Use grep as an example command
    p1 = Popen("grep over sees.txt".split(), stdout=PIPE)

    # Send to sep_process.py 
    p2 = Popen("python ~/Documents/Pythonstuff/Bam_count_tags/sep_process.py".split(), stdin=p1.stdout, stdout=PIPE)   

    # Send to final command
    p3 = Popen("wc", stdin=p2.stdout, stdout=PIPE)

    # Read output from wc
    result = p3.stdout.read()
    print(result)

However, the p2 process fails with [Errno 2] No such file or directory, even though the file exists.

Do I need to implement a Queue of some kind and/or open the python function using the multiprocessing module?

2 Answers

Tilde expansion (~) is done by the shell. Popen does not run a shell by default, so the literal string ~/Documents/... is passed to python, which then looks for a directory called ~ and fails.

You could read the HOME environment variable and insert it into the path yourself. Use

os.environ['HOME']

Alternatively, you could pass the command as a single string with shell=True and let the shell do the expansion for you.
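As a rough illustration of doing the expansion yourself, the sketch below uses os.path.expanduser from the standard library, which resolves a leading ~ without involving a shell. The helper name build_command is made up for this example, and the path is simply the one from the question.

```python
import os
import shlex


def build_command(command_line):
    """Split a command line into arguments and expand a leading ~ in each one.

    build_command is a hypothetical helper, not part of subprocess.
    """
    return [os.path.expanduser(arg) for arg in shlex.split(command_line)]


# Path taken from the question; it does not need to exist for the expansion to work
cmd = build_command("python ~/Documents/Pythonstuff/Bam_count_tags/sep_process.py")
print(cmd)
```

The resulting list can be handed to Popen in place of the "...".split() call, so no shell is needed just to resolve the home directory.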

Thanks @cdarke, that solved the problem for simple commands like grep and wc. However, I could not get subprocess.Popen to work when an executable such as samtools provided the data stream.

To fix the issue, I built a string containing the pipe exactly as I would write it on the command line, for example:

import os

sam = '/Users/me/Documents/Tools/samtools-1.2/samtools'
home = os.environ['HOME']
inpath = "{}/Documents/Pythonstuff/Bam_count_tags".format(home)

# The three stages of the pipe, each with absolute paths
stream_in = "{s} view -h {ip}/test.bam".format(s=sam, ip=inpath)
pyscript = "python {ip}/bam_tags.py".format(ip=inpath)
stream_out = "{s} view -bh - > {ip}/small.bam".format(s=sam, ip=inpath)

# Absolute paths, written as a pipe
fullPipe = "{inS} | {py} | {outS}".format(inS=stream_in,
                                          py=pyscript,
                                          outS=stream_out)

print(fullPipe)
# Translates to >>>
# samtools view -h test.bam | python ./bam_tags.py | samtools view -bh - > small.bam

I then used popen from the os module instead, which hands the string to a shell, and this worked as expected:

os.popen(fullPipe)
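For comparison, here is a minimal sketch of how the same kind of pipe might be wired up with subprocess.Popen and no shell, in case that is preferred. The helper run_pipeline and the file name pipeline_out.txt are invented for this example, and the two stages are small stand-in Python commands rather than samtools, so treat it as an outline under those assumptions and not a tested samtools recipe.

```python
import os
import subprocess
import sys
import tempfile


def run_pipeline(commands, out_path):
    """Run argument lists as cmd1 | cmd2 | ... > out_path.

    Hypothetical helper: returns one exit code per stage.
    """
    procs = []
    prev_stdout = None
    with open(out_path, "wb") as out_file:
        for i, args in enumerate(commands):
            is_last = (i == len(commands) - 1)
            proc = subprocess.Popen(
                args,
                stdin=prev_stdout,
                stdout=out_file if is_last else subprocess.PIPE,
            )
            if prev_stdout is not None:
                # Drop the parent's copy of the pipe so the upstream stage
                # sees a broken pipe if this stage exits early
                prev_stdout.close()
            prev_stdout = proc.stdout
            procs.append(proc)
        # Data flows directly between the children; the parent only waits
        return [p.wait() for p in procs]


# Stand-in stages (assumptions for the demo, not samtools)
produce = [sys.executable, "-c", "print('somewhere over the'); print('rainbow')"]
to_upper = [sys.executable, "-c",
            "import sys\nfor line in sys.stdin:\n    sys.stdout.write(line.upper())"]

out_path = os.path.join(tempfile.mkdtemp(), "pipeline_out.txt")
codes = run_pipeline([produce, to_upper], out_path)
with open(out_path) as f:
    result = f.read()
print(codes)
print(result)
```

For the samtools case, the stages would presumably be the three commands from fullPipe split into argument lists, with the final > redirect replaced by out_path. Because the parent never reads from the intermediate pipes, the data is streamed between the processes rather than buffered in the script.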
