
I am using the subprocess.Popen() function to run an external tool that reads and writes a lot of data (on the order of gigabytes) to stdout. However, I'm finding that the kernel is killing the Python process when it runs out of memory:

Out of memory: Kill process 8221 (python) score 971 or sacrifice child
Killed process 8221 (python) total-vm:8532708kB, anon-rss:3703912kB, file-rss:48kB

Since I know I'm handling a large amount of data, I've set up Popen to write stdout and stderr to files so that I'm not using pipes. My code looks something like this:

import subprocess

errorFile = open(errorFilePath, "w")
outFile = open(outFilePath, "w")

# Use Popen to run the command, sending stdout and stderr straight to files
try:
    procExecCommand = subprocess.Popen(commandToExecute, shell=False,
                                       stderr=errorFile, stdout=outFile)
    exitCode = procExecCommand.wait()
except Exception as e:
    # Write exception to error log
    errorFile.write(str(e))

errorFile.close()
outFile.close()

I've tried changing the shell parameter to True and setting bufsize=-1, both with no luck.

I've profiled the memory while running this script and while running the same command via bash, and I see a much bigger spike in memory usage when running via Python than via bash.

I'm not sure exactly what Python is doing to consume so much more memory than bash, unless it has something to do with trying to write the output to the file? The bash script just pipes the output to a file.

I initially found that my swap space was quite low, so I increased it and that helped at first, but as the volume of data grows I start running out of memory again.

So is there anything I can do in Python to handle these data volumes better, or is it just a case of recommending more memory with plenty of swap space? That, or jettisoning Python altogether.

System details:

  • Ubuntu 12.04
  • Python 2.7.3
  • The tool I'm running is mpileup from samtools.
  • You could try to run the process as Popen("myprocess -arg > output", shell=True), i.e. send the exact string you would use in bash into Popen with shell=True. Commented Jul 17, 2012 at 14:21
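
A minimal sketch of what that comment suggests, assuming a placeholder command and output paths (not the actual invocation from the question):

import subprocess

# Let the shell do the redirection, exactly as the bash script would;
# "mytool --input data.bam" and the output paths are placeholders.
exitCode = subprocess.Popen(
    "mytool --input data.bam > out.txt 2> err.txt",
    shell=True
).wait()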

2 Answers


The problem might be that you are using the wait() method (as in procExecCommand.wait()), which tries to run the subprocess to completion and then returns. Try the approach used in this question, which uses e.g. stdout.read() on the process handle. That way you can regularly empty the pipe, write to files, and there should be no build-up of memory.
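
A rough sketch of that approach, assuming stdout is switched to a pipe so there is something to read from Python's side (with stdout=outFile, as in the question, there is no procExecCommand.stdout; see the comment below). The 64 KiB chunk size is arbitrary; commandToExecute, outFilePath and errorFile are the names from the question:

import subprocess

# Drain the child's stdout in fixed-size chunks and spill each chunk to disk,
# so only one chunk at a time is held in the Python process.
procExecCommand = subprocess.Popen(commandToExecute, shell=False,
                                   stdout=subprocess.PIPE, stderr=errorFile)

with open(outFilePath, "w") as outFile:
    while True:
        chunk = procExecCommand.stdout.read(64 * 1024)
        if not chunk:
            break
        outFile.write(chunk)

exitCode = procExecCommand.wait()

Written this way, only one chunk is buffered in Python at a time, regardless of how much output the tool produces.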


1 Comment

With stderr=errorFile, stdout=outFile, where errorFile and outFile are regular open()ed files, there is no procExecCommand.stdout...
0

What kind of output does your process generate? Maybe the clue is in that.

Warning: the script below won't terminate; you have to kill it.

This sample setup works as expected for me.

import subprocess

fobj = open("/home/tst/output", "w")
subprocess.Popen("/home/tst/whileone", stdout=fobj).wait()

And whileone:

#!/bin/bash

let i=1
while [ 1 ]
do
 echo "We are in iteration $i"
 let i=$i+1
 usleep 10000
done

