
I am using the subprocess.Popen() function to run an external tool that reads and writes a lot of data (on the order of gigabytes) to stdout. However, I'm finding that the kernel is killing the Python process when it runs out of memory:

Out of memory: Kill process 8221 (python) score 971 or sacrifice child
Killed process 8221 (python) total-vm:8532708kB, anon-rss:3703912kB, file-rss:48kB

Since I know I'm handling a large amount of data, I've set up Popen to write stdout and stderr to files so that I'm not using pipes. My code looks something like this:

import subprocess

errorFile = open(errorFilePath, "w")
outFile = open(outFilePath, "w")

# Use Popen to run the command, sending stdout and stderr straight to files
try:
    procExecCommand = subprocess.Popen(commandToExecute, shell=False,
                                       stderr=errorFile, stdout=outFile)
    exitCode = procExecCommand.wait()
except Exception as e:
    # Write exception to error log
    errorFile.write(str(e))

errorFile.close()
outFile.close()

I've tried changing the shell parameter to True and setting bufsize=-1, both with no luck.

I've profiled the memory while running this script and while running the same command via bash, and I see a much bigger spike in memory usage when running via Python than via bash.

I'm not sure exactly what Python is doing to consume so much more memory than bash, unless it has something to do with trying to write the output to the file? The bash script just pipes the output to a file.

I initially found that my swap space was quite low, so I increased it and that helped at first, but as the volume of data grows I start running out of memory again.

So is there anything I can do in Python to handle these data volumes better, or is it just a case of recommending more memory with plenty of swap space? That, or jettisoning Python altogether.

System details:

  • Ubuntu 12.04
  • Python 2.7.3
  • The tool I'm running is mpileup from samtools.
  • You could try to run the process as Popen("myprocess -arg > output", shell=True), i.e. send the exact string you would use in bash into Popen with shell=True. Commented Jul 17, 2012 at 14:21
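
A minimal sketch of what that comment suggests, assuming a placeholder command and output paths (not the actual invocation from the question):

import subprocess

# Let the shell do the redirection, exactly as the bash script would;
# "mytool --input data.bam" and the output paths are placeholders.
exitCode = subprocess.Popen(
    "mytool --input data.bam > out.txt 2> err.txt",
    shell=True
).wait()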

2 Answers


The problem might be that you are using the wait() method (as in procExecCommand.wait()), which tries to run the subprocess to completion and then returns. Try the approach used in this question, which uses e.g. stdout.read() on the process handle. That way you can regularly empty the pipe, write to files, and there should be no build-up of memory.
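
A rough sketch of that approach, assuming stdout is switched to a pipe so there is something to read from Python's side (with stdout=outFile, as in the question, there is no procExecCommand.stdout; see the comment below). The 64 KiB chunk size is arbitrary; commandToExecute, outFilePath and errorFile are the names from the question:

import subprocess

# Drain the child's stdout in fixed-size chunks and spill each chunk to disk,
# so only one chunk at a time is held in the Python process.
procExecCommand = subprocess.Popen(commandToExecute, shell=False,
                                   stdout=subprocess.PIPE, stderr=errorFile)

with open(outFilePath, "w") as outFile:
    while True:
        chunk = procExecCommand.stdout.read(64 * 1024)
        if not chunk:
            break
        outFile.write(chunk)

exitCode = procExecCommand.wait()

Written this way, only one chunk is buffered in Python at a time, regardless of how much output the tool produces.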


1 Comment

With stderr=errorFile, stdout=outFile, where errorFile and outFile are regular open()ed files, there is no procExecCommand.stdout...
0

What kind of output does your process generate? Maybe the clue is in that.

Warning: the script below won't terminate; you have to kill it.

This sample setup works as expected for me.

import subprocess

fobj = open("/home/tst/output", "w")
subprocess.Popen("/home/tst/whileone", stdout=fobj).wait()

And whileone:

#!/bin/bash

let i=1
while [ 1 ]
do
 echo "We are in iteration $i"
 let i=$i+1
 usleep 10000
done

