My program runs a simulation that needs huge objects to store its data. The data is larger than 2-3 GB. Even though my MBP should have enough memory, Python (Python 2.7.3 on Mac OS X, from ports) cannot seem to use it all, and the system freezes completely.

To save the state of the simulation I use pickle, but it also fails for very large objects; it seems as if pickle duplicates the objects in memory before dumping them...

QUESTION: Is there a standard library that can handle huge Python data structures (dict, set, list) without keeping them in memory all the time? Alternatively, is there a way to force Python to run in virtual memory? (I'm not very familiar with numpy; would it help me in this situation?)

Thanks in advance!

  • You are using 64 bit Python right? Commented Dec 27, 2012 at 13:20
  • Have you tried not storing all the data in an object, and instead keeping it as a file on the disk and reading it piece by piece and doing your processing in steps? Commented Dec 27, 2012 at 13:23
  • I know this doesn't help you, but just last night I had a 6 GB list of tuples that Python handled with no problem, on MacOS 10.8. So I guess it's not really Python's issue. My machine has 20 GB of RAM in total. Commented Dec 27, 2012 at 13:27
  • @OdayMansour: yes, the alternative is to rewrite the code that way. But I want to avoid that if a solution for the problem already exists. Commented Dec 27, 2012 at 13:39
  • @DavidHeffernan: yes 64bit Commented Dec 27, 2012 at 13:41

1 Answer

If you are using the 64-bit version of Python and still run into problems with pickle or other built-in modules, you can store the Python objects in an object-oriented database instead. We work with large objects (~10 GB) here every day and use ZODB for that. It's not the fastest, but it gets the job done.

I also hear that dobbin might be a good alternative.

2 Comments

Thanks, I will try ZODB. My last hope before posting this question was cPickle, but it seems to have the same problem as the standard Python implementation.
ZODB seems to have solved my issue, although I will be truly happy only tomorrow when the calculation ends successfully. :)