I'm running a large number of computations whose results I want to save to disk one item at a time, since the whole dataset is too big to hold in memory. I tried using shelve to save them, but I get this error:
HASH: Out of overflow pages. Increase page size
My code is below. What is the right way to do this in Python? pickle loads whole objects into memory; shelve supports writing to disk, but forces a dictionary structure, where I'm limited by the number of keys. The final data I am saving is just a list and does not need to be in dictionary form. I just need to be able to read it back one item at a time.
import shelve

def my_data():
    # this is a generator that yields data points
    for n in xrange(very_large_number):
        yield data_point

def save_result():
    db = shelve.open("result")
    n = 0
    for data in my_data():
        # result is a Python object (a tuple)
        result = compute(data)
        # now save result to disk under a string key
        db[str(n)] = result
        n += 1
    db.close()
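For context, the closest alternative I've come up with is appending successive pickle records to a single file and reading them back lazily, since pickle.load only reads one record per call. A rough sketch of what I mean (the names save_results/load_results are mine, and compute/my_data stand in for my real code):

```python
import pickle

def save_results(path, results):
    # append each result to one file as its own pickle record
    with open(path, "wb") as f:
        for r in results:
            pickle.dump(r, f)

def load_results(path):
    # lazily yield records one at a time until end of file
    with open(path, "rb") as f:
        while True:
            try:
                yield pickle.load(f)
            except EOFError:
                break
```

Is this a reasonable pattern, or is there a more standard way to stream a list of objects to disk?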