I'm seeking advice about methods of implementing object persistence in Python. More precisely, I wish to be able to link a Python object to a file in such a way that:

  • any Python process that opens a representation of that file shares the same information;
  • any process can change its object, and the changes propagate to the other processes;
  • even if all processes "storing" the object are closed, the file remains and can be re-opened by another process.

I found three main candidates for this in my distribution of Python - anydbm, pickle, and shelve (dbm appeared to be perfect, but it is Unix-only, and I am on Windows). However, they all have flaws:

  • anydbm can only handle dictionaries with string keys and string values (I'm seeking to store a list of dictionaries, all of which have string keys and string values, though ideally I would prefer a module with no type restrictions)
  • shelve requires that a file be re-opened before changes propagate - for instance, if two processes A and B load the same file (containing a shelved empty list), and A adds an item to the list and calls sync(), B will still see the list as being empty until it reloads the file.
  • pickle (the module I am currently using for my test implementation) has the same "reload requirement" as shelve, and also does not overwrite previous data: if process A dumps fifteen empty strings into a file, and then the string 'hello', process B will have to call load sixteen times in order to get the 'hello' string. I am currently dealing with this by preceding every write with repeated reads until end of file ("wiping the slate clean before writing on it"), and by repeating every read until end of file (see the sketch after this list), but I feel there must be a better way.
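
For concreteness, here is a minimal sketch of that workaround, with function names of my own invention (note that opening the file in 'wb' mode truncates it, which does the slate-wiping in one step):

import pickle

def load_latest(filename):
    # Repeatedly unpickle until EOF and keep only the last object;
    # this is the "repeated reads until end of file" described above.
    with open(filename, 'rb') as f:
        obj = None
        while True:
            try:
                obj = pickle.load(f)
            except EOFError:
                return obj

def dump_clean(obj, filename):
    # Opening in 'wb' mode truncates the file, so exactly one
    # pickled object is ever on disk after a write.
    with open(filename, 'wb') as f:
        pickle.dump(obj, f)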

My ideal module would behave as follows (with "A>>>" representing code executed by process A, and "B>>>" code executed by process B):

A>>> import imaginary_perfect_module as mod
B>>> import imaginary_perfect_module as mod
A>>> d = mod.load('a_file') 
B>>> d = mod.load('a_file')
A>>> d
{}
B>>> d
{}
A>>> d[1] = 'this string is one'
A>>> d['ones'] = 1   #anydbm would sulk here
A>>> d['ones'] = 11 
A>>> d['a dict'] = {'this dictionary' : 'is arbitrary', 42 : 'the answer'}
B>>> d['ones']   #shelve would raise a KeyError here, unless A had called d.sync() and B had reloaded d
11    #pickle (with different syntax) would have returned 1 here, and then 11 on next call
(etc. for B)

I could achieve this behaviour by creating my own module that uses pickle, and editing the dump and load behaviour so that they use the repeated reads I mentioned above - but I find it hard to believe that this problem has never occurred to, and been fixed by, more talented programmers before. Moreover, these repeated reads seem inefficient to me (though I must admit that my knowledge of operation complexity is limited, and it's possible that these repeated reads are going on "behind the scenes" in otherwise apparently smoother modules like shelve). Therefore, I conclude that I must be missing some code module that would solve the problem for me. I'd be grateful if anyone could point me in the right direction, or give advice about implementation.

Comments

  • Take a look at MongoDB. It's not as completely integrated into the language as your example above, but it will give you a much more robust and fault-tolerant database than pickling to the filesystem and being smart about locks.

3 Answers

Use the ZODB (the Zope Object Database) instead. Backed by ZEO, it fulfills your requirements:

  • Transparent persistence for Python objects

    ZODB uses pickles underneath, so anything that is pickle-able can be stored in a ZODB object store.

  • Full ACID-compliant transaction support (including savepoints)

    This means changes from one process propagate to all the other processes only when they are good and ready, and each process has a consistent view of the data throughout a transaction (see the sketch after this list).
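
As a rough illustration of that isolation, here is a sketch using two connections with separate transaction managers; an in-memory database stands in for a real storage, and the key name is illustrative:

import ZODB
import transaction

db = ZODB.DB(None)   # passing None gives an in-memory storage, handy for demos

tm_a = transaction.TransactionManager()
tm_b = transaction.TransactionManager()
conn_a = db.open(tm_a)   # plays the role of "process A"
conn_b = db.open(tm_b)   # plays the role of "process B"

conn_a.root()['ones'] = 11
tm_a.commit()            # A's change is now durable

# B keeps its old, consistent view until its own transaction boundary:
tm_b.begin()             # start a fresh transaction for B
print(conn_b.root().get('ones'))   # -> 11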

ZODB has been around for over a decade now, so you are right in surmising that this problem has already been solved. :-)

ZODB lets you plug in storages; the most common format is FileStorage, which stores everything in a single Data.fs file, with an optional blob storage for large objects.
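
A minimal single-process sketch of FileStorage use (the file name and keys are illustrative):

import ZODB, ZODB.FileStorage
import transaction

storage = ZODB.FileStorage.FileStorage('Data.fs')
db = ZODB.DB(storage)
conn = db.open()
root = conn.root()

# The root behaves like a dictionary, with no string-only restriction:
root['ones'] = 11
root['a dict'] = {'this dictionary': 'is arbitrary', 42: 'the answer'}
transaction.commit()   # persist the changes

conn.close()
db.close()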

Some ZODB storages are wrappers around others that add functionality; DemoStorage, for example, keeps changes in memory to facilitate unit testing and demonstration setups (restart and you have a clean slate again). BeforeStorage gives you a window in time, only returning data from transactions before a given point in time. The latter has been instrumental in recovering lost data for me.
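
For instance, a DemoStorage can be layered over a read-only FileStorage so that commits land in memory and the underlying file is never touched (a sketch; the file name is illustrative):

import ZODB, ZODB.FileStorage, ZODB.DemoStorage

base = ZODB.FileStorage.FileStorage('Data.fs', read_only=True)
demo = ZODB.DemoStorage.DemoStorage(base=base)
db = ZODB.DB(demo)   # commits now go to the in-memory change layer only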

ZEO is one such plugin; it introduces a client-server architecture. Using ZEO lets you access a given storage from multiple processes at a time; you won't need this layer if all you need is multi-threaded access from one process.
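
A sketch of the client side, assuming a ZEO server is already running on localhost port 8100 (started, for example, with the runzeo script that ships with ZEO):

import ZEO.ClientStorage
import ZODB
import transaction

# Each process runs this independently; all of them share the same storage.
storage = ZEO.ClientStorage.ClientStorage(('localhost', 8100))
db = ZODB.DB(storage)
conn = db.open()
conn.root()['shared'] = 'visible to every client after commit'
transaction.commit()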

The same could be achieved with RelStorage, which stores ZODB data in a relational database such as PostgreSQL, MySQL or Oracle.

Comments

ZODB sounds like precisely what I want (and RelStorage sounds like something interesting to check out for the future) - thank you! I'll test it out and return to mark this as answered if it works for me.
This sounds like what I want too, more so than what shelve provides.

For beginners, you can port your shelve databases to ZODB databases like this:

#!/usr/bin/env python
import argparse

import shelve
import ZODB, ZODB.FileStorage
import transaction

parser = argparse.ArgumentParser(
    description="Copy a shelve database into a new ZODB FileStorage.")
parser.add_argument("-i", "--input", dest="in_file", required=True,
                    help="original shelve database filename")
parser.add_argument("-o", "--output", dest="out_file", required=True,
                    help="new ZODB database filename")
options = parser.parse_args()

# Open the source shelve and the destination ZODB database.
db = shelve.open(options.in_file)
zstorage = ZODB.FileStorage.FileStorage(options.out_file)
zdb = ZODB.DB(zstorage)
zconnection = zdb.open()
newdb = zconnection.root()

# Copy every key/value pair across, then commit in one transaction.
for key, value in db.items():
    print("Copying key: " + str(key))
    newdb[key] = value

transaction.commit()

zconnection.close()
zdb.close()
db.close()
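
Assuming the script is saved as shelve2zodb.py (a name of my choosing), you would invoke it along the lines of: python shelve2zodb.py -i old_shelve.db -o new_data.fs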

I suggest using TinyDB; it's much simpler to use.

https://tinydb.readthedocs.io/en/stable/
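
A minimal sketch of TinyDB usage (the file name and field names are illustrative):

from tinydb import TinyDB, Query

db = TinyDB('db.json')              # persisted as a JSON file on disk
db.insert({'ones': 11, 'answer': 42})

Item = Query()
print(db.search(Item.ones == 11))   # -> [{'ones': 11, 'answer': 42}]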
