Relational data structure in python

Question

I'm looking for a SQL-relational-table-like data structure in python, or some hints for implementing one if none already exist. Conceptually, the data structure is a set of objects (any objects), which supports efficient lookups/filtering (possibly using SQL-like indexing).

For example, lets say my objects all have properties A, B, and C, which I need to filter by, hence I define the data should be indexed by them. The objects may contain lots of other members, which are not used for filtering. The data structure should support operations equivalent to SELECT <obj> from <DATASTRUCTURE> where A=100 (same for B and C). It should also be possible to filter by more than one field (where A=100 and B='bar').

The requirements are:

Should support a large number of items (~200K). The items must be the objects themselves, and not some flattened version of them (which rules out sqlite and likely pandas).
Insertion should be fast, should avoid reallocation of memory (which pretty much rules out pandas)
Should support simple filtering (like the example above), which must be more efficient than O(len(DATA)), i.e. avoid "full table scans".

Does such data structure exist?

Please don't suggest using sqlite. I'd need to repeatedly convert object->row and row->object, which is time consuming and cumbersome since my objects are not necessarily flat-ish.

Also, please don't suggest using pandas because repeated insertions of rows is too slow as it may requires frequent reallocation.

Alan Hoover · Accepted Answer · 2015-04-15 14:17:05Z

So long as you don't have any duplicates on (a,b,c) you could sub-class dict, enter your objects indexed by the tuple(a,b,c), and define your filter method (probably a generator) to return all entries that match your criteria.

class mydict(dict):
    def filter(self,a=None, b=None, c=None):
        for key,obj in enumerate(self):
            if (a and (key[0] == a)) or not a:
                if (b and (key[1] == b)) or not b:
                    if (c and (key[2] == c)) or not c:
                        yield obj

that is an ugly and very inefficient example, but you get the idea. I'm sure there is a better implementation method in itertools, or something.

edit:

I kept thinking about this. I toyed around with it some last night and came up with storing the objects in a list and storing dictionaries of the indexes by the desired keyfields. Retrieve objects by taking the intersection of the indexes for all specified criteria. Like this:

objs = []
aindex = {}
bindex = {}
cindex = {}

def insertobj(a,b,c,obj):
    idx = len(objs)
    objs.append(obj)
    if a in aindex:
        aindex[a].append(idx)
    else:
        aindex[a] = [idx]

    if b in bindex: 
        bindex[b].append(idx)
    else:
        bindex[b] = [idx]

    if c in cindex:
        cindex[c].append(idx)
    else :
        cindex[c] = [idx]

def filterobjs(a=None,b=None,c=None):
    if a : aset = set(aindex[a])
    if b : bset = set(bindex[b])
    if c : cset = set(cindex[c])
    result = set(range(len(objs)))
    if a and aset : result = result.intersection(aset)
    if b and bset : result = result.intersection(bset)
    if c and cset : result = result.intersection(cset)
    for idx in result:
        yield objs[idx]

class testobj(object):
    def __init__(self,a,b,c):
        self.a = a
        self.b = b
        self.c = c

    def show(self):
        print ('a=%i\tb=%i\tc=%s'%(self.a,self.b,self.c))

if __name__ == '__main__':
    for a in range(20):
        for b in range(5):
            for c in ['one','two','three','four']:
                insertobj(a,b,c,testobj(a,b,c))

    for obj in filterobjs(a=5):
        obj.show()
    print()
    for obj in filterobjs(b=3):
        obj.show()
    print()
    for obj in filterobjs(a=8,c='one'):
        obj.show()

it should be reasonably quick, although the objects are in a list, they are accessed directly by index. The "searching" is done on a hashed dict.

@Martijn Pieters gave an excellent example for the matching/ selecting/ filtering aspect of this question using fnmatch on the question here link .

Collectives™ on Stack Overflow

Relational data structure in python

1 Answer 1

1 Comment

Your Answer

Linked

Hot Network Questions

Collectives™ on Stack Overflow

1 Answer 1

1 Comment

Your Answer

Sign up or log in

Post as a guest

Linked

Related