1

I'm looking for a SQL-relational-table-like data structure in python, or some hints for implementing one if none already exist. Conceptually, the data structure is a set of objects (any objects), which supports efficient lookups/filtering (possibly using SQL-like indexing).

For example, lets say my objects all have properties A, B, and C, which I need to filter by, hence I define the data should be indexed by them. The objects may contain lots of other members, which are not used for filtering. The data structure should support operations equivalent to SELECT <obj> from <DATASTRUCTURE> where A=100 (same for B and C). It should also be possible to filter by more than one field (where A=100 and B='bar').

The requirements are:

  1. Should support a large number of items (~200K). The items must be the objects themselves, and not some flattened version of them (which rules out sqlite and likely pandas).
  2. Insertion should be fast, should avoid reallocation of memory (which pretty much rules out pandas)
  3. Should support simple filtering (like the example above), which must be more efficient than O(len(DATA)), i.e. avoid "full table scans".

Does such data structure exist?


Please don't suggest using sqlite. I'd need to repeatedly convert object->row and row->object, which is time consuming and cumbersome since my objects are not necessarily flat-ish.

Also, please don't suggest using pandas because repeated insertions of rows is too slow as it may requires frequent reallocation.

1 Answer 1

1

So long as you don't have any duplicates on (a,b,c) you could sub-class dict, enter your objects indexed by the tuple(a,b,c), and define your filter method (probably a generator) to return all entries that match your criteria.

class mydict(dict):
    def filter(self,a=None, b=None, c=None):
        for key,obj in enumerate(self):
            if (a and (key[0] == a)) or not a:
                if (b and (key[1] == b)) or not b:
                    if (c and (key[2] == c)) or not c:
                        yield obj

that is an ugly and very inefficient example, but you get the idea. I'm sure there is a better implementation method in itertools, or something.

edit:

I kept thinking about this. I toyed around with it some last night and came up with storing the objects in a list and storing dictionaries of the indexes by the desired keyfields. Retrieve objects by taking the intersection of the indexes for all specified criteria. Like this:

objs = []
aindex = {}
bindex = {}
cindex = {}

def insertobj(a,b,c,obj):
    idx = len(objs)
    objs.append(obj)
    if a in aindex:
        aindex[a].append(idx)
    else:
        aindex[a] = [idx]

    if b in bindex: 
        bindex[b].append(idx)
    else:
        bindex[b] = [idx]

    if c in cindex:
        cindex[c].append(idx)
    else :
        cindex[c] = [idx]

def filterobjs(a=None,b=None,c=None):
    if a : aset = set(aindex[a])
    if b : bset = set(bindex[b])
    if c : cset = set(cindex[c])
    result = set(range(len(objs)))
    if a and aset : result = result.intersection(aset)
    if b and bset : result = result.intersection(bset)
    if c and cset : result = result.intersection(cset)
    for idx in result:
        yield objs[idx]

class testobj(object):
    def __init__(self,a,b,c):
        self.a = a
        self.b = b
        self.c = c

    def show(self):
        print ('a=%i\tb=%i\tc=%s'%(self.a,self.b,self.c))

if __name__ == '__main__':
    for a in range(20):
        for b in range(5):
            for c in ['one','two','three','four']:
                insertobj(a,b,c,testobj(a,b,c))

    for obj in filterobjs(a=5):
        obj.show()
    print()
    for obj in filterobjs(b=3):
        obj.show()
    print()
    for obj in filterobjs(a=8,c='one'):
        obj.show()

it should be reasonably quick, although the objects are in a list, they are accessed directly by index. The "searching" is done on a hashed dict.

Sign up to request clarification or add additional context in comments.

1 Comment

@Martijn Pieters gave an excellent example for the matching/ selecting/ filtering aspect of this question using fnmatch on the question here link .

Your Answer

By clicking “Post Your Answer”, you agree to our terms of service and acknowledge you have read our privacy policy.

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.