7

I have a python program where I am using a priority queue to keep track of objects to be processed. At the moment the queue is implemented using a SortedList, which is working very well.

However, I need to extend this code, so that the list is kept sorted on multiple keys. Somewhat like an SQL database with indexes on multiple columns, so that I can efficiently access, add and delete objects on all keys. My workload is add/delete heavy. To give an idea of what I want to do, here is some pseudo code:

ml = MultiSortedList()
ml.append((1, "z", 1.5), a)
ml.append((2, "a", 40.0), b)
ml.append((3, "f", 0.5), c)

print(ml.sorted(0))
   [((1, "z", 1.5), a),
   ((2, "a", 40.0), b),
   ((3, "f", 0.5), c),]

print(ml.sorted(2))
   [((3, "f", 0.5), c),
   ((1, "z", 1.5), a),
   ((2, "a", 40.0), b)]

print(ml.sorted(2).pop(1)
   (1, "z", 1.5), a)

print(ml.sorted(0))
   [((2, "a", 40.0), b),
   ((3, "f", 0.5), c)]

I can't quite figure out how to do this efficiently. Of course I could sort the list again for each access to a different column, but that is simply too expensive. Also, the O(n) delete operations on python lists become painful, as the list may contain several thousands of objects.

Is there an existing data structure (preferably already implemented in python) which solves this problem? If not, can you help me with an outline for how to implement this efficiently?

4
  • How many columns/ different fields are there ? Is it a fixed number ? Commented Mar 18, 2015 at 9:55
  • Also why not let sql do this for you ? Commented Mar 18, 2015 at 9:58
  • if you're looking for SQL-style functionality within python, have you looked at the pandas package - pandas.pydata.org ? Commented Mar 18, 2015 at 11:03
  • @sasha: About 4-5 columns. I can live with rewriting the application if this changes. Problem with doing it in SQL is simply overhead. This is a datastructure that is going to be accessed a lot, so I think implementing it as adding/dropping rows in a database (and serializing objects) will have too high overhead. Commented Mar 18, 2015 at 11:54

2 Answers 2

6

Instead of using a sorted list as the implementation of your priority queue, you should be using a heap. A binary heap, for example, has log(n) insertion and deletion. A Fibonacci heap will have O(1) insertion.

You have a implementation inside the standard library: the heapq module.

For your use case, you'll need to keep multiple heaps: one for each sorting order. For clarity of implementation, you may want to hold your original data on a dictionary (possibly using random or incrementing keys) and only keep its keys on the heaps.

Using this technique, you can easily get O(1) insertion and O(log(n)) deletion. You don't really specify what kind of accesses you'll be having, but if you need random access, you can use binary search on the appropriate heap to get O(log(n))

Sign up to request clarification or add additional context in comments.

1 Comment

I ended up using a solution similar to this with a wrapper which inserts and deletes objects from all the heaps as needed. Thanks for the help!
1

One approach is to maintain n list internally, one for each sorting order, and to add/remove item for each of them. This way, you "only" multiply the manipulation time of your data structure by a constant value (i.e. if using 3 keys instead of one, then got 3 log(n) instead of log(n) to delete/insert an element).

The concept I imagine behind such implementation is the java's Comparator one.

Create for each key a comparator method to use to sort them, and then use it at insert and deleting time.

It would work as follow :

class SortedList(list):
    def __init__(self, comparator = None):
        list.__init__(self)

        self.comparator = comparator

    def add(self, element):
        """ Adds the element into the sorted list.
        If no comparator where provided at initialisation, then try to compare
        the element to item already stored in the list using standard __gt__ 
        and __eq__.

        Argument :
            element : the element to insert in the list

        /!\ NB : A more inteligent implementation should be used, such as binary search... /!\
        """
        index = 0
        while (index < len(self)):
            if self.isGreatterThan(element, self[index]):
                index += 1
            else:
                break

        self.insert(index, element)

    def remove(self, element):
        """ Same as previous, a better approach is possible that use binary search """
        list.remove(self, element)


    def isGreatterThan(self, element, otherElement):
        """ Compare if element is greater than other element. """
        if self.comparator != None:
            return self.comparator(element, otherElement)
        else:
            return element.__gt__(otherElement)


class MultipleKeysObjectContainer():
    def __init__(self, comparators):
        #register the comparators
        self.comparators = comparators
        #create a List for each comparator
        self.data = {}
        for key in self.comparators.keys():
            self.data[key] = SortedList(comparator = self.comparators[key])

    def add(self, element):
        for key in self.data.keys():
            self.data[key].add(element)

    def __repr__(self):
        return "<MultipleKeysObjectContainer :"+self.data.__repr__()+">"

    def __str__(self):
        result = "MultipleKeysObjectContainer{\n"
        for key in self.data.keys():
            result += "\tOrder by : "+key+"{\n"
            for element in self.data[key]:
                result += "\t\t" + str(element) + "\n"
            result += "\t}\n"
        result += "}"
        return result

    def popOrderedBy(self, key, position):
        """ pop the item a the position in the list of item ordered by the key.
        Remove also from other data containers. """
        item = self.data[key].pop(position)
        for dataKey in self.data.keys():
            if dataKey != key:
                self.data[dataKey].remove(item)

        return item



if __name__ == "__main__":
    a = SortedList(lambda x,y : x[0][0] > y[0][0])

    item1 = ((1, "z", 1.5),"foo")
    item2 = ((2, "a", 40.0), "bar")
    item3 = ((3, "f", 0.5), "barfoo")

    a.add(item1)
    a.add(item3)
    a.add(item2)
    print("Example of sorted list")
    print(a)
    a.remove(item3)
    print("The same list without the barfoo")
    print(a)


    b = MultipleKeysObjectContainer({"id": (lambda x,y : x[0][0] > y[0][0]), "letter": (lambda x,y : x[0][1] > y[0][1] ), "value":(lambda x,y : x[0][2] > y[0][2])})

    b.add(item1)
    b.add(item3)
    b.add(item2)
    print("A multiple key container, object are ordered according three criterion.")
    print(b)
    print("Remove the first item if items are ordered by letter", b.popOrderedBy("letter", 0))
    print("After this removing the container contains :")
    print(b)

This results in :

Example of sorted list
[((1, 'z', 1.5), 'foo'), ((2, 'a', 40.0), 'bar'), ((3, 'f', 0.5), 'barfoo')]
The same list without the barfoo
[((1, 'z', 1.5), 'foo'), ((2, 'a', 40.0), 'bar')]
A multiple key container, object are ordered according three criterion.
MultipleKeysObjectContainer{
    Order by : id{
        ((1, 'z', 1.5), 'foo')
        ((2, 'a', 40.0), 'bar')
        ((3, 'f', 0.5), 'barfoo')
    }
    Order by : value{
        ((3, 'f', 0.5), 'barfoo')
        ((1, 'z', 1.5), 'foo')
        ((2, 'a', 40.0), 'bar')
    }
    Order by : letter{
        ((2, 'a', 40.0), 'bar')
        ((3, 'f', 0.5), 'barfoo')
        ((1, 'z', 1.5), 'foo')
    }
}
Remove the first item if items are ordered by letter ((2, 'a', 40.0), 'bar')
After this removing the container contains :
MultipleKeysObjectContainer{
    Order by : id{
        ((1, 'z', 1.5), 'foo')
        ((3, 'f', 0.5), 'barfoo')
    }
    Order by : value{
        ((3, 'f', 0.5), 'barfoo')
        ((1, 'z', 1.5), 'foo')
    }
    Order by : letter{
        ((3, 'f', 0.5), 'barfoo')
        ((1, 'z', 1.5), 'foo')
    }
}

That look like you are looking for ( almost, just have to add binary search :p ) Good luck!

Comments

Your Answer

By clicking “Post Your Answer”, you agree to our terms of service and acknowledge you have read our privacy policy.

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.