Algorithmic complexity of MongoDB's array operations

Question

MongoDB supports many useful array operations, such as $push and $pop, but I can't find any information about their algorithmic complexity, or about how they are implemented so that I could work out their runtime complexity myself. Any help would be greatly appreciated.

Wouldn't push() and pop() be O(1)? Assuming that they are not growing the size of the array or anything like that, of course. Edit: never mind, the other comment seems to have gone away.
– aroth, Commented Jul 6, 2012 at 3:49

Thilo · Accepted Answer · 2012-07-06 03:52:05Z

3

I think when it comes to Mongo updates, there are only three relevant cases:

1) an in-place atomic update, for example just incrementing an integer. This is very fast.

2) an in-place replace. The whole document has to be rewritten, but it still fits into the current space (it shrank or there is enough padding).

3) a document migration. You have to write the document to a new location.

In addition to that, there is the cost of updating the affected indexes (all of them, if the whole document had to be moved).

What you actually do inside the document (push onto an array, add a field) should not have any significant impact on the total cost of the operation, which seems to depend mostly linearly on the size of the document (network and disk transfer costs).
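To make the three cases concrete, here is a rough sketch of the decision as I read it from the description above. It is not MongoDB code: `RecordSlot`, `UpdateKind`, and `classifyUpdate` are hypothetical names, and treating "size unchanged" as the test for case 1 is a simplifying assumption.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical model of the three update cases; not MongoDB's actual API.
enum class UpdateKind { InPlaceAtomic, InPlaceRewrite, Migration };

struct RecordSlot {
    std::size_t documentSize;   // bytes the document currently occupies
    std::size_t allocatedSize;  // bytes reserved for it, padding included
};

// Assumption: an update that leaves the size unchanged can be applied
// in place (case 1); one that still fits in the slot rewrites the
// document where it is (case 2); anything larger must move (case 3).
UpdateKind classifyUpdate(const RecordSlot& slot, std::size_t newSize) {
    assert(slot.documentSize <= slot.allocatedSize);
    if (newSize == slot.documentSize) {
        return UpdateKind::InPlaceAtomic;
    }
    if (newSize <= slot.allocatedSize) {
        return UpdateKind::InPlaceRewrite;
    }
    return UpdateKind::Migration;
}
```

Under this model a $push always changes the size, so it can only ever land in case 2 or case 3, depending on how much padding is left.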

answered Jul 6, 2012 at 3:52

Thilo


2 Comments

jhchen Over a year ago

So it seems like #1 is O(1), #2 is O(n), and #3 is Ω(n)? Which does a $push fall under? If an array keeps growing, I can see #3 needing to happen, so worst case Ω(n)? But what about the general case of appending a single element? Does Mongo preallocate extra space for a document so that it's essentially in place, thus #1? Or does it always have to rewrite the array, so it's #2?

Thilo Over a year ago

Adding an element will always increase the document size, so #1 cannot happen. If there is enough free space after the document (a little padding is added by default), it would be #2. Otherwise #3. I think it is not recommended to have very large arrays embedded, especially if they can grow. One should consider putting the elements in documents of their own instead of embedding them.

Sergio Tulentsev · Accepted Answer · 2012-07-06 04:50:44Z

1

Here's where they are implemented. You can figure out the complexity from there.

This is the $pop operator, for example (this seems like O(N) to me):

    case POP: {
        uassert( 10135 ,  "$pop can only be applied to an array" , in.type() == Array );
        BSONObjBuilder bb( builder.subarrayStart( shortFieldName ) );

        int n = 0;

        BSONObjIterator i( in.embeddedObject() );
        if ( elt.isNumber() && elt.number() < 0 ) {
            // pop from front
            if ( i.more() ) {
                i.next();
                n++;
            }

            while( i.more() ) {
                bb.appendAs( i.next() , bb.numStr( n - 1 ) );
                n++;
            }
        }
        else {
            // pop from back
            while( i.more() ) {
                n++;
                BSONElement arrI = i.next();
                if ( i.more() ) {
                    bb.append( arrI );
                }
            }
        }

        ms.pushStartSize = n;
        verify( ms.pushStartSize == in.embeddedObject().nFields() );
        bb.done();
        break;
    }
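If it helps to see why this is linear without the BSON machinery, here is a stripped-down sketch of the same pattern. It is only an illustration under my own assumptions: `PopResult` and `popByRebuild` are hypothetical names, and a `std::vector<std::string>` stands in for the BSON array. The point is that the result is built as a new array, so every surviving element gets copied once, no matter which end is popped.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical stand-in for the quoted logic; not MongoDB code.
struct PopResult {
    std::vector<std::string> elements;  // the rebuilt array
    std::size_t copies = 0;             // how many elements were copied
};

// Rebuilds the array without its first (fromFront) or last element.
// Nothing is edited in place, so an n-element input costs n - 1 copies.
PopResult popByRebuild(const std::vector<std::string>& in, bool fromFront) {
    PopResult out;
    if (in.empty()) {
        return out;  // nothing to pop, nothing to copy
    }
    const std::size_t begin = fromFront ? 1 : 0;
    const std::size_t end = fromFront ? in.size() : in.size() - 1;
    out.elements.reserve(end - begin);
    for (std::size_t i = begin; i < end; ++i) {
        out.elements.push_back(in[i]);
        ++out.copies;
    }
    return out;
}
```

Popping from either end of a four-element array performs three copies here, which is the O(N) behaviour mentioned above.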

edited Jul 6, 2012 at 4:50

answered Jul 6, 2012 at 3:52

Sergio Tulentsev


4 Comments

Thilo Over a year ago

And I suggest that this part plays a very minor role in the whole operation (unless you have a terribly huge array to push into).

Sergio Tulentsev Over a year ago

Since he asked this question, I have to suspect the worst :) (+1'd your answer, btw)

Thilo Over a year ago

assuming the worst is a safe strategy ;-)

Eve Freeman Over a year ago

The file you linked is no longer the correct file on the master branch (as of May 11 2012). Those cases are now in update_internal.cpp. Nice catch on the O(n). I wonder if there is a way to optimize it, but I'm guessing not.
