5

Mongodb supports many useful array operations such as $push and $pop but I can't seem to find any information about their algorithmic complexity nor how they are implemented to figure out their runtime complexity. Any help would be greatly appreciated.

2
  • Wouldn't push() and pop() be O(1)? Assuming that they are not growing the size of the array or anything like that, of course. Edit: nevermind, the other comment went away, it seems. Commented Jul 6, 2012 at 3:49
  • 1
    That's a dangerous assumption :) Commented Jul 6, 2012 at 3:51

2 Answers 2

3

I think when it comes to Mongo updates, there are only three relevant cases:

1) an in-place atomic update. For example just increment an integer. This is very fast.

2) an in-place replace. The whole document has to be rewritten, but it still fits into the current space (it shrank or there is enough padding).

3) a document migration. You have to write the document to a new location.

In addition to that there is the cost of updating affected indexes (all, if the whole thing had to be moved).

What you actually do inside of the document (push around an array, add a field) should not have any significant impact on the total cost of the operation, which seem to depend mostly linearly on the size of the document (network and disk transfer costs).

Sign up to request clarification or add additional context in comments.

2 Comments

It seems like 1. O(1) 2. O(n) 3. Ω(n)? Which does a $push fall under? If an array keeps growing I can see #3 needing to happen so worst case Ω(n)? But the general case of appending a single element? Does Mongo preallocate extra space for a document so it's essentially in place thus #1? Or does it always have to rewrite the array so it's #2?
Adding an element will always increase the document size, so #1 cannot happen. If there is enough free space after the document (a little padding is added by default), it would be #2. Otherwise #3. I think it is not recommended to have very large arrays embedded, especially if they can grow. One should investigate the possibility to put the elements in documents of their own instead of embedding.
1

Here's where they are implemented. You can figure out the complexity from there.

This is the $pop operator, for example (this seems like O(N) to me):

    case POP: {
        uassert( 10135 ,  "$pop can only be applied to an array" , in.type() == Array );
        BSONObjBuilder bb( builder.subarrayStart( shortFieldName ) );

        int n = 0;

        BSONObjIterator i( in.embeddedObject() );
        if ( elt.isNumber() && elt.number() < 0 ) {
            // pop from front
            if ( i.more() ) {
                i.next();
                n++;
            }

            while( i.more() ) {
                bb.appendAs( i.next() , bb.numStr( n - 1 ) );
                n++;
            }
        }
        else {
            // pop from back
            while( i.more() ) {
                n++;
                BSONElement arrI = i.next();
                if ( i.more() ) {
                    bb.append( arrI );
                }
            }
        }

        ms.pushStartSize = n;
        verify( ms.pushStartSize == in.embeddedObject().nFields() );
        bb.done();
        break;
    }

4 Comments

And I suggest that this part plays a very minor role in the whole operation (unless you have a terribly huge array to push into).
Since he asked this question, I have to suspect the worst :) (+1'd your answer, btw)
assuming the worst is a safe strategy ;-)
The file you linked is not the correct file on the master branch, anymore (as of May 11 2012). Now those cases are in update_internal.cpp. Nice catch on the O(n)--wonder if there is a way to optimize, but I'm guessing not.

Your Answer

By clicking “Post Your Answer”, you agree to our terms of service and acknowledge you have read our privacy policy.

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.