3

Based on mongodb documentation

The ensureIndex() function only creates the index if it does not exist.

Once a collection is indexed on a key, random access on query expressions which match the specified key are fast. Without the index, MongoDB has to go through each document checking the value of specified key in the query:

db.things.find({j:2});  // fast - uses index
db.things.find({x:3});  // slow - has to check all because 'x' isn't 

Does that mean the 1st line of code runtime is big_theta = 1, and 2nd line of code is big_theta = n ?

2
  • @SergioTulentsev you should post as answer, and get points! :) Commented Nov 5, 2012 at 20:33
  • @LaneLane: too late, there is already an answer :) Commented Nov 5, 2012 at 20:35

3 Answers 3

10

MongoDB uses B-tree for indexing, as can be seen in the source code for index.cpp. This means that lookups can be expressed as O(log N) where N is the number of documents, but it is also O(D) if D is the depth of the tree (assuming the tree is somewhat balanced). D is usually very small because each node will have many children.

The number of children in a node in MongoDB is about 8192 (btree.h) so an index with a few billion documents may fit in a tree with only 3 levels! You easily realize that the logarithm is not log_2 (as in binary trees) but instead log_8192, which grows extremely slowly.

Because of this, b-trees are usually regarded as constant-time lookup, O(1), in practice.

Another good reason for keeping many children in each node is that the index is stored on disk. You want to try to utilize all the space in a disk block for one node to improve cache performance and reduce disk seeks. B-trees have very good disk performance because you only need to visit very few nodes to find what you are looking for.

Sign up to request clarification or add additional context in comments.

4 Comments

is there any way to ensure that there are many children for each node as a developer, or is that done by mongodb?
That is done by the btree implementation. I have not studied MongoDB specifically (other than the source skimming for this post) but they are clever guys so I guess they have done something in terms of balancing and distribution.
Edited ambiguous statement in answer. Its not an O(1) lookup. See: en.wikipedia.org/wiki/B-tree
Manav, If you read my answer carefully you see that I motivate my statements. I also already do say that it is logarithmic but with an extremely slow-growing logarithm. So editing it would change the intent of the answer completely. Just upvote what you consider to be the correct answer instead. (But keep in mind that most people will never ever reach 4 levels in a balanced b-tree with 8k children in each node, much less five or six. So does the logarithm actually matter here?). I could also just appeal to authority: my motivation is copied from my DB professors at Uppsala University :-)
3

Mongo indexes are B-trees, so an indexed lookup is O(log n). Unindexed lookups are O(n).

2 Comments

is there any way to reduce the indexed lookup runtime further?
Reduce n. Other than that, no.
0

A B-tree is O(log N), Emil Vikström's answer of O(1) is simply incorrect. Even under his "motivation" (or assumption), it is wrong: he forgot the time to search the 8192 children for each node. In other words, if K is the size of the node, D is depth of the tree, the time complex can be re-expressed as O(D) + O(log K) (which is equivalent to O(log N)), if the children is organized as BST or in similar logarithmic structure.

1 Comment

You are mathematically correct, which I did open my answer with. My motivation is that the exact asymptotic runtime is irrelevant in practice. You can index all atoms in the known universe with 21 levels. I did not ignore the time to search the children, that number is bounded in and thus O(1) in each node.

Your Answer

By clicking “Post Your Answer”, you agree to our terms of service and acknowledge you have read our privacy policy.

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.