13

I really like MongoDB, I use it at work and home, and not once yet have I hit a performance, complexity, or limitation issue with it. But I've been thinking about indexes a lot and I had a question I've not found an adequate answer to.

One of the big issues with SQL databases at scale is the relative complexity of queries. Specifically, MySQL uses b-trees for most of it's indexes, which querying takes O(log(n)), better than linear, but still means things take longer the more data you have.

A big attraction of noSQL databases is the removal/mitigation of this scaling issue, often relying instead on hash style indexes, which have O(1) lookup time, so having more data doesn't slow down your app. This is where my question comes in:

According to the offical MongoDB documentation, all indexes in Mongo use b-trees. Despite the fact that Mongo does in fact have a hashed index, as far as I can tell these are still stored in b-trees, same with the index on the _id field. I couldn't even find anything indicating anything about constant time anywhere in Mongo's documentation!

So my question is this: are, in fact, all indexes (including _id and hashed) in Mongo stored in b-trees? Does this mean querying for keys (even by _id) in fact takes O(log(n)) time?

Addendum: As a point of note, I'd be great if Mongo documentation provided some complexity formulas with examples queries. My favorite example of this is the Redis documentation.

Also: This is related. But I have the added specific questions regarding the hashed indexes and (more importantly) the _id index.

1 Answer 1

20

If you look at the code for indexing in mongodb (here), you can easily see that it's using btree for indexing. So the order of the algorithm is O(log n), but the base of this logarithm function is not 2, but 8192 instead, which is here in the code.

So for a million records we only have two levels (assuming the tree is balanced) and that is why it can find the record so fast. Overall, it's true the order is logarithmic, but since the base of the logarithm function is so large, it grows slowly.

Sign up to request clarification or add additional context in comments.

3 Comments

I had written out the big thing about thinking they were binary nodes with large bodies, under the assumption b-tree meant 'binary tree', then wikipedia informed me of the generalization. So thank you this answers my question! As an aside, it looks like the b-tree's in mongo are in fact self balancing.
O(log n) is the same regardless of base since logarithms to different bases different by a constant factor. 8192 is a bucket size (number of bytes) and doesn't have to do with the computational complexity of the Btree algorithm, just with (important) constant factors having to do with disk i/o. It's not a general fact that NoSQL databases use hashes to achieve O(1) lookups- that's more a property of an in-memory key-value store. Absence/avoidance of joins is more characteristic of NoSQL as complex joins are a bottleneck for relational db's.
Is this answer still valid for newer mongodb like v6?

Your Answer

By clicking “Post Your Answer”, you agree to our terms of service and acknowledge you have read our privacy policy.

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.