I would like to know what kind of internal indexing algorithm MongoDB uses. I have some data to store, and each document (row) has an id, which is probably a unique hash value (e.g. generated by md5() or another hash algorithm). So I would like to understand which hash method I should use to create the id, so that it is fast for MongoDB to index it. :)
2 Answers
Yes, MongoDB uses B-trees. From the documentation:
An index is a data structure that collects information about the values of the specified fields in the documents of a collection. This data structure is used by Mongo's query optimizer to quickly sort through and order the documents in a collection. Formally speaking, these indexes are implemented as "B-Tree" indexes.
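To see why an ordered index matters, here is a toy sketch (plain Python, not MongoDB's actual implementation): a sorted key list lets a lookup binary-search in O(log n) instead of scanning every document, which is the same benefit a B-tree index provides on disk.

```python
import bisect

# Hypothetical collection: 1000 documents with an integer _id.
docs = [{"_id": i, "name": f"user{i}"} for i in range(1000)]

# "Index" on _id: sorted (key, position) pairs, like B-tree leaf entries.
index = sorted((d["_id"], pos) for pos, d in enumerate(docs))
keys = [k for k, _ in index]

def find(target):
    # O(log n) search, analogous to descending a B-tree.
    i = bisect.bisect_left(keys, target)
    if i < len(keys) and keys[i] == target:
        return docs[index[i][1]]
    return None

print(find(742)["name"])  # user742
```

Without the index, the same lookup would be a full collection scan, touching every document.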
I suggest using MongoDB's ObjectId for the collection's _id, and not worrying about "How do I create _id?" at all. Generating _id is a job for MongoDB, not for the developer. Your effort is better spent on schema design, indexes, etc.
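For context, an ObjectId is a 12-byte value: a 4-byte big-endian Unix timestamp, 5 random bytes generated once per process, and a 3-byte incrementing counter. A minimal sketch of that layout in plain Python (in practice, the driver's `bson.ObjectId()` does this for you):

```python
import os
import struct
import time

# Per-process values, initialized once (as real drivers do).
_random5 = os.urandom(5)
_counter = int.from_bytes(os.urandom(3), "big")

def object_id() -> str:
    """Return a 24-character hex string with the ObjectId byte layout."""
    global _counter
    ts = struct.pack(">I", int(time.time()))   # 4-byte big-endian timestamp
    _counter = (_counter + 1) % 0x1000000
    cnt = _counter.to_bytes(3, "big")          # 3-byte counter
    return (ts + _random5 + cnt).hex()         # 12 bytes -> 24 hex chars

print(object_id())
```

Because the timestamp leads, freshly generated ids are roughly monotonically increasing, which keeps B-tree inserts appending near the right edge of the tree rather than scattering writes randomly (as an md5-based id would).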
For MongoDB 3.2+, the default storage engine is WiredTiger, and a B+ tree is used to store data:
WiredTiger maintains a table's data in memory using a data structure called a B-Tree (B+ tree, to be specific), referring to the nodes of a B-Tree as pages. Internal pages carry only keys. The leaf pages store both keys and values.
WiredTiger also supports an LSM tree as an alternative way to store data:
WiredTiger supports Log-Structured Merge Trees, where updates are buffered in small files that fit in cache for fast random updates, then automatically merged into larger files in the background so that read latency approaches that of traditional B-tree files. LSM trees automatically create Bloom filters to avoid unnecessary reads from files that cannot contain matching keys.
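A toy Bloom filter (illustrative only, not WiredTiger's implementation) shows why this avoids reads: before opening an on-disk file, the engine checks the file's filter, and a negative answer guarantees the key is absent, so the read can be skipped entirely.

```python
import hashlib

class BloomFilter:
    """Minimal Bloom filter: k hash positions over a fixed-size bit array."""

    def __init__(self, size=1024, hashes=3):
        self.size, self.hashes = size, hashes
        self.bits = 0  # bit array packed into one int

    def _positions(self, key: str):
        for i in range(self.hashes):
            h = hashlib.sha256(f"{i}:{key}".encode()).digest()
            yield int.from_bytes(h[:8], "big") % self.size

    def add(self, key: str):
        for p in self._positions(key):
            self.bits |= 1 << p

    def might_contain(self, key: str) -> bool:
        # False means "definitely absent"; True means "possibly present".
        return all(self.bits >> p & 1 for p in self._positions(key))

bf = BloomFilter()
bf.add("user:42")
print(bf.might_contain("user:42"))   # True
print(bf.might_contain("user:999"))  # almost certainly False
```

The trade-off is that `might_contain` can return false positives (causing a wasted file read) but never false negatives, which is exactly the property an LSM engine needs.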
| | LSM tree | B+ tree |
|---|---|---|
| Pros | More space-efficient thanks to append-only writes and periodic compaction<br>Better performance with fast-growing data and sequential writes<br>No fragmentation penalty because of how SSTable files are written and updated | Excellent performance with read-intensive workloads |
| Cons | CPU overhead for compaction can meaningfully affect performance and efficiency if not tuned appropriately<br>More tuning options increase flexibility but can seem complex to developers and operators<br>Reads may have to check several files, so Bloom filters are needed to keep read/search performance competitive | Increased space overhead to deal with fragmentation<br>Random writes cause slower create/insert behavior<br>Concurrent writes may require locks, which slows write performance<br>Scaling challenges, especially with >50% write transactions |
Which to choose:
- If you don't require extreme write throughput, a B-tree is likely the better choice: read throughput is better, and high volumes of writes can still be sustained.
- If your workload requires very high write throughput, an LSM tree is the better choice.
"An LSM tree is also used to store data"? Only one of the two would actually be used at a time, right?