3

I would like to know what kind of internal indexing algorithm MongoDB is using. Because I have some data want to store, and each document (row) has a id, which is probably a unique hash value. (e.g. generated by md5() or other hash algorithm). so, I would to understand which hash method I should use to create the id, so that it is fast for the MongoDB to index it. :)

1
  • 2
    Did you check the MongoDB source code repository yet? If not, why not? If so, what specific code module were you reading? Commented Apr 2, 2011 at 0:47

2 Answers 2

8

Yes, mongoDB use b-tree, documentation:

An index is a data structure that collects information about the values of the specified fields in the documents of a collection. This data structure is used by Mongo's query optimizer to quickly sort through and order the documents in a collection. Formally speaking, these indexes are implemented as "B-Tree" indexes.

I suggest to use mongodb ObjectId for collection _id, and don't care about: "How to create _id?" at all. Because it probably task for mongodb, but not for developer. I suppose that better to care about schema, indexes, etc..

Sign up to request clarification or add additional context in comments.

3 Comments

B-tree and binary tree are not the same.
because I need to have an index which cannot use the Mongo internal index, so, I am wondering, if mongo perform better with index that: 1) String only, 2) numbers only, 3) Fix # HexNumeric characters. (mix of char & number), 4) or doesn't matter.
@murvinlai: Probably better to ask mongodb developers at google groups (because i don't know exactly how to mongodb works internally) @pingw33n: Ohh, okay, i really don't know directly. You can correct my answer or answer yourself.
5

For Mongo 3.2+, the default storage engine is WiredTiger, and B+ tree is used to store data.

WiredTiger maintains a table's data in memory using a data structure called a B-Tree ( B+ Tree to be specific), referring to the nodes of a B-Tree as pages. Internal pages carry only keys. The leaf pages store both keys and values.

And also LSM Tree is used to store data

WiredTiger supports Log-Structured Merge Trees, where updates are buffered in small files that fit in cache for fast random updates, then automatically merged into larger files in the background so that read latency approaches that of traditional Btree files. LSM trees automatically create Bloom filters to avoid unnecessary reads from files that cannot containing matching keys.

LSM B+ Tree
Pros * More space efficient by using append-only writes and periodic compaction
* Better performance with fast-growing data and sequential writes
* No fragmentation penalty because of how SSTable files are written and updated
* Excellent performance with read-intensive workloads
Cons * CPU overhead for compaction can meaningfully affect performance and efficiency if not tuned appropriately
* More tuning options increase flexibility but can seem complex to developers and operators
* Read/search performance can be optimized with the use of bloom filters
* Increased space overhead to deal with fragmentation
* Uses random writes which causes slower create/insert behavior
* Concurrent writes may require locks which slows write performance
* Scaling challenges, especially with >50% write transactions

The choice advice

  • If you don't require extreme write throughput btree is likely to be a better choice. Read throughput is better and high volumes of writes can be maintained.
  • If you have a workload that requires a high write throughput LSM is the best choice.

Source:

4 Comments

what do you mean by And also LSM Tree is used to store data? only 1 of them would be getting used right?
@bestwishes Correct. As my understanding, from WiredTiger documentation ( source.wiredtiger.com/3.1.0/lsm.html ) it can be configured at table level. You can use different type of tree types per table. However, for mongodb , this configuration seems to be global ( github.com/mongodb/mongo/blob/master/src/mongo/db/storage/… )
For those who don't know, the difference parts is more related to "SST vs B+" and it's said that SST aims for high write performance(append only is fast) and B+ tree is for high read performance (very few disk seeks).
@suitianshi, thank you very much for your update, the difference between LSM and B+ tree is added into the answer.

Your Answer

By clicking “Post Your Answer”, you agree to our terms of service and acknowledge you have read our privacy policy.

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.