Here is our document schema:

{
  name: String
}

Here is our query:

{
  name: {$in: ["Jack", "Tom"]}
}

I believe that even without an index on name, the query engine will convert the $in array into a hash set and check each record for membership as it scans the collection with a COLLSCAN, which is O(n) for n documents. It will never do a naive O(m*n) search (where m is the number of values in the $in array), right?

I'm trying to find supporting documentation online but I've come up short. I've tried searching in the source code but I can't seem to find the exact section responsible for this either.

If an index exists, I believe the engine will use it directly and be faster. If I'm not mistaken, that is O(m*log(n)): it fetches the result set for each element of the $in array from the B-tree in log(n) time and returns the union of them all. Although in big-O terms this looks slower than the O(n) hash set approach for large m, it is faster in practice because disk reads are much more expensive.

Is this line of thinking correct when there is an index?

And if there isn't an index, does it do the COLLSCAN with a naive search, or does it use a hash set to speed up the process?

1 Answer

When setting up the query, the $in expression sorts the non-regex elements in the setEqualities function:

    if (!std::is_sorted(_originalEqualityVector.begin(),
                        _originalEqualityVector.end(),
                        _eltCmp.makeLessThan())) {
        std::sort(
            _originalEqualityVector.begin(), _originalEqualityVector.end(), _eltCmp.makeLessThan());
    }

It then tests the element from each document using the contains function, which uses a binary search:

    bool InMatchExpression::contains(const BSONElement& e) const {
        return std::binary_search(_equalitySet.begin(), _equalitySet.end(), e, _eltCmp.makeLessThan());
    }
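So, judging by these two snippets, an unindexed scan is neither a hash lookup nor a naive O(m*n) search: the values are sorted once, in O(m log m), and each document then costs one O(log m) binary search, for roughly O(n log m) overall. The following is a simplified, standalone sketch of that pattern, not the server's code. `InSet` and `countMatches` are hypothetical names, and it assumes plain `std::string` values with the default ordering in place of `BSONElement` and `_eltCmp`.

```cpp
#include <algorithm>
#include <cstddef>
#include <string>
#include <utility>
#include <vector>

// Hypothetical stand-in for the $in expression: keeps its values sorted.
class InSet {
public:
    explicit InSet(std::vector<std::string> values) : _sorted(std::move(values)) {
        // One-time setup when the query is built: O(m log m).
        std::sort(_sorted.begin(), _sorted.end());
    }

    // Per-document membership test: O(log m) binary search.
    bool contains(const std::string& value) const {
        return std::binary_search(_sorted.begin(), _sorted.end(), value);
    }

private:
    std::vector<std::string> _sorted;
};

// Simulated collection scan over n names: O(n log m) in total.
std::size_t countMatches(const std::vector<std::string>& names, const InSet& in) {
    return static_cast<std::size_t>(
        std::count_if(names.begin(), names.end(),
                      [&in](const std::string& name) { return in.contains(name); }));
}
```

For a short list such as ["Jack", "Tom"], the log m factor is a comparison or two per document, so the scan is effectively linear in the number of documents.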