
I am using MongoDB 2.6.5.

I have a collection of documents with this structure:

{
  "_id" : ObjectId("5485cd0c6b0f96004220e414"),
  "exampleList" : [{
      "Value" : "uri:obj:id:1258477.479.129403280"
    },{
      "Value" : "uri:obj:id:1258477.542.542541247"
    }, {
      "Value" : "uri:obj:id:1258477.365.455255425"
    }
    [...]
    {
      "Value" : "uri:obj:id:1258477.147.855556255"
    }]
}

I have created a multikey index on "exampleList.Value".

I want to query it with a "starts with" regex, but the query can be very slow depending on the regex. The shorter the fixed part of the regex (and so the more results), the slower the query.

Demo with a collection "myCollection" of 100 million documents:

Fastest execution (immediate):

> db.myCollection.count({"exampleList.Value":{$regex:/^uri:obj:id:1258477\.479\.129403280$/}})
156

Fast execution (a few seconds):

> db.myCollection.count({"exampleList.Value":{$regex:/^uri:obj:id:1258477\.479\.129.*$/}})
502

Slower execution (a few seconds):

> db.myCollection.count({"exampleList.Value":{$regex:/^uri:obj:id:1258477\.479\.1.*$/}})
40947

Slow execution (~2 minutes):

> db.myCollection.count({"exampleList.Value":{$regex:/^uri:obj:id:1258477\.479.*$/}})
342275

Very slow execution (did not finish after several minutes):

> db.myCollection.count({"exampleList.Value":{$regex:/^uri:obj:id:1258477\.47.*$/}})

I don't understand why the execution time is not the same for all of these queries.

  • MongoDB has to scan progressively more documents as you loosen your regex (i.e. the query engine can exclude fewer documents based on the index and your regex). Try using explain; I bet that nscanned increases. Commented Dec 9, 2014 at 16:14
  • I ran the test with explain and you are right. I understand what you said, but I have a question: why does MongoDB need to scan documents for the count() query? Are the indexes not enough? I am looking for an efficient way to do a "starts with" query on my multikey index. Do you have a solution? Commented Dec 10, 2014 at 8:36
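
The explain check suggested in the comments could be sketched roughly as follows. This is only an illustration, assuming the MongoDB 2.6 shell, where cursor.explain() reports fields such as n, nscanned, nscannedObjects and millis; the helper name scanStats is hypothetical and not part of MongoDB.

```javascript
// Hypothetical helper: collect explain() figures for several prefix regexes.
// `collection` is anything exposing find(query).explain(), e.g. db.myCollection.
function scanStats(collection, field, regexes) {
  return regexes.map(function (re) {
    var query = {};
    query[field] = { $regex: re };
    // count() has no explain() in 2.6, so explain the equivalent find()
    var plan = collection.find(query).explain();
    return {
      regex: String(re),
      n: plan.n,                           // documents matched
      nscanned: plan.nscanned,             // index entries examined
      nscannedObjects: plan.nscannedObjects, // documents examined
      millis: plan.millis                  // time reported by the server
    };
  });
}

// Usage in the shell (collection and field names taken from the question):
// printjson(scanStats(db.myCollection, "exampleList.Value", [
//   /^uri:obj:id:1258477\.479\.129/,
//   /^uri:obj:id:1258477\.479/
// ]));
```

If nscanned grows in step with the number of matches as the prefix gets shorter, that would be consistent with the explanation given in the first comment.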

1 Answer

The first regex completes quickly because it consists only of literal characters:

/^uri:obj:id:1258477\.479\.129403280$/

Compare that with the other regexes, which use the greedy wildcard '.*':

/^uri:obj:id:1258477\.47.*$/

This one has the shortest literal prefix, so across many millions of documents there may be a great many values that match it.

Try replacing the '.*' with a fixed length or a bounded range, e.g. '.{0,25}'. It may be quicker still to use a string "begins with" method instead of a regex, if one is available.
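
One way to approximate a "begins with" test without a regex is a range query on the indexed field. The sketch below is an assumption rather than a tested fix for this collection: the helper name startsWithQuery is hypothetical, it assumes the prefix ends in a plain ASCII character, and it wraps the bounds in $elemMatch so that both apply to the same array element. Whether the 2.6 planner uses both bounds on a multikey index should be checked with explain.

```javascript
// Hypothetical helper: build a "starts with" query as a $gte/$lt range.
// Assumes the last character of `prefix` is plain ASCII.
function startsWithQuery(arrayField, subField, prefix) {
  if (prefix.length === 0) {
    throw new Error("prefix must not be empty");
  }
  var last = prefix.charCodeAt(prefix.length - 1);
  // Upper bound: the prefix with its last character bumped by one code unit
  var upper = prefix.slice(0, -1) + String.fromCharCode(last + 1);

  var range = {};
  range[subField] = { $gte: prefix, $lt: upper };

  // $elemMatch makes both bounds apply to the same array element
  var query = {};
  query[arrayField] = { $elemMatch: range };
  return query;
}

// Usage in the shell (names taken from the question):
// db.myCollection.count(
//   startsWithQuery("exampleList", "Value", "uri:obj:id:1258477.47"));
```

Separately, the MongoDB documentation for $regex notes that /^a/ can stop matching after the prefix while /^a.*$/ cannot, so dropping the trailing '.*$' from the patterns may also be worth trying.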

3 Comments

There is no noticeable difference in execution time between '.*' and '.{0,25}'.
Perhaps try reducing the length of 25 to something more suitable for your data. What is the maximum length you are expecting?
I tested with '.{0,10}', but it is not really any better.
