Count() query in MongoDB with regex filter : slow performance

Question

With MongoDB 2.6.5

I have a collection of documents with this structure:

{
  "_id" : ObjectId("5485cd0c6b0f96004220e414"),
  "exampleList" : [{
      "Value" : "uri:obj:id:1258477.479.129403280"
    },{
      "Value" : "uri:obj:id:1258477.542.542541247"
    }, {
      "Value" : "uri:obj:id:1258477.365.455255425"
    }
    [...]
    {
      "Value" : "uri:obj:id:1258477.147.855556255"
    }]
}

I have set a multikey index on "exampleList.Value".

I want to query it with a "starts with" regex, but depending on the regex it can be very slow. The shorter the fixed part of the regex (and so the more results), the slower the query.

Demo with a collection "myCollection" of 100 million documents:

Fastest execution (immediate):

> db.myCollection.count({"exampleList.Value":{$regex:/^uri:obj:id:1258477\.479\.129403280$/}})
156

Fast execution (a few seconds):

> db.myCollection.count({"exampleList.Value":{$regex:/^uri:obj:id:1258477\.479\.129.*$/}})
502

Slower execution (a few seconds):

> db.myCollection.count({"exampleList.Value":{$regex:/^uri:obj:id:1258477\.479\.1.*$/}})
40947

Slow execution (~2 minutes):

> db.myCollection.count({"exampleList.Value":{$regex:/^uri:obj:id:1258477\.479.*$/}})
342275

Very slow execution (did not finish after several minutes):

> db.myCollection.count({"exampleList.Value":{$regex:/^uri:obj:id:1258477\.47.*$/}})

I don't understand why the processing time is not the same for all these queries.

MongoDB has to scan progressively more documents as you change your regex (that is, fewer documents can be excluded by the query engine based on the index and your regex). Try using explain; I bet that nscanned increases.
– Matt, Commented Dec 9, 2014 at 16:14
I ran the test with explain and you are right. I understand what you said, but I have a question: why does MongoDB need to scan documents for the count() query? Aren't the indexes enough? I am looking for an efficient way to do a "starts with" query on my multikey index. Do you have a solution?
– Masterlud, Commented Dec 10, 2014 at 8:36
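
One way to express "starts with" without a regex is a plain string range: every value that begins with a prefix sorts between the prefix itself and the prefix with its last character incremented. The sketch below is only an illustration of that idea. The helper name prefixRange is hypothetical (it is not a MongoDB function), it assumes ASCII values like the ones in the question, and whether the resulting count is actually faster on 2.6.5 has not been measured here.

```javascript
// Hypothetical helper (not part of MongoDB): turn a literal prefix into the
// half-open string range [prefix, upper) that contains every string
// starting with that prefix.
// Assumption: values are plain ASCII, so incrementing the last character
// code gives the next string in sort order after all prefixed values.
function prefixRange(prefix) {
  if (prefix.length === 0) {
    throw new Error("prefix must not be empty");
  }
  const last = prefix.charCodeAt(prefix.length - 1);
  const upper = prefix.slice(0, -1) + String.fromCharCode(last + 1);
  return { $gte: prefix, $lt: upper };
}

const range = prefixRange("uri:obj:id:1258477.479");

// $elemMatch asks for a single array element that satisfies both bounds.
const query = { exampleList: { $elemMatch: { Value: range } } };

console.log(JSON.stringify(query));

// In the mongo shell, with the collection name from the question:
// db.myCollection.count(query)
```

Running the same query through explain and comparing nscanned with the regex version would show whether the range form helps on this data.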

R Day · Accepted Answer · 2014-12-09 16:18:39Z

The first regex completes quickly because it uses only explicit characters.

/^uri:obj:id:1258477\.479\.129403280$/

Compare that to the other regexes, which use the greedy wildcard '.*'.

/^uri:obj:id:1258477\.47.*$/

This one has the shortest run of literal characters at the start of the string; across many millions of documents, many values may match that first part.

Try replacing the '.*' with a fixed length or a bounded range, e.g. '.{0,25}'. It may be quicker still to replace the regex with a string "begins with" method, if one is available.
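
A related variation on the wildcard idea: the MongoDB documentation for $regex notes that /^a/, /^a.*/ and /^a.*$/ match equivalent strings but perform differently, and that the bare /^a/ form can stop scanning after matching the prefix. The sketch below simply drops the trailing wildcard from an anchored regex. The helper name stripTrailingWildcard is hypothetical, and whether this changes the timing on 2.6.5 with a multikey index is untested here.

```javascript
// Hypothetical helper (not a MongoDB API): drop a trailing ".*" or ".*$"
// from an anchored regex, leaving only the literal prefix.
// Assumptions: the trailing ".*" is an unescaped wildcard (not "\.*"),
// and the values being matched contain no newlines.
function stripTrailingWildcard(regex) {
  const source = regex.source.replace(/\.\*\$?$/, "");
  return new RegExp(source, regex.flags);
}

const original = /^uri:obj:id:1258477\.479.*$/;
const prefixOnly = stripTrailingWildcard(original);

// Compare both forms on two sample values taken from the question.
const samples = [
  "uri:obj:id:1258477.479.129403280",
  "uri:obj:id:1258477.542.542541247",
];
for (const value of samples) {
  console.log(value, original.test(value), prefixOnly.test(value));
}

// In the mongo shell, with the collection name from the question:
// db.myCollection.count({ "exampleList.Value": prefixOnly })
```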

3 Comments

Masterlud Over a year ago

No noticeable difference in processing time between '.*' and '.{0,25}'.

R Day Over a year ago

Perhaps try reducing the length of 25 to something more suitable for your data, what is the maximum length you are expecting?

Masterlud Over a year ago

Tested with '.{0,10}', but it is not really any better.
