
I'm stuck with this simple prefix query. Although Mongo docs state that you can get pretty good performance by using the prefix regex format (/^a/), the query is pretty slow when I try to sort the results:

940 millis

db.posts.find({hashtags: /^noticias/ }).limit(15).sort({rank : -1}).hint('hashtags_1_rank_-1').explain()

{
"cursor" : "BtreeCursor hashtags_1_rank_-1 multi",
"isMultiKey" : true,
"n" : 15,
"nscannedObjects" : 142691,
"nscanned" : 142692,
"nscannedObjectsAllPlans" : 142691,
"nscannedAllPlans" : 142692,
"scanAndOrder" : true,
"indexOnly" : false,
"nYields" : 1,
"nChunkSkips" : 0,
"millis" : 934,
"indexBounds" : {
    "hashtags" : [
        [
            "noticias",
            "noticiat"
        ],
        [
            /^noticias/,
            /^noticias/
        ]
    ],
    "rank" : [
        [
            {
                "$maxElement" : 1
            },
            {
                "$minElement" : 1
            }
        ]
    ]
},
"server" : "XRTZ048.local:27017"
}

However, the unsorted version of the same query is super fast:

0 millis

db.posts.find({hashtags: /^noticias/ }).limit(15).hint('hashtags_1_rank_-1').explain()

{
"cursor" : "BtreeCursor hashtags_1_rank_-1 multi",
"isMultiKey" : true,
"n" : 15,
"nscannedObjects" : 15,
"nscanned" : 15,
"nscannedObjectsAllPlans" : 15,
"nscannedAllPlans" : 15,
"scanAndOrder" : false,
"indexOnly" : false,
"nYields" : 0,
"nChunkSkips" : 0,
"millis" : 0,
"indexBounds" : {
    "hashtags" : [
        [
            "noticias",
            "noticiat"
        ],
        [
            /^noticias/,
            /^noticias/
        ]
    ],
    "rank" : [
        [
            {
                "$maxElement" : 1
            },
            {
                "$minElement" : 1
            }
        ]
    ]
},
"server" : "XRTZ048.local:27017"

}

The query is also fast if I remove the regex and sort:

0 millis

db.posts.find({hashtags: 'noticias' }).limit(15).sort({rank : -1}).hint('hashtags_1_rank_-1').explain()

{
"cursor" : "BtreeCursor hashtags_1_rank_-1",
"isMultiKey" : true,
"n" : 15,
"nscannedObjects" : 15,
"nscanned" : 15,
"nscannedObjectsAllPlans" : 15,
"nscannedAllPlans" : 15,
"scanAndOrder" : false,
"indexOnly" : false,
"nYields" : 0,
"nChunkSkips" : 0,
"millis" : 0,
"indexBounds" : {
    "hashtags" : [
        [
            "noticias",
            "noticias"
        ]
    ],
    "rank" : [
        [
            {
                "$maxElement" : 1
            },
            {
                "$minElement" : 1
            }
        ]
    ]
},
"server" : "XRTZ048.local:27017"

}

It seems like using both regex and sort makes Mongo scan lots of records. However, sort is scanning just 15 if I don't use the regex. What's wrong here?

jaime, I believe the scanAndOrder is responsible for the slowness. You may want to look at Andre's answer, which might be similar, if not identical, to your issue. Commented Oct 23, 2012 at 16:22

1 Answer


The scanAndOrder: true in the explain output indicates that the query has to retrieve the matching documents and then sort them in memory before any results are returned. This is an expensive operation and is what slows your query down.

The presence of scanAndOrder: true, together with the gap between nscanned (142692) and n (15) in the explain output, indicates that the query is not using the index optimally. In this case it appears to walk every index entry that matches the prefix instead of stopping after 15. You might be able to alleviate this issue by including the index keys in your sort criteria. From my testing:

db.posts.find({hashtags: /^noticias/ }).limit(15).sort({hashtags:1, rank : -1}).explain()

This query does not require a scan and order, and both n and nscanned equal the number of records you asked for. It also means the results are sorted on the hashtags key first, which may or may not be useful to you, but it should improve the performance of the query.
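To illustrate why adding hashtags to the sort helps, here is a minimal sketch in plain JavaScript (not the mongo shell) that models the hashtags_1_rank_-1 index as an array of [hashtag, rank] pairs kept in index order. The sample entries and the names indexEntries and isSortedByRankDesc are made up for illustration, and a real B-tree index is more involved than this.

```javascript
// Hypothetical index entries, ordered like { hashtags: 1, rank: -1 }:
// ascending by hashtag, then descending by rank within each hashtag.
const indexEntries = [
  ["noticias", 90],
  ["noticias", 40],
  ["noticias", 10],
  ["noticiasdeportes", 95],
  ["noticiasdeportes", 20],
  ["noticiasmundo", 70],
];

// True if walking the entries in index order already yields rank descending.
function isSortedByRankDesc(entries) {
  return entries.every((e, i) => i === 0 || entries[i - 1][1] >= e[1]);
}

// Equality match: a single hashtag value, so the ranks come out already
// sorted and a scan could stop after `limit` entries.
const equalityRange = indexEntries.filter(([tag]) => tag === "noticias");

// Prefix match: several hashtag values, so the rank order restarts at each
// new hashtag and a sort on rank alone needs every matching entry first.
const prefixRange = indexEntries.filter(([tag]) => tag.startsWith("noticias"));

console.log(isSortedByRankDesc(equalityRange)); // true
console.log(isSortedByRankDesc(prefixRange));   // false
```

Sorting on {hashtags: 1, rank: -1} asks for exactly the order the entries are already stored in, so under this model the first 15 entries of the prefix range are the answer and nothing needs to be sorted in memory.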


2 Comments

Thanks, that really did the trick. Also worth noting: index bounds are inferred from the first regular expression if you add several. For example, this query: db.posts.find({"$and" : [{hashtags: /^manana/}, {hashtags: /^noticias/}]}).sort({'hashtags':1,'rank':-1}).limit(10).explain() will have index bounds on manana, which might change the performance you get. In my case, sorting the regular expressions alphabetically yielded better performance.
Great answer, you're saving lives even 9 years after posting.
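On the index bounds themselves: the explain output in the question shows the prefix regex /^noticias/ turned into the string range "noticias" to "noticiat". A rough sketch of how such a range can be derived is below, in plain JavaScript. prefixBounds is a hypothetical helper, not a MongoDB API, and the server's actual logic may differ; it only reproduces the bounds seen in the explain output.

```javascript
// Hypothetical helper: derive a [lower, upper] string range for a prefix
// by incrementing the last character of the prefix.
// Assumption: the last character is not the maximum code unit (0xFFFF);
// that edge case is ignored in this sketch.
function prefixBounds(prefix) {
  if (prefix.length === 0) {
    throw new Error("prefix must not be empty");
  }
  const head = prefix.slice(0, -1);
  const last = prefix.charCodeAt(prefix.length - 1);
  return [prefix, head + String.fromCharCode(last + 1)];
}

console.log(prefixBounds("noticias")); // [ 'noticias', 'noticiat' ]
console.log(prefixBounds("manana"));   // [ 'manana', 'mananb' ]
```

Every string that starts with the prefix sorts at or after the lower bound and before the upper bound, which is why a prefix regex can be served by a bounded index range.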
