Increase Aggregation Framework Performance in MongoDb

Question

I have 200k records in my collection. My data model looks like as follows:

{
    "_id" : ObjectId("51750ec159dcef125863b7c4"),
    "DateAdded" : ISODate("2013-04-22T00:00:00.000Z"),
    "DateRemoved" : ISODate("2013-12-22T00:00:00.000Z"),
    "DealerID" : ObjectId("51750bd559dcef07ec964a41"),
    "ExStockID" : "8324482",
    "Make" : "Mazda",
    "Model" : "3",
    "Price" : 11479,
    "Year" : 2012,
    "Variant" : "1.6d (115) TS 5dr",
    "Turnover": 150
}

I have several indexes for the collection, one of those used for aggregation framework is:

{
    "DealerID" : 1,
    "DateRemoved" : -1,
    "Price" : 1,
    "Turnover" : 1
}

The aggregate query which is being used:

db.stats.aggregate([
{
    "$match": {
        "DealerID": {
            "$in": [
                ObjectId("523325ac59dcef1b90a3d446"),
                ....
                // here is specified more than 150 ObjectIds
            ]
        },
        "DateRemoved": {
            "$gte": ISODate("2013-12-01T00:00:00Z"),
            "$lt": ISODate("2014-01-01T00:00:00Z")
        }
    }
},
{ "$project" : { "Price":1, "Turnover":1 } },
{
    "$group": {
        "_id": null,
        "Price": {
            "$avg": "$Price"
        },
        "Turnover": {
            "$avg": "$Turnover"
        }
    }
}]);

and the time for this query executions resides between 30-200 seconds.

How can I optimize this?

just a test here, if you remove the $avg operators is it fast? — Sammaye
– Sammaye, Commented Jan 7, 2014 at 13:53
Unfortunately you can't use .explain() on aggregation. But you could execute the $match as a normal collection.find followed by .explain() to get a hopefully helpful explain output telling you if the index is effective. — Philipp
– Philipp, Commented Jan 7, 2014 at 13:54
$project and $group don't benefit from indexes, so when you created that index especially for this query, it would likely work better (or at least just as well) without Price and Turnover. — Philipp
– Philipp, Commented Jan 7, 2014 at 13:55
@Philipp you can actually use explain, it just doesn't work across sharding — Sammaye
– Sammaye, Commented Jan 7, 2014 at 13:57
@Philipp With no intention to argue, just wonder why $project donesn't benefit from the index ? Looks like Miro was trying to create a covering index here. $project should use the index to avoid lookups to the actual documents — Maksym Strukov
– Maksym Strukov, Commented Jan 7, 2014 at 14:04

Derick · Accepted Answer · 2014-01-07 15:22:50Z

You can try to run explain on the aggregation pipeline, but as I don't have your full dataset, I can't try it out properly:

p = [
{
    "$match": {
        "DealerID": {
            "$in": [
                ObjectId("51750bd559dcef07ec964a41"),
                ObjectId("51750bd559dcef07ec964a44"),
            ]
        },
        "DateRemoved": {
            "$gte": ISODate("2013-12-01T00:00:00Z"),
            "$lt": ISODate("2014-01-01T00:00:00Z")
        }
    }
},
{ "$project" : { "Price":1, "Turnover":1 } },
{
    "$group": {
        "_id": null,
        "Price": {
            "$avg": "$Price"
        },
        "Turnover": {
            "$avg": "$Turnover"
        }
    }
}];

db.s.runCommand('aggregate', { pipeline: p, explain: true } );

I would suggest to remove the fields that are not part of the $match (Price and Turnover). Also, I think you should switch the order of DealerId and DateRemoved as you want to do one range search, and from that range then include all the dealers. Doing it the other way around means that you can really only use the index for the 150 single items, and then you need to do a range search.

Mirodil · Accepted Answer · 2014-01-08 09:10:47Z

Using @Derick's answer I have found the index which prevented to create the covered index. As far as I can see query optimizer uses the first index which covers just the query itself, so I have changed the order of indexes. So here is outcome before and after.

Before:

{
    "serverPipeline" : [ 
        {
            "query" : {...},
            "projection" : { "Price" : 1, "Turnover" : 1, "_id" : 0 },
            "cursor" : {
                "cursor" : "BtreeCursor DealerIDDateRemoved multi",
                "isMultiKey" : false,
                "n" : 11036,
                "nscannedObjects" : 11008,
                "nscanned" : 11307,
                "nscannedObjectsAllPlans" : 11201,
                "nscannedAllPlans" : 11713,
                "scanAndOrder" : false,
                "indexOnly" : false,
                "nYields" : 0,
                "nChunkSkips" : 0,
                "millis" : 58,
                "indexBounds" : {...},
                "allPlans" : [...],
                "oldPlan" : {...},
                "server" : "..."
            }
        }, 
        {
            "$group" : {...}
        }
    ],
    "ok" : 1
}

After these changes indexOnly param now shows true, this means we have just created the covered index:

{
    "serverPipeline" : [ 
        {
            "query" : {...},
            "projection" : { "Price" : 1, "Turnover" : 1, "_id" : 0 },
            "cursor" : {
                "cursor" : "BtreeCursor DealerIDDateRemovedPriceTurnover multi",
                "isMultiKey" : false,
                "n" : 11036,
                "nscannedObjects" : 0,
                "nscanned" : 11307,
                "nscannedObjectsAllPlans" : 285,
                "nscannedAllPlans" : 11713,
                "scanAndOrder" : false,
                "indexOnly" : true,
                "nYields" : 0,
                "nChunkSkips" : 0,
                "millis" : 58,
                "indexBounds" : {...},
                "allPlans" : [...],
                "server" : "..."
            }
        }, 
        {
            "$group" : {...}
    ],
    "ok" : 1
}

Now the query works approximately between 0.085-0.300 seconds. Additional information about covered queries Create Indexes that Support Covered Queries

Collectives™ on Stack Overflow

Increase Aggregation Framework Performance in MongoDb

2 Answers 2

Comments

Comments

Your Answer

Hot Network Questions

Collectives™ on Stack Overflow

2 Answers 2

Comments

Comments

Your Answer

Sign up or log in

Post as a guest

Related