1

I'm running the standard Homebrew installation of Mongo DB, version 2.4.6, and I've got a database with a collection called 'items', which has 600k documents within it.

I've written the following query to find the the top five brands for the collection of items:

db.items.aggregate([
  { $group: { _id: '$brand', size: { $sum: 1}}}, 
  { $sort:  {"size": -1}},
  { $limit: 5}
])

which returns the result I expected, but to be frank, takes much longer to complete than I ever would have imagined. Here is the profile data:

{
    "op" : "command",
    "ns" : "insights-development.$cmd",
    "command" : {
            "aggregate" : "items",
            "pipeline" : [
                    {
                            "$group" : {
                                    "_id" : "$brand",
                                    "size" : {
                                            "$sum" : 1
                                    }
                            }
                    },
                    {
                            "$sort" : {
                                    "size" : -1
                            }
                    },
                    {
                            "$limit" : 5
                    }
            ]
    },
    "ntoreturn" : 1,
    "keyUpdates" : 0,
    "numYield" : 3,
    "lockStats" : {
            "timeLockedMicros" : {
                    "r" : NumberLong(3581974),
                    "w" : NumberLong(0)
            },
            "timeAcquiringMicros" : {
                    "r" : NumberLong(1314151),
                    "w" : NumberLong(10)
            }
    },
    "responseLength" : 267,
    "millis" : 2275,
    "ts" : ISODate("2013-11-23T18:16:33.886Z"),
    "client" : "127.0.0.1",
    "allUsers" : [ ],
    "user" : ""
}

Here is the ouptut of db.items.stats():

{
    "sharded" : false,
    "primary" : "a59aff30810b066bbe31d1fae79596af",
    "ns" : "insights-development.items",
    "count" : 640590,
    "size" : 454491840,
    "avgObjSize" : 709.4894394230319,
    "storageSize" : 576061440,
    "numExtents" : 14,
    "nindexes" : 10,
    "lastExtentSize" : 156225536,
    "paddingFactor" : 1,
    "systemFlags" : 1,
    "userFlags" : 0,
    "totalIndexSize" : 165923744,
    "indexSizes" : {
            "_id_" : 17889088,
            "demographic_1" : 14741328,
            "brand_1" : 17946320,
            "retailer_1" : 18690336,
            "color_1" : 15738800,
            "style_1" : 18951968,
            "classification_1" : 15019312,
            "placement_1" : 19107312,
            "state_1" : 12394816,
            "gender_1" : 15444464
    },
    "ok" : 1
}

I'm fairly new to MongoDB so I'm hoping someone can point out why this aggregation takes so long to run and if there is anything I can do to speed it up as it seems to me that 600k isn't a huge number of documents more Mongo to run calculations on.

4
  • How many unique brands do you have? Commented Nov 23, 2013 at 22:26
  • @WiredPrairie ~26k unique brands. Commented Nov 23, 2013 at 22:41
  • 1
    db.serverStatus isn't that useful here, how about db.items.stats()? Commented Nov 23, 2013 at 22:45
  • @AsyaKamsky Ok, updated for your request. Commented Nov 23, 2013 at 23:05

1 Answer 1

2

If you have an index on "brand" field, then adding a {$sort:{brand:1}} at the beginning of the pipeline may help performance. The reason you're not seeing good performance right now is likely due to the need to scan every document to group by brand. If there was an index, then it could be used to scan index only rather than all the documents. And sorting (which uses an index) can speed up grouping in some cases where having a result ordered by the field being grouped is beneficial.

If you created an index on brand and didn't see any improvement, try adding a $sort before you get rid of the index. If it happens that you already have an index where brand is the first field, you then don't need to add another index on brand - the compound index will automatically be used.

Sign up to request clarification or add additional context in comments.

Comments

Your Answer

By clicking “Post Your Answer”, you agree to our terms of service and acknowledge you have read our privacy policy.

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.