Mongoose / MongoDB: count elements in array

Question

I'm trying to count the number of occurrences of a string in an array in my collection using Mongoose. My "schema" looks like this:

var ThingSchema = new Schema({
  tokens: [ String ]
});

My objective is to get the top 10 tokens in the "Thing" collection, where each document can contain multiple tokens. For example:

var documentOne = {
    _id: ObjectId('50ff1299a6177ef9160007fa')
  , tokens: [ 'foo' ]
}

var documentTwo = {
    _id: ObjectId('50ff1299a6177ef9160007fb')
  , tokens: [ 'foo', 'bar' ]
}

var documentThree = {
    _id: ObjectId('50ff1299a6177ef9160007fc')
  , tokens: [ 'foo', 'bar', 'baz' ]
}

var documentFour = {
    _id: ObjectId('50ff1299a6177ef9160007fd')
  , tokens: [ 'foo', 'baz' ]
}

...should give me this result:

{ foo: 4, bar: 2, baz: 2 }

I'm considering using MapReduce or Aggregate for this task, but I'm not certain which is the better option.

Use aggregate unless you want the results persisted in their own collection. You'll want to look at the $unwind operator for this.
– JohnnyHK, Commented Jan 31, 2013 at 2:29
So far, Mongoose's mapReduce class has added the temporary operator to the query, allowing the result set to be returned rather than persisted. Is there any other reason I'd want to use aggregate instead?
– Eric Martindale, Commented Jan 31, 2013 at 2:39
The aggregation framework was written precisely to handle queries like this, as an alternative to map-reduce. How much more performant it is I couldn't say, but higher performance and lower complexity for aggregation queries was the point. Aggregation runs in C++, while map-reduce runs (less performant) JavaScript. See the slideshow.
– numbers1311407, Commented Jan 31, 2013 at 2:54

Eric Martindale · Accepted Answer · 2013-02-05 19:58:14Z

Aha, I've found the solution. MongoDB's aggregation framework lets us run a series of pipeline stages on a collection. Of particular note is $unwind, which splits an array in a document into separate documents, one per array element, so they can be grouped and counted en masse.
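To make the effect of $unwind followed by $group concrete, here is a rough plain-JavaScript sketch run against the four sample documents from the question. It is only an illustration: `unwindTokens` and `countByToken` are hypothetical names, the `_id` values are simplified to integers, and MongoDB does the real work server-side.

```javascript
// Illustrative only: plain-JavaScript stand-ins for what $unwind and $group do.
// Function names are hypothetical; _id values are simplified to integers.
const things = [
  { _id: 1, tokens: ['foo'] },
  { _id: 2, tokens: ['foo', 'bar'] },
  { _id: 3, tokens: ['foo', 'bar', 'baz'] },
  { _id: 4, tokens: ['foo', 'baz'] }
];

// Like { $unwind: '$tokens' }: emit one output document per array element.
function unwindTokens(docs) {
  return docs.flatMap(doc =>
    doc.tokens.map(token => ({ _id: doc._id, tokens: token }))
  );
}

// Like { $group: { _id: '$tokens', count: { $sum: 1 } } }: add 1 per occurrence.
function countByToken(unwoundDocs) {
  const counts = {};
  for (const doc of unwoundDocs) {
    counts[doc.tokens] = (counts[doc.tokens] || 0) + 1;
  }
  return counts;
}

const unwound = unwindTokens(things);
const counts = countByToken(unwound);

console.log(unwound.length); // 8
console.log(counts);         // { foo: 4, bar: 2, baz: 2 }
```

The four documents become eight single-token documents, and counting those by token value gives the totals the question asks for.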

MongooseJS exposes this very accessibly on a model. Using the example above, this looks as follows:

Thing.aggregate([
    { $match: { /* Query can go here, if you want to filter results. */ } }
  , { $project: { tokens: 1 } } /* pass only the tokens field on to the next stage in the pipeline */
  , { $unwind: '$tokens' } /* emit one document per array element, ready for counting */
  , { $group: { /* group the unwound documents */
          _id: { token: '$tokens' } /* using the token value as the _id */
        , count: { $sum: 1 } /* add 1 for every occurrence of the token */
      }
    }
], function(err, topTopics) {
  if (err) return console.error(err);
  console.log(topTopics);
  // e.g. [ { _id: { token: 'foo' }, count: 4 }, { _id: { token: 'bar' }, count: 2 }, { _id: { token: 'baz' }, count: 2 } ]
  // (the order of the groups is not guaranteed without a $sort stage)
});

In preliminary tests across ~200,000 records, the aggregation approach was noticeably faster than MapReduce, so it likely scales better, but that is based on only a cursory look. YMMV.
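The pipeline above counts every token but does not order or cap the results, while the question asks for the top 10. One way to get there, sketched under some assumptions, is to append $sort and $limit stages. The helpers `topTokensPipeline` and `toCountMap` are hypothetical names, not part of Mongoose, and this variant groups on the bare token string (`_id: '$tokens'`) rather than the `{ token: ... }` sub-document used above, which keeps the reshaping step simple.

```javascript
// Hypothetical helper (not part of Mongoose): builds a pipeline for the top-N tokens.
function topTokensPipeline(limit) {
  return [
    { $project: { tokens: 1 } },                         // keep only the tokens field
    { $unwind: '$tokens' },                              // one document per token
    { $group: { _id: '$tokens', count: { $sum: 1 } } },  // count occurrences per token
    { $sort: { count: -1, _id: 1 } },                    // highest count first, ties by token name
    { $limit: limit }                                    // keep only the first N groups
  ];
}

// Hypothetical helper: reshape [{ _id: 'foo', count: 4 }, ...] into { foo: 4, ... }.
function toCountMap(rows) {
  const result = {};
  for (const row of rows) {
    result[row._id] = row.count;
  }
  return result;
}

// Usage with the Thing model from the question (needs a live MongoDB connection):
//   const rows = await Thing.aggregate(topTokensPipeline(10)).exec();
//   console.log(toCountMap(rows));
```

Newer Mongoose versions return a promise from `aggregate(...).exec()`, which is why the usage comment uses `await` instead of a callback. Sorting on `_id` as a secondary key is optional; it only makes the order of tied tokens deterministic.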
