I'm trying to count the number of occurrences of a string in an array in my collection using Mongoose. My "schema" looks like this:

var ThingSchema = new Schema({
  tokens: [ String ]
});

My objective is to get the top 10 "tokens" in the "Thing" collection, which can contain multiple values per document. For example:

var documentOne = {
    _id: ObjectId('50ff1299a6177ef9160007fa')
  , tokens: [ 'foo' ]
}

var documentTwo = {
    _id: ObjectId('50ff1299a6177ef9160007fb')
  , tokens: [ 'foo', 'bar' ]
}

var documentThree = {
    _id: ObjectId('50ff1299a6177ef9160007fc')
  , tokens: [ 'foo', 'bar', 'baz' ]
}

var documentFour = {
    _id: ObjectId('50ff1299a6177ef9160007fd')
  , tokens: [ 'foo', 'baz' ]
}

...would give me this result:

{ foo: 4, bar: 2, baz: 2 }

I'm considering using either MapReduce or Aggregate for this task, but I'm not certain which is the better option.

  • Use aggregate unless you want the results persisted in their own collection. You'll want to look at the $unwind operator for this. Commented Jan 31, 2013 at 2:29
  • So far, Mongoose's mapReduce class has added the temporary operator to the query, allowing the resultset to be returned rather than persisted. Is there a reason beyond that that I'd want to use aggregate instead? Commented Jan 31, 2013 at 2:39
  • aggregate is often dramatically faster. Commented Jan 31, 2013 at 2:53
  • The aggregation framework was written precisely for handling queries like this (rather than map-reduce). How much more performant it is I couldn't say, but higher performance and lower complexity for aggregation queries was the point. Aggregation uses C++, while map-reduce uses (less performant) JavaScript. See the slideshow. Commented Jan 31, 2013 at 2:54

1 Answer

Aha, I've found the solution. MongoDB's aggregation framework allows us to execute a series of tasks on a collection. Of particular note is $unwind, which breaks an array in a document into separate documents, one per element, so they can be grouped and counted en masse.

MongooseJS exposes this very accessibly on a model. Using the example above, this looks as follows:

Thing.aggregate([
    { $match: { /* Query can go here, if you want to filter results. */ } }
  , { $project: { tokens: 1 } } /* pass only the tokens field to the next stage */
  , { $unwind: '$tokens' } /* emit one document per array element */
  , { $group: { /* group the unwound documents by token */
          _id: { token: '$tokens' } /* use the token value as the group key */
        , count: { $sum: 1 } /* add 1 for each occurrence */
      }
    }
  , { $sort: { count: -1 } } /* most frequent tokens first */
  , { $limit: 10 } /* keep only the top 10 */
], function(err, topTopics) {
  if (err) return console.error(err);
  console.log(topTopics);
  // [ { _id: { token: 'foo' }, count: 4 }, { _id: { token: 'bar' }, count: 2 }, { _id: { token: 'baz' }, count: 2 } ]
  // (the order of tied tokens is not guaranteed)
});

It is noticeably faster than MapReduce in preliminary tests across ~200,000 records, and thus likely scales better, but this is only after a cursory glance. YMMV.
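
To make it easier to see what the unwind, group, sort and limit steps do to the four sample documents from the question, here is a rough in-memory sketch in plain JavaScript. It does not talk to MongoDB at all; the `topTokens` helper and the `things` array are made up for illustration, and the output shape simply imitates what a `$group` keyed on `{ token: '$tokens' }` would return.

```javascript
// In-memory stand-ins for the four sample documents (the _id fields are omitted).
const things = [
  { tokens: ['foo'] },
  { tokens: ['foo', 'bar'] },
  { tokens: ['foo', 'bar', 'baz'] },
  { tokens: ['foo', 'baz'] }
];

// Hypothetical helper: imitates $unwind, $group, $sort and $limit in plain JS.
function topTokens(docs, limit) {
  // $unwind: produce one entry per array element.
  const unwound = docs.flatMap(doc => doc.tokens.map(token => ({ token })));

  // $group with { $sum: 1 }: count the entries that share a token.
  const counts = new Map();
  for (const { token } of unwound) {
    counts.set(token, (counts.get(token) || 0) + 1);
  }

  // $sort by count (descending), then $limit.
  return Array.from(counts, ([token, count]) => ({ _id: { token }, count }))
    .sort((a, b) => b.count - a.count)
    .slice(0, limit);
}

const result = topTokens(things, 10);
console.log(result);
```

Running this against the sample data gives 'foo' a count of 4 and both 'bar' and 'baz' a count of 2. In a real pipeline the relative order of tied tokens depends on the server, so add a secondary sort key if that order matters.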
