
Consider the following example:

db.article.aggregate(
  { $group : {
      _id : "$author",
      docsPerAuthor : { $sum : 1 },
      viewsPerAuthor : { $sum : "$pageViews" }
  }}
);

This groups by the author field and computes two fields.
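To make the goal concrete, here is a plain-JavaScript sketch (runs in Node, no MongoDB needed) of what that $group stage computes, next to the grouping the question is after. The sample documents and the `groupBy` helper are hypothetical, for illustration only.

```javascript
// Hypothetical sample documents shaped like the article collection.
const articles = [
  { author: "Harry_Potter", pageViews: 10 },
  { author: "Jack_Potter", pageViews: 5 },
  { author: "Harry_Potter", pageViews: 7 },
  { author: "Will_Smith", pageViews: 3 },
];

// Hypothetical helper mimicking $group: one output document per distinct key.
function groupBy(docs, keyFn) {
  const groups = new Map();
  for (const doc of docs) {
    const key = keyFn(doc);
    const acc = groups.get(key) || { _id: key, docsPerAuthor: 0, viewsPerAuthor: 0 };
    acc.docsPerAuthor += 1;              // like { $sum: 1 }
    acc.viewsPerAuthor += doc.pageViews; // like { $sum: "$pageViews" }
    groups.set(key, acc);
  }
  return [...groups.values()];
}

// Grouping by the whole author string, as the pipeline above does.
const byAuthor = groupBy(articles, (doc) => doc.author);
// Grouping by the part after "_", which is what the question wants.
const byLastName = groupBy(articles, (doc) => doc.author.split("_")[1]);
```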

The values of $author look like FirstName_LastName. Instead of grouping by $author, I want to group together all authors who share the same LastName.

I tried using a regular expression to group by the part of the string after the '_':

$author.match(/_[a-zA-Z0-9]+$/)

db.article.aggregate(
  { $group : {
      _id : "$author".match(/_[a-zA-Z0-9]+$/),
      docsPerAuthor : { $sum : 1 },
      viewsPerAuthor : { $sum : "$pageViews" }
  }}
);
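Why this attempt cannot work: in the mongo shell the pipeline is ordinary JavaScript, so `"$author".match(...)` is evaluated once, on the client, against the literal string "$author", before anything is sent to the server. That string contains no "_", so `match` returns null and the stage becomes `_id: null`, which puts every document into a single group. The plain-JavaScript snippet below reproduces this.

```javascript
// Evaluated locally by the shell, against the literal string "$author",
// not against each document's author value.
const idExpression = "$author".match(/_[a-zA-Z0-9]+$/);

// So the stage that actually reaches the server is:
const groupStage = {
  $group: {
    _id: idExpression, // null: one group for the whole collection
    docsPerAuthor: { $sum: 1 },
    viewsPerAuthor: { $sum: "$pageViews" },
  },
};

// On a real author value the regex does match, but the match keeps the "_".
const sample = "Harry_Potter".match(/_[a-zA-Z0-9]+$/)[0];
```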

I also tried the following:

db.article.aggregate(
  { $group : {
      _id : {$author: {$regex: /_[a-zA-Z0-9]+$/}},
      docsPerAuthor : { $sum : 1 },
      viewsPerAuthor : { $sum : "$pageViews" }
  }}
);


As far as I can tell, there is no operator that provides this functionality, or at least I could not find a version that contains one. $regex will not work here (http://docs.mongodb.org/manual/reference/operator/regex/): it is a query operator for pattern matching only, not for extracting part of a string.

There is an improvement request in JIRA: https://jira.mongodb.org/browse/SERVER-6773

It is still open and unresolved. However, on GitHub I found this discussion: https://github.com/mongodb/mongo/pull/336

If you check this commit: https://github.com/nleite/mongo/commit/2dd175a5acda86aaad61f5eb9dab83ee19915709

it contains more or less exactly the method you are looking for. I do not understand the current status of this improvement, though: in 2.2.3 it does not work.
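Later server versions did add regular-expression operators to aggregation expressions: MongoDB 4.2 introduced $regexFind (along with $regexFindAll and $regexMatch). Assuming a 4.2+ server, a sketch that extracts the last name with a capture group and groups on it might look like this. The field name `lastNameMatch` is a hypothetical choice.

```javascript
// Assumes MongoDB 4.2+, where $regexFind is an aggregation expression.
// $regexFind yields { match, idx, captures }, or null when nothing matches.
const pipeline = [
  {
    $addFields: {
      lastNameMatch: {
        $regexFind: { input: "$author", regex: /_([a-zA-Z0-9]+)$/ },
      },
    },
  },
  {
    $group: {
      // captures[0] is the text of the parenthesised group, without the "_".
      _id: { $arrayElemAt: ["$lastNameMatch.captures", 0] },
      docsPerAuthor: { $sum: 1 },
      viewsPerAuthor: { $sum: "$pageViews" },
    },
  },
];

// In the mongo shell: db.article.aggregate(pipeline)
```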


Use mapReduce: it is the more general form of aggregation. In the mongo shell, first define the map function:

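var mapFunction = function() {
  // Capture the part after the last "_" (the last name), without the "_" itself.
  var match = this.author.match(/_([a-zA-Z0-9]+)$/);
  // Fall back to the whole author value if there is no "_LastName" suffix.
  var key = match ? match[1] : this.author;
  var value = {
    docsPerAuthor: 1,
    // pageViews holds one number per document, so it is used directly.
    viewsPerAuthor: this.pageViews
  };

  emit( key, value );
};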

Then define the reduce function:

var reduceFunction = function(key, values) {
  // Return the same shape as the emitted values: MongoDB may call
  // reduce again with earlier reduce results among the values.
  var reducedObject = {
    docsPerAuthor: 0,
    viewsPerAuthor: 0
  };

  values.forEach( function(value) {
    reducedObject.docsPerAuthor += value.docsPerAuthor;
    reducedObject.viewsPerAuthor += value.viewsPerAuthor;
  });
  return reducedObject;
};

Run mapReduce and save the result in the map_reduce_result collection:

>db.article.mapReduce(mapFunction, reduceFunction, {out:'map_reduce_result'})

Then query map_reduce_result to see the result:

>db.map_reduce_result.find()
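MongoDB may call the reduce function more than once for the same key, passing the output of an earlier call back in as one of the values, so reducing everything at once and re-reducing a partial result must agree. The plain-JavaScript sketch below checks that locally for a reduce function like the one above that returns only the two summed fields; the emitted values are hypothetical.

```javascript
// Same summing logic as reduceFunction, returning exactly the emitted shape.
const reduceFunction = function (key, values) {
  const reduced = { docsPerAuthor: 0, viewsPerAuthor: 0 };
  values.forEach(function (value) {
    reduced.docsPerAuthor += value.docsPerAuthor;
    reduced.viewsPerAuthor += value.viewsPerAuthor;
  });
  return reduced;
};

// Hypothetical values emitted by the map function for the key "Potter".
const emitted = [
  { docsPerAuthor: 1, viewsPerAuthor: 10 },
  { docsPerAuthor: 1, viewsPerAuthor: 5 },
  { docsPerAuthor: 1, viewsPerAuthor: 7 },
];

// Reducing all values in one call...
const allAtOnce = reduceFunction("Potter", emitted);
// ...must match reducing a partial result together with the remaining value.
const partial = reduceFunction("Potter", emitted.slice(0, 2));
const reReduced = reduceFunction("Potter", [partial, emitted[2]]);
```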


Combining $addFields, $split and $arrayElemAt before the $group stage works for me (MongoDB 3.4 or later).

Say we have the following data in the faculty collection of the school database:

{ "_id" : ObjectId("5ed5a59b1febc4c796a88e80"), "name" : "Harry_Potter" }
{ "_id" : ObjectId("5ed5a60e1febc4c796a88e81"), "name" : "Edison_Potter" }
{ "_id" : ObjectId("5ed5a6231febc4c796a88e82"), "name" : "Jack_Potter" }
{ "_id" : ObjectId("5ed5a62f1febc4c796a88e83"), "name" : "Alice_Walker" }
{ "_id" : ObjectId("5ed5a65f1febc4c796a88e84"), "name" : "Bob_Walker" }
{ "_id" : ObjectId("5ed5a6731febc4c796a88e85"), "name" : "Will_Smith" }

The following pipeline groups the documents by last name:

db.faculty.aggregate([
  {
    $addFields: {
      lastName: {
        $arrayElemAt: [ { $split: ["$name", "_"] }, 1 ]
      }
    }
  },
  {
    $group: {
      _id: "$lastName",
      count: {$sum: 1}
    }
  }
])

The result is:

{ "_id" : "Potter", "count" : 3 }
{ "_id" : "Walker", "count" : 2 }
{ "_id" : "Smith", "count" : 1 }

The trick is to add a field named lastName. Given the format of the name field, $split breaks it into an array on "_": the first name is at index 0 and the last name at index 1, which $arrayElemAt picks out.
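The same logic in plain JavaScript, handy for checking the split before running it on the server, using the sample names above. It takes the last element of the split array rather than index 1; in the pipeline the equivalent is `$arrayElemAt: [ { $split: ["$name", "_"] }, -1 ]`, since a negative index counts from the end of the array. The two only differ for names with more than one "_", such as the hypothetical "Mary_Ann_Smith".

```javascript
// Sample names from the faculty collection above.
const names = [
  "Harry_Potter", "Edison_Potter", "Jack_Potter",
  "Alice_Walker", "Bob_Walker", "Will_Smith",
];

// Mirrors $split + $arrayElemAt with index -1: the text after the last "_".
function lastName(name) {
  const parts = name.split("_");
  return parts[parts.length - 1];
}

// Mirrors $group with count: { $sum: 1 }.
const counts = {};
for (const name of names) {
  const key = lastName(name);
  counts[key] = (counts[key] || 0) + 1;
}

// Index 1 would give "Ann" here; index -1 gives "Smith".
const withMiddleName = lastName("Mary_Ann_Smith");
```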

Reference

$addFields (aggregation)

$arrayElemAt (aggregation)


A possible workaround within the aggregation framework is to compute the last name in a $project stage. However, it is messy, because you have to handle each possible first-name length by hand.

Here, the name field is computed as the substring after the '_' character by testing each possible position of the '_' (hence the chain of $cond), falling back to the whole $author value if the first name is longer than the positions checked:

http://mongotry.herokuapp.com/#?bookmarkId=52fb5f24a0378802003b4c68

[
{
    "$project": {
        "author": 1,
        "pageViews": 1,
        "name": {
            "$cond": [
                {
                    "$eq": [
                        {
                            "$substr": [
                                "$author",
                                0,
                                1
                            ]
                        },
                        "_"
                    ]
                },
                {
                    "$substr": [
                        "$author",
                        1,
                        999
                    ]
                },
                {
                    "$cond": [
                        {
                            "$eq": [
                                {
                                    "$substr": [
                                        "$author",
                                        1,
                                        1
                                    ]
                                },
                                "_"
                            ]
                        },
                        {
                            "$substr": [
                                "$author",
                                2,
                                999
                            ]
                        },
                        {
                            "$cond": [
                                {
                                    "$eq": [
                                        {
                                            "$substr": [
                                                "$author",
                                                2,
                                                1
                                            ]
                                        },
                                        "_"
                                    ]
                                },
                                {
                                    "$substr": [
                                        "$author",
                                        3,
                                        999
                                    ]
                                },
                                {
                                    "$cond": [
                                        {
                                            "$eq": [
                                                {
                                                    "$substr": [
                                                        "$author",
                                                        3,
                                                        1
                                                    ]
                                                },
                                                "_"
                                            ]
                                        },
                                        {
                                            "$substr": [
                                                "$author",
                                                4,
                                                999
                                            ]
                                        },
                                        {
                                            "$cond": [
                                                {
                                                    "$eq": [
                                                        {
                                                            "$substr": [
                                                                "$author",
                                                                4,
                                                                1
                                                            ]
                                                        },
                                                        "_"
                                                    ]
                                                },
                                                {
                                                    "$substr": [
                                                        "$author",
                                                        5,
                                                        999
                                                    ]
                                                },
                                                "$author"
                                            ]
                                        }
                                    ]
                                }
                            ]
                        }
                    ]
                }
            ]
        }
    }
},
{
    "$group": {
        "_id": "$name",
        "viewsPerAuthor": {
            "$sum": "$pageViews"
        }
    }
}
]
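Rather than writing each level by hand, the chain can be generated with a small helper. The sketch below (`buildLastNameExpr` is a hypothetical name) reproduces the pipeline above, which checks for the '_' at positions 0 through 4, i.e. first names of up to 4 characters; raising the second argument covers longer first names without more copy-paste. On 3.4+ servers the $split approach shown above avoids the chain entirely.

```javascript
// Builds the nested $cond chain: test for "_" at each position from 0 to
// maxFirstNameLength, returning the rest of the string after it.
function buildLastNameExpr(field, maxFirstNameLength) {
  // Innermost fallback when no "_" is found at any checked position.
  let expr = field;
  // Build from the innermost (largest) position outwards.
  for (let pos = maxFirstNameLength; pos >= 0; pos--) {
    expr = {
      $cond: [
        { $eq: [{ $substr: [field, pos, 1] }, "_"] },
        { $substr: [field, pos + 1, 999] },
        expr,
      ],
    };
  }
  return expr;
}

const pipeline = [
  {
    $project: {
      author: 1,
      pageViews: 1,
      name: buildLastNameExpr("$author", 4),
    },
  },
  { $group: { _id: "$name", viewsPerAuthor: { $sum: "$pageViews" } } },
];
```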
