Mongodb querying for aggregation with count of multiple values

Question

I am using Mongoid in one of my Rails apps to work with MongoDB:

class Tracking
  include Mongoid::Document
  include Mongoid::Timestamps

  field :article_id,      type: String
  field :action,          type: String # like | comment
  field :actor_gender,    type: String # male | female | unknown

  field :city,            type: String
  field :state,           type: String
  field :country,         type: String
end

I want to retrieve the records in this tabular format:

article_id | state | male_like_count | female_like_count | unknown_gender_like_count | date

juhkwu2367 | California | 21 | 7  | 1 | 11-20-2015
juhkwu2367 | New York   | 62 | 23 | 3 | 11-20-2015
juhkwu2367 | Vermont    | 48 | 27 | 3 | 11-20-2015
juhkwu2367 | California | 21 | 7  | 1 | 11-21-2015
juhkwu2367 | New York   | 62 | 23 | 3 | 11-21-2015
juhkwu2367 | Vermont    | 48 | 27 | 3 | 11-21-2015

Here the input for the query would be:

article_id 
country
date range (from and to)
action (is `like` in this scenario)
sort_by [ date | state | male_like_count | female_like_count ]

This is what I am trying, based on an example at https://docs.mongodb.org/v3.0/reference/operator/aggregation/group/

db.trackings.aggregate(
   [
      {
        $group : {
           _id : { month: { $month: "$created_at" }, day: { $dayOfMonth: "$created_at" }, year: { $year: "$created_at" }, article_id:  "$article_id", state: "$state", country: "$country"},
           article_id: "$article_id",
           country: ??,
           state: "$state",
           male_like_count: { $sum:  ?? } },
           female_like_count: { $sum:  ?? } },
           unknown_gender_like_count: { $sum:  ?? } },
           date: ??
        }
      }
   ]
)

So what should I put in place of ?? to count by gender, and how do I add a clause for the sorting option?

Blakes Seven · Accepted Answer · 2015-11-23 01:44:01Z

You are largely looking for the $cond operator, which evaluates a condition and returns whether the particular counter should be incremented or not, but there are also some other aggregation concepts you are missing here:

db.trackings.aggregate([
    { "$match": {
        "created_at": { "$gte": startDate, "$lt": endDate },
        "country": "US",
        "action": "like"
    }},
    { "$group": {
        "_id": { 
            "date": {
                "month": { "$month": "$created_at" }, 
                "day": { "$dayOfMonth": "$created_at" },
                "year": { "$year": "$created_at" }
            },
            "article_id":  "$article_id", 
            "state": "$state"
        },
        // The model's field is actor_gender, so that is the field compared here.
        "male_like_count": {
            "$sum": {
                "$cond": [
                    { "$eq": [ "$actor_gender", "male" ] },
                    1,
                    0
                ]
            }
        },
        "female_like_count": {
            "$sum": {
                "$cond": [
                    { "$eq": [ "$actor_gender", "female" ] },
                    1,
                    0
                ]
            }
        },
        "unknown_like_count": {
            "$sum": {
                "$cond": [
                    { "$eq": [ "$actor_gender", "unknown" ] },
                    1,
                    0
                ]
            }
        }
    }},
    { "$sort": {
        "_id.date.year": 1,
        "_id.date.month": 1,
        "_id.date.day": 1,
        "_id.article_id": 1,
        "_id.state": 1,
        "male_like_count": 1,
        "female_like_count": 1
    }}
])

First you want $match, which is how you supply "query" conditions to an aggregation pipeline. It can appear at any stage of the pipeline, but when used first it filters the input considered by the following operations. In this case it applies the required date range and country, and removes anything that is not a "like", since you are not interested in those counts.
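As a rough sketch, the $match stage could be built from the query inputs listed in the question. The helper name build_match_stage and its keyword arguments are made up for illustration; only the field names come from the Tracking model. It also filters on article_id, which the question lists as an input.

```ruby
# Hypothetical helper: builds the $match stage from the inputs in the question.
# from/to are Time objects; the date range is half-open: [from, to).
def build_match_stage(article_id:, country:, from:, to:, action: "like")
  {
    "$match" => {
      "article_id" => article_id,
      "country"    => country,
      "action"     => action,
      "created_at" => { "$gte" => from, "$lt" => to }
    }
  }
end

stage = build_match_stage(
  article_id: "juhkwu2367",
  country: "US",
  from: Time.utc(2015, 11, 20),
  to: Time.utc(2015, 11, 22)
)
```

The resulting hash would go first in the array passed to Tracking.collection.aggregate.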

Then all items are grouped by the respective "key" in _id. It is used here as a compound field, mostly because all of these field values are part of the grouping key, and also for a little organization.
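To make the grouping and the conditional counters concrete, here is an in-memory Ruby illustration of what $group with $sum and $cond computes. It is only an analogy run on made-up sample documents, not how the server executes the pipeline.

```ruby
# In-memory illustration only; in practice MongoDB's $group stage does this work.
docs = [
  { article_id: "a1", state: "California", actor_gender: "male" },
  { article_id: "a1", state: "California", actor_gender: "female" },
  { article_id: "a1", state: "California", actor_gender: "male" },
  { article_id: "a1", state: "Vermont",    actor_gender: "unknown" }
]

# One counter row per distinct grouping key.
counts = Hash.new do |hash, key|
  hash[key] = { male_like_count: 0, female_like_count: 0, unknown_like_count: 0 }
end

docs.each do |doc|
  key = [doc[:article_id], doc[:state]] # plays the role of the compound _id
  row = counts[key]
  # Each line mirrors { "$sum" => { "$cond" => [ <test>, 1, 0 ] } }
  row[:male_like_count]    += (doc[:actor_gender] == "male" ? 1 : 0)
  row[:female_like_count]  += (doc[:actor_gender] == "female" ? 1 : 0)
  row[:unknown_like_count] += (doc[:actor_gender] == "unknown" ? 1 : 0)
end
```

Every document adds 1 to exactly one counter of its group and 0 to the others, which is why a single pass produces all three counts.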

You also seem to ask in your output for "distinct fields" outside of the _id itself. DON'T DO THAT. The data is already there, so there is no point in copying it. You can produce the same things outside of _id via $first as an aggregation operator, or you could even use a $project stage at the end of the pipeline to rename the fields. But it's really best that you lose the habit of thinking you need that, as it just costs time and/or space in getting a response.
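If the flat shape is really required, a final $project stage along these lines might do it. This is a sketch that assumes the grouping key has "date", "article_id" and "state" sub-keys and that the counters carry the names shown; the variable name project_stage is arbitrary.

```ruby
# Optional last stage: lifts the grouping key out of _id and drops _id itself.
# Assumes _id was built with "date", "article_id" and "state" sub-keys.
project_stage = {
  "$project" => {
    "_id"                => 0,
    "article_id"         => "$_id.article_id",
    "state"              => "$_id.state",
    "date"               => "$_id.date",
    "male_like_count"    => 1,
    "female_like_count"  => 1,
    "unknown_like_count" => 1
  }
}
```

Since the same values already sit under _id, reading them from there on the client avoids the extra stage.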

If anything though, you seem to be after a "pretty date" more than anything else. I personally prefer working with "date math" for most manipulation, and therefore an altered listing suitable for mongoid would be:

Tracking.collection.aggregate([
    { "$match" => {
        "created_at" => { "$gte" => startDate, "$lt" => endDate },
        "country" => "US",
        "action" => "like"
    }},
    { "$group" => {
        "_id" => { 
            "date" => {
                "$add" => [
                    { "$subtract" => [
                        { "$subtract" => [ "$created_at", Time.at(0).utc.to_datetime ] },
                        { "$mod" => [
                            { "$subtract" => [ "$created_at", Time.at(0).utc.to_datetime ] },
                            1000 * 60 * 60 * 24
                        ]}
                    ]},
                    Time.at(0).utc.to_datetime
                ]
            },
            "article_id" =>  "$article_id", 
            "state" => "$state"
        },
        # The model's field is actor_gender, so that is the field compared here.
        "male_like_count" => {
            "$sum" => {
                "$cond" => [
                    { "$eq" => [ "$actor_gender", "male" ] },
                    1,
                    0
                ]
            }
        },
        "female_like_count" => {
            "$sum" => {
                "$cond" => [
                    { "$eq" => [ "$actor_gender", "female" ] },
                    1,
                    0
                ]
            }
        },
        "unknown_like_count" => {
            "$sum" => {
                "$cond" => [
                    { "$eq" => [ "$actor_gender", "unknown" ] },
                    1,
                    0
                ]
            }
        }
    }},
    { "$sort" => {
        "_id.date" => 1,
        "_id.article_id" => 1,
        "_id.state" => 1,
        "male_like_count" => 1,
        "female_like_count" => 1
    }}
])

This really just comes down to getting a DateTime object, suitable for use as a driver argument, that corresponds to the epoch date, and then working through the various operations. Using $subtract with one BSON Date and another produces a numeric value in milliseconds, which can then be rounded down to the start of the current day using the applied math. When $add is then used with that numeric value and a BSON Date (again representing the epoch), the result is again a BSON Date object, carrying the adjusted and rounded value.
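The same arithmetic can be checked in plain Ruby. This is only a client-side mirror of the $subtract / $mod / $add chain, with a made-up helper name (floor_to_utc_day), to show that the result is midnight UTC of the same day.

```ruby
DAY_MS = 1000 * 60 * 60 * 24 # milliseconds in one day, as in the pipeline

# Plain-Ruby mirror of the server-side date math (illustration only).
def floor_to_utc_day(time)
  epoch = Time.at(0).utc
  ms_since_epoch = ((time - epoch) * 1000).floor       # $subtract: date - date => milliseconds
  floored = ms_since_epoch - (ms_since_epoch % DAY_MS) # $subtract the $mod remainder
  epoch + floored / 1000                               # $add: epoch date + offset => date
end

rounded = floor_to_utc_day(Time.utc(2015, 11, 20, 13, 45, 10))
# rounded is 2015-11-20 00:00:00 UTC
```

Because the rounding works on milliseconds since the epoch, days are cut at UTC midnight; grouping by a local calendar day would need an offset applied before the modulo.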

Then it's just a matter of applying $sort as an aggregation pipeline stage, as opposed to an external modifier. Much like $match, a $sort can appear anywhere in the pipeline, but placed at the end it always operates on the final result.
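For the sort_by input from the question, one possible approach is to map the option to an output field path and build a single-key $sort stage. The names SORT_FIELDS and build_sort_stage are hypothetical, and the "_id.date" path assumes the date-math version of the grouping key.

```ruby
# Hypothetical mapping from the question's sort_by input to output field paths.
SORT_FIELDS = {
  "date"              => "_id.date",
  "state"             => "_id.state",
  "male_like_count"   => "male_like_count",
  "female_like_count" => "female_like_count"
}.freeze

# direction: 1 for ascending, -1 for descending
def build_sort_stage(sort_by, direction = 1)
  field = SORT_FIELDS.fetch(sort_by) do
    raise ArgumentError, "unsupported sort_by: #{sort_by}"
  end
  { "$sort" => { field => direction } }
end

by_state = build_sort_stage("state")
top_male = build_sort_stage("male_like_count", -1)
```

$sort applies its keys in the order they are listed, so passing only the desired key is enough when no secondary ordering is needed.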

Never imagined that someone will post an answer so nicely. Thank you so much @blakes Thanks for putting solution for pretty dates too. I have two questions - (1) What is the use of having _id.article_id in sort options? (2) I believe sort options work in the order of top to bottom, that means, first it will sort by date, then sort by state and then male_like_count and female like count? right. But if I don't need that level of sorting, then passing only desired key should be fine?
