You are mostly looking for the $cond operator, which evaluates a condition and returns a value indicating whether the particular counter should be incremented or not, but there are also a few other aggregation concepts you are missing here:
db.trackings.aggregate([
{ "$match": {
"created_at": { "$gte": startDate, "$lt": endDate },
"country": "US",
"action": "like"
}},
{ "$group": {
"_id": {
"date": {
"month": { "$month": "$created_at" },
"day": { "$dayOfMonth": "$created_at" },
"year": { "$year": "$created_at" }
},
"article_id": "$article_id",
"state": "$state"
},
"male_like_count": {
"$sum": {
"$cond": [
{ "$eq": [ "$gender", "male" ] },
1,
0
]
}
},
"female_like_count": {
"$sum": {
"$cond": [
{ "$eq": [ "$gender", "female" ] },
1,
0
]
}
},
"unknown_like_count": {
"$sum": {
"$cond": [
{ "$eq": [ "$gender", "unknown" ] },
1,
0
]
}
}
}},
{ "$sort": {
"_id.date.year": 1,
"_id.date.month": 1,
"_id.date.day": 1,
"_id.article_id": 1,
"_id.state": 1,
"male_like_count": 1,
"female_like_count": 1
}}
]
)
Firstly you want to $match, which is how you supply "query" conditions to an aggregation pipeline. While it can appear as any pipeline stage, when used first it filters the input considered by all the following operations. In this case that means the required date range and country, plus the removal of anything that is not a "like", since you are not interested in those counts.
All items are then grouped by the compound "key" in _id. Each of those field values forms part of the grouping key, and nesting them this way also adds a little organization.
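To make the $sum with $cond part concrete, here is the same conditional counting written in plain Ruby over a few made-up sample documents. The $cond ternary yields 1 when the $eq test passes and 0 otherwise, so summing those values is just a conditional count per group:

```ruby
# Sample documents standing in for matched "like" trackings.
# These documents are invented purely for illustration.
docs = [
  { "gender" => "male" },
  { "gender" => "female" },
  { "gender" => "male" },
  { "gender" => "unknown" }
]

# Mirror of what $sum over a $cond result does inside $group:
# add 1 when the equality test passes, otherwise add 0.
counts = Hash.new(0)
docs.each do |doc|
  %w[male female unknown].each do |gender|
    counts["#{gender}_like_count"] += (doc["gender"] == gender ? 1 : 0)
  end
end

puts counts["male_like_count"]     # 2
puts counts["female_like_count"]   # 1
puts counts["unknown_like_count"]  # 1
```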
You also seem to ask in your output for "distinct fields" outside of the _id itself. DON'T DO THAT. The data is already there, so there is no point in copying it. You can reproduce the same values outside of _id via $first as an aggregation operator, or you could even use a $project stage at the end of the pipeline to rename the fields. But it's really best that you lose the habit of thinking you need that, as it just costs time and/or space in getting a response.
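As a sketch of the $project alternative, a trailing stage like the following (the field names are hypothetical, matching the listing above) would lift the grouping-key values to the top level of the output without duplicating them during the $group itself:

```ruby
# Hypothetical trailing $project stage: rename the _id key fields
# to top-level output fields after the $group has run.
project_stage = {
  "$project" => {
    "_id" => 0,
    "date" => "$_id.date",
    "article_id" => "$_id.article_id",
    "state" => "$_id.state",
    "male_like_count" => 1,
    "female_like_count" => 1,
    "unknown_like_count" => 1
  }
}

puts project_stage["$project"]["date"]  # $_id.date
```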
If anything though, you seem to be after a "pretty date" more than anything else. I personally prefer working with "date math" for most manipulation, and an altered listing suitable for mongoid would therefore be:
Tracking.collection.aggregate([
{ "$match" => {
"created_at" => { "$gte" => startDate, "$lt" => endDate },
"country" => "US",
"action" => "like"
}},
{ "$group" => {
"_id" => {
"date" => {
"$add" => [
{ "$subtract" => [
{ "$subtract" => [ "$created_at", Time.at(0).utc.to_datetime ] },
{ "$mod" => [
{ "$subtract" => [ "$created_at", Time.at(0).utc.to_datetime ] },
1000 * 60 * 60 * 24
]}
]},
Time.at(0).utc.to_datetime
]
},
"article_id" => "$article_id",
"state" => "$state"
},
"male_like_count" => {
"$sum" => {
"$cond" => [
{ "$eq" => [ "$gender", "male" ] },
1,
0
]
}
},
"female_like_count" => {
"$sum" => {
"$cond" => [
{ "$eq" => [ "$gender", "female" ] },
1,
0
]
}
},
"unknown_like_count" => {
"$sum" => {
"$cond" => [
{ "$eq" => [ "$gender", "unknown" ] },
1,
0
]
}
}
}},
{ "$sort" => {
"_id.date" => 1,
"_id.article_id" => 1,
"_id.state" => 1,
"male_like_count" => 1,
"female_like_count" => 1
}}
])
Which really just comes down to obtaining a DateTime object corresponding to the epoch date that is suitable as a driver argument, and then working the various operations from there. Processing $subtract with one BSON Date and another produces a numeric millisecond value, which can subsequently be rounded down to the start of the current day using the applied math. Then of course $add of a numeric timestamp value to a BSON Date ( again representing the epoch ) produces a BSON Date result, now holding the adjusted and rounded value.
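The rounding behaviour of that $subtract / $mod / $add combination can be verified locally with the same arithmetic in plain Ruby, as a standalone sketch (the helper names here are invented for illustration):

```ruby
MS_PER_DAY = 1000 * 60 * 60 * 24

# What $subtract yields for two BSON Dates: milliseconds between them.
def ms_since_epoch(time)
  (time.to_f * 1000).round
end

# Truncate a timestamp to the start of its UTC day, mirroring the
# $subtract / $mod arithmetic inside the pipeline. The final Time.at
# stands in for the $add with the epoch Date, which turns the numeric
# timestamp back into a Date value.
def truncate_to_day(time)
  ms = ms_since_epoch(time)
  rounded = ms - (ms % MS_PER_DAY)
  Time.at(rounded / 1000).utc
end

created_at = Time.utc(2014, 7, 15, 18, 30, 45)
puts truncate_to_day(created_at)  # 2014-07-15 00:00:00 UTC
```

Since the epoch falls on a UTC day boundary, taking the modulo by the number of milliseconds in a day strips exactly the intra-day portion of the timestamp.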
Then it's all just a matter of applying $sort as an aggregation pipeline stage again, as opposed to an external modifier. Much like the $match principle, an aggregation pipeline can sort anywhere, but placed at the end it is always dealing with the final result.