How to use nested grouping in MongoDB

Question

I need to find the total count of duplicate profiles at the organization level. I have documents like the following:

{
    "OrganizationId" : 10,
    "Profile" : {
        "_id" : "75"
    },
    "_id" : "1"
},
{
    "OrganizationId" : 10,
    "Profile" : {
        "_id" : "75"
    },
    "_id" : "2"
},
{
    "OrganizationId" : 10,
    "Profile" : {
        "_id" : "77"
    },
    "_id" : "3"
},
{
    "OrganizationId" : 10,
    "Profile" : {
        "_id" : "77"
    },
    "_id" : "4"
}

I have written a query that groups by ProfileId and OrganizationId. The results I am getting are shown below:

Organization    Total
10               2
10               2

But I want the sum of the totals at the organization level; that means Org 10 should have one row with sum of 4.

The query I am using is shown below:

db.getSiblingDB("dbName").OrgProfile.aggregate([
    { $project: { _id: 1, P: "$Profile._id", O: "$OrganizationId" } },
    { $group: { _id: { p: "$P", o: "$O" }, c: { $sum: 1 } } },
    { $match: { c: { $gt: 1 } } }
]);

Any ideas? Please help.

Your query actually returns the correct result: { "_id" : { "p" : "75", "o" : 10 }, "c" : 4 }
– Ori Popowski, Commented Sep 17, 2016 at 20:42
Thanks for your reply. This query returns multiple records for the same organization, so I would again have to sum the totals manually.
– Srinivas, Commented Sep 18, 2016 at 7:02
@Srinivas Please read through your question again: in your comments you state that you want a sum of 2 for 10, but in your question you mention "that means Org 10 should have one row with sum of 4." The two statements don't match.
– DAXaholic, Commented Sep 18, 2016 at 7:47
@DAXaholic Thanks for pointing this out. This is the output: {"_id" : { "p" : "77", "o" : 10 } ], "o" : [ 10, 10 ], "c" : 2 }, { "_id" : { "p" : "75", "o" : 10 } ], "o" : [ 10, 10 ], "c" : 2 } but I want a single row for org 10 with a sum of 4.
– Srinivas, Commented Sep 18, 2016 at 8:55

DAXaholic · Accepted Answer · 2016-09-18 09:44:19Z

3

The following pipeline should give you the desired output. The final $project stage is purely cosmetic: it renames _id to Organization and is not needed for the computation itself, so you may omit it.

db.getCollection('yourCollection').aggregate([
    { 
        $group: {  
            _id: { org: "$OrganizationId", profile: "$Profile._id" },
            count: { $sum: 1 }
        }
    },
    {
        $group: {
            _id: "$_id.org",
            Total: { 
                $sum: { 
                    $cond: { 
                        if: { $gte: ["$count", 2] }, 
                        then: "$count", 
                        else: 0
                    }
                }
            }
        } 
    },
    {
        $project: {
            _id: 0,
            Organization: "$_id",
            Total: 1
        }
    }
])

This gives the following output:

{
    "Total" : 4.0,
    "Organization" : 10
}
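
To see why the two $group stages produce 4 for the sample data, the same steps can be traced in plain JavaScript. This is only an in-memory sketch of the logic, not how MongoDB executes the pipeline, and the names docs, pairCounts and totals are made up for illustration.

```javascript
// The four sample documents from the question.
const docs = [
  { _id: "1", OrganizationId: 10, Profile: { _id: "75" } },
  { _id: "2", OrganizationId: 10, Profile: { _id: "75" } },
  { _id: "3", OrganizationId: 10, Profile: { _id: "77" } },
  { _id: "4", OrganizationId: 10, Profile: { _id: "77" } },
];

// Stage 1: count documents per (organization, profile) pair.
const pairCounts = new Map();
for (const doc of docs) {
  const key = JSON.stringify([doc.OrganizationId, doc.Profile._id]);
  pairCounts.set(key, (pairCounts.get(key) || 0) + 1);
}

// Stage 2: per organization, add up only the counts that are 2 or more.
const totals = new Map();
for (const [key, count] of pairCounts) {
  const [org] = JSON.parse(key);
  totals.set(org, (totals.get(org) || 0) + (count >= 2 ? count : 0));
}

// Stage 3: cosmetic reshaping, like the final $project.
const result = [...totals].map(([Organization, Total]) => ({ Organization, Total }));
console.log(result); // one row: Organization 10 with Total 4
```

Each profile appears twice, so stage 1 yields two pairs with a count of 2, and stage 2 adds them to 4 for organization 10.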

To filter out organizations without duplicates, you can use $match, which also simplifies the second $group stage:

...aggregate([
    { 
        $group: {  
            _id: { org: "$OrganizationId", profile: "$Profile._id" },
            count: { $sum: 1 }
        }
    },
    {
        $match: {
            count: { $gte: 2 } 
        }
    },
    {
        $group: {
            _id: "$_id.org",
            Total: { $sum: "$count" }
        } 
    },
    {
        $project: {
            _id: 0,
            Organization: "$_id",
            Total: 1
        }
    }
])
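
The practical difference between the $cond and $match variants shows up only for organizations that have no duplicates. The sketch below mimics both in plain JavaScript. It assumes the first $group stage has already run, and organization 20 is a hypothetical addition that is not part of the question's data.

```javascript
// Hypothetical output of the first $group stage.
// Org 20 is invented here to show an organization without duplicates.
const pairCounts = [
  { _id: { org: 10, profile: "75" }, count: 2 },
  { _id: { org: 10, profile: "77" }, count: 2 },
  { _id: { org: 20, profile: "80" }, count: 1 },
];

// Sum a value per organization, like
// { $group: { _id: "$_id.org", Total: { $sum: <expression> } } }.
function sumPerOrg(rows, valueOf) {
  const totals = new Map();
  for (const row of rows) {
    totals.set(row._id.org, (totals.get(row._id.org) || 0) + valueOf(row));
  }
  return [...totals].map(([Organization, Total]) => ({ Organization, Total }));
}

// $cond variant: every organization is kept, non-duplicates contribute 0.
const withCond = sumPerOrg(pairCounts, (row) => (row.count >= 2 ? row.count : 0));

// $match variant: rows with count < 2 are dropped before the second grouping.
const withMatch = sumPerOrg(
  pairCounts.filter((row) => row.count >= 2),
  (row) => row.count
);

console.log(withCond);  // org 10 with Total 4, org 20 with Total 0
console.log(withMatch); // only org 10 with Total 4
```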

edited Sep 18, 2016 at 9:44

answered Sep 18, 2016 at 9:14

DAXaholic


5 Comments

Srinivas Over a year ago

Thanks @DAXaholic, I am getting the expected results. One question: is it possible to filter out the orgs that have 0 duplicates?

Srinivas Over a year ago

I have modified the $cond, since it is expected in array format here.

DAXaholic Over a year ago

Well I guess I use a newer version than you which allows the if/then/else properties. Regarding the filtering I updated my answer - hope that helps

Srinivas Over a year ago

OK, got it. It's giving the filtered results. Thanks again :)

DAXaholic Over a year ago

Nice to hear it helped :)
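
For reference, the array form of $cond mentioned in these comments takes its three operands positionally: condition, value if true, value if false. Below is a sketch of the second $group stage written that way. The constant name secondGroupStage is only for illustration, and the stage is meant as a drop-in for the second stage of the first pipeline in the answer.

```javascript
// Second $group stage using the positional (array) form of $cond:
// [ <condition>, <value if true>, <value if false> ]
const secondGroupStage = {
  $group: {
    _id: "$_id.org",
    Total: {
      $sum: {
        $cond: [{ $gte: ["$count", 2] }, "$count", 0],
      },
    },
  },
};

console.log(JSON.stringify(secondGroupStage, null, 2));
```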

Hayden Braxton · Answer · 2016-09-17 21:02:24Z

0

I think I have a solution for you. In that last step there, instead of matching, I think you want another $group.

    .aggregate([
        { $project: { _id: 1, P: "$Profile._id", O: "$OrganizationId" } },
        { $group: { _id: { p: "$P", o: "$O" }, c: { $sum: 1 } } },
        { $group: { _id: "$_id.o", c: { $sum: "$c" } } }
    ]);

You can probably read it and work out what happens in that last step, but just in case, I'll explain. The last step groups all documents that have the same organization id and sums the counts held in the previous c field. After the first group, you had two documents that both had a count c of 2 but different profile ids. The next group ignores the profile id, groups the documents by organization id, and adds their counts.

When I ran this query, here is my result, which is what I think you're looking for:

{
    "_id" : 10,
    "c" : 4
}

Hope this helps. Let me know if you have any questions.

edited Sep 17, 2016 at 21:02

answered Sep 17, 2016 at 20:53

Hayden Braxton


1 Comment

Srinivas Over a year ago

Thanks for your reply. I tried this query, but it returns each organization's total profile count, not the number of duplicated profiles.
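
The difference raised in this comment only becomes visible once an organization also has a profile that is not duplicated. The question's sample data contains only duplicates, so both approaches return 4 there. The sketch below adds a hypothetical profile "80" with a single document to make the gap visible. It is plain JavaScript over the output of the first $group stage, not a MongoDB call.

```javascript
// Counts per (profile, organization) after the first $group stage.
// The profile "80" row is a hypothetical non-duplicated profile added for illustration.
const pairCounts = [
  { _id: { p: "75", o: 10 }, c: 2 },
  { _id: { p: "77", o: 10 }, c: 2 },
  { _id: { p: "80", o: 10 }, c: 1 },
];

// Summing every count (second $group with no filter) counts all profiles.
const allProfiles = pairCounts.reduce((sum, row) => sum + row.c, 0);

// Summing only counts greater than 1 counts the duplicated profiles.
const duplicatesOnly = pairCounts
  .filter((row) => row.c > 1)
  .reduce((sum, row) => sum + row.c, 0);

console.log(allProfiles, duplicatesOnly); // 5 4
```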
