
I need to find the total count of duplicate profiles per organization. I have documents like the ones shown below:

{
    "OrganizationId" : 10,
    "Profile" : {
        "_id" : "75"
    },
    "_id" : "1"
},
{
    "OrganizationId" : 10,
    "Profile" : {
        "_id" : "75"
    },
    "_id" : "2"
},
{
    "OrganizationId" : 10,
    "Profile" : {
        "_id" : "77"
    },
    "_id" : "3"
},
{
    "OrganizationId" : 10,
    "Profile" : {
        "_id" : "77"
    },
    "_id" : "4"
}

I have written a query that groups by ProfileId and OrganizationId. The results I am getting are shown below:

Organization    Total
10               2
10               2

But I want the sum of the totals per organization, meaning Org 10 should have a single row with a sum of 4.

The query I am using is shown below:

db.getSiblingDB("dbName").OrgProfile.aggregate([
    { $project: { _id: 1, P: "$Profile._id", O: "$OrganizationId" } },
    { $group: { _id: { p: "$P", o: "$O" }, c: { $sum: 1 } } },
    { $match: { c: { $gt: 1 } } }
]);

Any ideas? Please help.

  • Your query actually returns the correct result: { "_id" : { "p" : "75", "o" : 10 }, "c" : 4 } Commented Sep 17, 2016 at 20:42
  • Thanks for your reply. This query returns multiple records for the same organization, so I would still have to sum the totals manually. Commented Sep 18, 2016 at 7:02
  • @Srinivas Please read through your question again: in your comments you state that you want a sum of 2 for 10, but in your question you mention "that means Org 10 should have one row with sum of 4." The two statements don't match. Commented Sep 18, 2016 at 7:47
  • @DAXaholic Thanks for pointing this out. This is the output: {"_id" : { "p" : "77", "o" : 10 } ], "o" : [ 10, 10 ], "c" : 2 }, { "_id" : { "p" : "75", "o" : 10 } ], "o" : [ 10, 10 ], "c" : 2 } but I want a single row for org 10 with a sum of 4. Commented Sep 18, 2016 at 8:55

2 Answers

3

The following pipeline should give you the desired output. The last $project stage is purely cosmetic: it renames _id to Organization. It is not needed for the computation itself, so you may omit it.

db.getCollection('yourCollection').aggregate([
    {
        $group: {
            _id: { org: "$OrganizationId", profile: "$Profile._id" },
            count: { $sum: 1 }
        }
    },
    {
        $group: {
            _id: "$_id.org",
            Total: {
                $sum: {
                    $cond: {
                        if: { $gte: ["$count", 2] },
                        then: "$count",
                        else: 0
                    }
                }
            }
        }
    },
    {
        $project: {
            _id: 0,
            Organization: "$_id",
            Total: 1
        }
    }
])

This gives the following output:

{
    "Total" : 4.0,
    "Organization" : 10
}
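
As the comments on this answer note, some server versions expect $cond in its positional array form rather than with if/then/else properties. A minimal sketch of the same two $group stages using that form follows; the variable name pipeline is only illustrative, and the cosmetic $project stage is left out.

```javascript
// $cond in array form: [ <condition>, <value if true>, <value if false> ]
const pipeline = [
    {
        // Count documents per (organization, profile) pair.
        $group: {
            _id: { org: "$OrganizationId", profile: "$Profile._id" },
            count: { $sum: 1 }
        }
    },
    {
        // Sum only the counts of pairs that occur at least twice.
        $group: {
            _id: "$_id.org",
            Total: {
                $sum: { $cond: [ { $gte: ["$count", 2] }, "$count", 0 ] }
            }
        }
    }
];

// In the mongo shell: db.getCollection('yourCollection').aggregate(pipeline)
```

Both forms express the same condition, so the computed Total should not differ.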

To filter out organizations without duplicates, you can use $match, which also simplifies the second $group stage:

...aggregate([
    {
        $group: {
            _id: { org: "$OrganizationId", profile: "$Profile._id" },
            count: { $sum: 1 }
        }
    },
    {
        $match: {
            count: { $gte: 2 }
        }
    },
    {
        $group: {
            _id: "$_id.org",
            Total: { $sum: "$count" }
        }
    },
    {
        $project: {
            _id: 0,
            Organization: "$_id",
            Total: 1
        }
    }
])
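
To see what the group, match, group sequence does without a database, here is a rough in-memory sketch in plain JavaScript. It only mimics the logic of the stages and is not how MongoDB executes them. The function name duplicateTotals and the extra document for organization 20 are hypothetical, added to show an organization without duplicates dropping out.

```javascript
// Sample documents from the question, plus one hypothetical organization (20)
// that has no duplicate profiles.
const docs = [
    { _id: "1", OrganizationId: 10, Profile: { _id: "75" } },
    { _id: "2", OrganizationId: 10, Profile: { _id: "75" } },
    { _id: "3", OrganizationId: 10, Profile: { _id: "77" } },
    { _id: "4", OrganizationId: 10, Profile: { _id: "77" } },
    { _id: "5", OrganizationId: 20, Profile: { _id: "80" } }
];

function duplicateTotals(documents) {
    // First $group: count documents per (organization, profile) pair.
    const pairCounts = new Map();
    for (const doc of documents) {
        const key = JSON.stringify([doc.OrganizationId, doc.Profile._id]);
        pairCounts.set(key, (pairCounts.get(key) || 0) + 1);
    }

    // $match and second $group: keep pairs seen at least twice,
    // then sum their counts per organization.
    const totals = new Map();
    for (const [key, count] of pairCounts) {
        if (count < 2) continue;
        const [org] = JSON.parse(key);
        totals.set(org, (totals.get(org) || 0) + count);
    }

    // Final $project: rename the grouping key to Organization.
    return [...totals].map(([org, total]) => ({ Organization: org, Total: total }));
}

console.log(duplicateTotals(docs)); // [ { Organization: 10, Total: 4 } ]
```

Organization 20 does not appear in the result because its only profile occurs once and is filtered out before the second grouping.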

5 Comments

Thanks @DAXaholic, I am getting the expected results. My only question: is it possible to filter out the organizations that have 0 duplicates?
I have modified the $cond since my version expects it in array format.
Well, I guess I am using a newer version than you, one that allows the if/then/else properties. Regarding the filtering, I updated my answer. Hope that helps.
OK, got it. It's giving the filtered results. Thanks again :)
Nice to hear it helped :)
0

I think I have a solution for you. In that last step there, instead of matching, I think you want another $group.

    .aggregate([
        { $project: { _id: 1, P: "$Profile._id", O: "$OrganizationId" } },
        { $group: { _id: { p: "$P", o: "$O" }, c: { $sum: 1 } } },
        { $group: { _id: "$_id.o", c: { $sum: "$c" } } }
    ]);

You can probably read it and figure out for yourself what's happening in that last step, but I'll explain just in case. The last step groups all documents that have the same organization id and sums the c values produced by the previous stage. After the first group, you had two documents that both had a count c of 2 but different profile ids. The next group ignores the profile id, groups the documents by organization id, and adds their counts.

When I ran this query, here is my result, which is what I think you're looking for:

{
    "_id" : 10,
    "c" : 4
}
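
One caveat: with no filter between the two $group stages, profiles that occur only once also add to c, so the result is the total number of profile documents per organization. It equals 4 here only because every profile in the sample data is duplicated. A sketch that keeps the question's $match between the two groups, assuming the same field names (the variable name pipeline is only illustrative):

```javascript
const pipeline = [
    { $project: { _id: 1, P: "$Profile._id", O: "$OrganizationId" } },
    { $group: { _id: { p: "$P", o: "$O" }, c: { $sum: 1 } } },
    // Keep only (profile, organization) pairs that occur more than once.
    { $match: { c: { $gt: 1 } } },
    // Sum the remaining counts per organization.
    { $group: { _id: "$_id.o", c: { $sum: "$c" } } }
];

// In the mongo shell: db.getSiblingDB("dbName").OrgProfile.aggregate(pipeline)
```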

Hope this helps. Let me know if you have any questions.

1 Comment

Thanks for your reply. I tried this query, but it returns each organization's total profile count, not the count of duplicated profiles.
