
I am unwinding an array using the MongoDB aggregation framework. The array has duplicates, and I need to ignore those duplicates when doing a further grouping.

How can I achieve that?

  • Hi andrewrk, can you please share the schema for the above question and answer? Commented Apr 7, 2016 at 12:13

3 Answers


You can use $addToSet to do this:

db.users.aggregate([
  { $unwind: '$data' },
  { $group: { _id: '$_id', data: { $addToSet: '$data' } } }
]);

It's hard to give you a more specific answer without seeing your actual query.
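To see what the two stages accomplish together, here is a plain-JavaScript sketch (not the server's implementation) of what `$unwind` followed by `$group` with `$addToSet` does to a single document, assuming `data` holds scalar values:

```javascript
// Sketch of { $unwind: '$data' } -> { $group: { _id: '$_id', data: { $addToSet: '$data' } } }
// applied to one document: each array element is emitted separately, then
// re-collected with duplicates dropped.
function unwindAndGroupWithAddToSet(doc) {
  const seen = new Set();
  const deduped = [];
  for (const value of doc.data) {   // $unwind: one element at a time
    if (!seen.has(value)) {         // $addToSet: keep only the first occurrence
      seen.add(value);
      deduped.push(value);
    }
  }
  return { _id: doc._id, data: deduped };
}

const result = unwindAndGroupWithAddToSet({ _id: 1, data: ['a', 'b', 'a', 'c'] });
// result.data -> ['a', 'b', 'c']
```

One caveat the sketch glosses over: the real `$addToSet` accumulator does not guarantee any particular order of the resulting elements, whereas this sketch preserves first-seen order.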


2 Comments

I thought of that. I can unwind, $addToSet, and then unwind again. But I wanted to see if I could do it a different way. Thanks.
You can just group by id and array element (without $addToSet); can't say more without your query.

You have to use $addToSet, but first you have to group by _id, because if you don't, you'll get one output element per item in the list.

Imagine a collection posts with documents like this:

{
     body: "Lorem Ipsum...", 
     tags: ["stuff", "lorem", "lorem"],
     author: "Enrique Coslado"
}

Imagine you want to calculate the most common tag per author. You'd write an aggregate query like this:

db.posts.aggregate([
    {$project: {
        author: "$author", 
        tags: "$tags", 
        post_id: "$_id"
    }}, 

    {$unwind: "$tags"}, 

    {$group: {
        _id: "$post_id", 
        author: {$first: "$author"}, 
        tags: {$addToSet: "$tags"}
    }}, 

    {$unwind: "$tags"},

    {$group: {
        _id: {
            author: "$author",
            tags: "$tags"
        },
        count: {$sum: 1}
    }}
])

That way you'll get documents like this:

{
     _id: {
         author: "Enrique Coslado", 
         tags: "lorem"
     },
     count: 1
}

1 Comment

Great answer. I would just add a {$sort: {count: -1}} as the last stage of the pipeline in order to sort.

The previous answers are correct, but the $unwind -> $group -> $unwind procedure can be simplified. You can use $addFields with $reduce to pass the pipeline a filtered array that already contains only unique entries, and then $unwind just once.

Example document:

{
     body: "Lorem Ipsum...", 
     tags: [{title: 'test1'}, {title: 'test2'}, {title: 'test1'}],
     author: "First Last name"
}

Query:

db.posts.aggregate([
    {$addFields: {
        "uniqueTag": {
            $reduce: {
                input: "$tags",
                initialValue: [],
                in: {$setUnion: ["$$value", ["$$this.title"]]}
            }
        }
    }}, 

    {$unwind: "$uniqueTag"}, 

    {$group: {
        _id: {
            author: "$author",
            tags: "$uniqueTag"
        },
        count: {$sum: 1}
    }}
])

2 Comments

This answers the question I was going to ask about how to remove duplicate array entries in a document field. I knew I could use $unwind/$addToSet, but it feels like that step would duplicate so much data that it would be slower than just fetching everything and doing the aggregation in code.
This is an incredibly expensive operation.
