
I have the following users collection:

[{
    "_id": 1,
    "adds": ["111", "222", "333", "111"]
}, {
    "_id": 2,
    "adds": ["555", "666", "777", "555"]
}, {
    "_id": 3,
    "adds": ["888", "999", "000", "888"]
}]

I need to find the duplicates inside each adds array.

The expected output is:

[{
    "_id": 1,
    "adds": ["111"]
}, {
    "_id": 2,
    "adds": ["555"]
}, {
    "_id": 3,
    "adds": ["888"]
}]

I have tried many operators, including $setUnion and $setDifference, but none of them did the trick.

Please help!!!

  • Why are you trying to avoid $unwind? An $unwind/$group/$match/$project would be a straightforward approach to compare with the answers posted so far. Also, what specific version of MongoDB server are you using? Commented Dec 10, 2018 at 1:40
  • @Stennie I am using MongoDB 4.0, the latest version. I want to avoid $unwind because it causes performance issues. Finding duplicates inside an array should not be a big deal; isn't there an aggregation operator for that? BTW, thanks for the reply. Commented Dec 10, 2018 at 5:04
  • @Stennie Any comments? ;-) Commented Dec 10, 2018 at 6:57
  • As at MongoDB 4.0 there isn't a shorthand aggregation operator for filtering an array to only find duplicates. There are several different approaches to achieve this outcome using existing operators, but you'd have to benchmark to compare performance for your use case. If this is a common need you might want to consider adjusting your data model to make it more efficient to query. For example, instead of having an array of values you could have an array of objects with counts which you increment when adding to the array: [{"111": 2, "222": 1, "333": 1}]. Commented Dec 28, 2018 at 23:53
  • You could also raise a feature suggestion for a new operator in the MongoDB Jira issue tracker (project: "SERVER", component: "Aggregation Framework"). That's unlikely to help you in the short term, but if others are interested in the same feature it might land in a future release of MongoDB. Commented Dec 28, 2018 at 23:53
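
For comparison, the $unwind/$group/$match approach mentioned in the comments could look roughly like the sketch below. The variable name unwindPipeline is only illustrative, and the sketch uses a second $group rather than a $project to rebuild the array. Note that documents without any duplicates drop out of the result instead of getting an empty adds array.

```javascript
// Sketch of the $unwind-based approach from the comments (illustrative only).
// unwindPipeline is a hypothetical variable name, not part of any answer.
const unwindPipeline = [
  // One output document per array element.
  { $unwind: "$adds" },
  // Count how often each value occurs within each original document.
  { $group: { _id: { doc: "$_id", value: "$adds" }, count: { $sum: 1 } } },
  // Keep only values that occur more than once.
  { $match: { count: { $gt: 1 } } },
  // Collect the duplicated values back into one array per original document.
  { $group: { _id: "$_id.doc", adds: { $push: "$_id.value" } } },
  // $group does not guarantee output order, so sort explicitly.
  { $sort: { _id: 1 } }
];

// In the mongo shell this would be run as: db.users.aggregate(unwindPipeline)
```

Whether this is slower than the single-stage answers depends on array sizes and collection size, so it is worth benchmarking on real data.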

3 Answers


You can use $range to generate an array of indexes from 1 to n - 1, where n is the $size of adds. Then you can "loop" through those indexes and check whether the element of adds at each index ($arrayElemAt) also occurs somewhere before that index; if it does, it is a duplicate. $indexOfArray performs this check when you pass 0 and the index as the search range (the end of the range is exclusive, so an element never matches itself).

Then you just need $project and $map to replace the indexes with the actual elements. Wrapping the result in $setUnion ensures that a value occurring three or more times is reported only once.

db.users.aggregate([
    {
        $addFields: {
            duplicates: {
                $filter: {
                    input: { $range: [ 1, { $size: "$adds" } ] },
                    as: "index",
                    cond: {
                        $ne: [ { $indexOfArray: [ "$adds", { $arrayElemAt: [ "$adds", "$$index" ]  }, 0, "$$index" ] }, -1 ]
                    }
                }
            }
        }
    },
    {
        $project: {
            _id: 1,
            adds: {
                $setUnion: [ { $map: { input: "$duplicates", as: "d", in: { $arrayElemAt: [ "$adds", "$$d" ] } } }, [] ]
            }
        }
    }
])

Prints:

{ "_id" : 1, "adds" : [ "111" ] }
{ "_id" : 2, "adds" : [ "555" ] }
{ "_id" : 3, "adds" : [ "888" ] }
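
To see why the index-based check works, the same logic can be mirrored in plain JavaScript outside MongoDB. This is only an illustrative sketch: findDuplicatesByLookBehind is a hypothetical helper name, not a MongoDB API, and the final sort is added purely to make the output deterministic.

```javascript
// Plain-JavaScript mirror of the pipeline above (illustration only).
// findDuplicatesByLookBehind is a hypothetical helper name.
function findDuplicatesByLookBehind(adds) {
  const duplicateIndexes = [];
  // Equivalent of { $range: [1, { $size: "$adds" }] }: indexes 1 .. n-1.
  for (let index = 1; index < adds.length; index++) {
    // Equivalent of $indexOfArray with search range [0, index):
    // does the element at `index` already occur earlier in the array?
    const earlier = adds.slice(0, index).indexOf(adds[index]);
    if (earlier !== -1) duplicateIndexes.push(index);
  }
  // Equivalent of $map + $setUnion: turn indexes into values and de-duplicate.
  return [...new Set(duplicateIndexes.map((i) => adds[i]))].sort();
}

console.log(findDuplicatesByLookBehind(["111", "222", "333", "333", "111"]));
```

For the input with two duplicated values, the function returns ["111", "333"], since the indexes 3 and 4 both point at values that already appear earlier in the array.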

4 Comments

great great great. Awesome mickl
@mickl - It appears not to work when you have multiple duplicate values, e.g. "adds": ["111", "222", "333", "333", "111"]
Kindly note that you don't need the $slice since $indexOfArray supports a start index parameter. Also, the two stages $addFields and $project can be merged into one.
Modified my answer, thank you @dnickless, that makes it a bit shorter. I'll still keep two stages since I believe it's more readable than nested map/filter

Here is another version that you might want to compare in terms of performance:

db.users.aggregate({
  $project:{
    "adds":{
      $reduce:{
        "input":{$range:[0,{$size:"$adds"}]}, // loop variable from 0 to max. index of $adds array
      //"input":{$range:[0,{$subtract:[{$size:"$adds"},1]}]}, // this would be enough but looks more complicated
        "initialValue":[],
        "in":{
            $let:{
              "vars":{
                "curr": { $arrayElemAt: [ "$adds", "$$this"] } // the element we're looking at
              },
              "in":{
                // if there is another identical element after the current one then we have a duplicate
                $cond:[
                  {$ne:[{$indexOfArray:["$adds","$$curr",{$add:["$$this",1]}]},-1]},
                  {$setUnion:["$$value",["$$curr"]]}, // combine duplicates found so far with new duplicate
                  "$$value" // continue with current value
                ]
              }
            }
        }
      }
    }
  }
})

The logic is based on a loop variable which we get through the $range operator. This loop variable allows for sequential access of the adds array. For every item that we look at, we check if there is another identical one after the current index. If yes, we have a duplicate, otherwise not.
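
The look-ahead logic described above can be sketched in plain JavaScript to make the control flow easier to follow. This is an illustration under assumptions, not MongoDB code: findDuplicatesByLookAhead is a hypothetical helper name, and the includes check stands in for $setUnion.

```javascript
// Plain-JavaScript sketch of the $reduce / look-ahead idea (illustration only).
// findDuplicatesByLookAhead is a hypothetical helper name.
function findDuplicatesByLookAhead(adds) {
  return adds.reduce((found, curr, index) => {
    // Equivalent of $indexOfArray with a start index of index + 1:
    // is there another identical element after the current one?
    const laterIndex = adds.indexOf(curr, index + 1);
    if (laterIndex !== -1 && !found.includes(curr)) {
      // Stand-in for $setUnion: add the duplicate only once.
      return [...found, curr];
    }
    return found; // continue with the duplicates found so far
  }, []);
}

console.log(findDuplicatesByLookAhead(["111", "222", "333", "333", "111"]));
```

Because the last element can never have a later match, stopping the loop one index earlier (as the commented-out input line in the pipeline does) gives the same result.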


You can try the aggregation below. The idea is to collect the distinct values with $setUnion, iterate over them, and count how often each one occurs in the adds array; a value that occurs more than once is kept, otherwise it is dropped.

db.users.aggregate({
  "$project":{
    "adds":{
      "$reduce":{
        "input":{"$setUnion":["$adds",[]]},
        "initialValue":[],
        "in":{
          "$concatArrays":[
            "$$value",
            {"$let":{
              "vars":{
                "match":{
                  "$filter":{"input":"$adds","as":"a","cond":{"$eq":["$$a","$$this"]}}
                }},
                "in":{
                  "$cond":[{"$gt":[{"$size":"$$match"},1]},["$$this"],[]]
                }
            }}
          ]
        }
      }
    }
  }
})
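
The same distinct-then-count idea can be sketched in plain JavaScript. This is only an illustration of the logic, not MongoDB code, and findDuplicatesByCount is a hypothetical helper name.

```javascript
// Plain-JavaScript sketch of the distinct-then-count idea (illustration only).
// findDuplicatesByCount is a hypothetical helper name.
function findDuplicatesByCount(adds) {
  // Stand-in for { $setUnion: ["$adds", []] }: the distinct values.
  const distinct = [...new Set(adds)];
  return distinct.reduce((value, candidate) => {
    // Equivalent of the $filter: all occurrences of the candidate in adds.
    const match = adds.filter((a) => a === candidate);
    // Equivalent of $concatArrays + $cond: keep values seen more than once.
    return value.concat(match.length > 1 ? [candidate] : []);
  }, []);
}

console.log(findDuplicatesByCount(["555", "666", "777", "555"]));
```

Each distinct value triggers a full scan of the array, so the cost grows with the number of distinct values times the array length.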

1 Comment

This would probably be faster:

db.users.aggregate({
  "$project":{
    "adds":{
      "$reduce":{
        "input":"$adds",
        "initialValue":[],
        "in":{
          "$let":{
            "vars":{
              "match":{
                "$filter":{"input":"$adds","as":"a","cond":{"$eq":["$$a","$$this"]}}
              }
            },
            "in":{
              "$cond":[{"$gt":[{"$size":"$$match"},1]},{"$setUnion":["$$value",["$$this"]]},"$$value"]
            }
          }
        }
      }
    }
  }
})
