I have a collection whose documents each contain multiple arrays. These are generally quite large, but for the purposes of explanation, consider the following two documents:
{
    "obj1": [
        { "a": "a", "b": "b" },
        { "a": "a", "b": "c" },
        { "a": "a", "b": "b" }
    ],
    "obj2": [
        { "a": "a", "b": "b" },
        { "a": "a", "b": "c" }
    ]
},
{
    "obj1": [
        { "a": "c", "b": "b" }
    ],
    "obj2": [
        { "a": "c", "b": "c" }
    ]
}
The idea is to return only the array elements that match the query. Multiple matches are required, across multiple arrays, so this is beyond what can be done with projection and the positional $ operator. The desired result would be:
{
    "obj1": [
        { "a": "a", "b": "b" },
        { "a": "a", "b": "b" }
    ],
    "obj2": [
        { "a": "a", "b": "b" }
    ]
}
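To make the intended semantics concrete, here is a minimal plain-JavaScript sketch (not a MongoDB query) of the filtering I am after, run against the sample documents held in memory. The names `docs` and `filterMatches` are hypothetical and exist only for this illustration; the match conditions are the same ones used in the pipeline in this question.

```javascript
// Sample documents from above, held in memory purely for illustration.
const docs = [
  {
    obj1: [
      { a: "a", b: "b" },
      { a: "a", b: "c" },
      { a: "a", b: "b" }
    ],
    obj2: [
      { a: "a", b: "b" },
      { a: "a", b: "c" }
    ]
  },
  {
    obj1: [{ a: "c", b: "b" }],
    obj2: [{ a: "c", b: "c" }]
  }
];

// Hypothetical helper: keep only documents with at least one match in each
// array, and within those keep only the matching elements. Duplicate
// elements are preserved, as in the desired result.
function filterMatches(documents) {
  const inObj1 = (e) => e.a === "a" && e.b === "b";
  const inObj2 = (e) => e.b === "b";
  return documents
    .filter((d) => d.obj1.some(inObj1) && d.obj2.some(inObj2))
    .map((d) => ({
      obj1: d.obj1.filter(inObj1),
      obj2: d.obj2.filter(inObj2)
    }));
}

const result = filterMatches(docs);
console.log(JSON.stringify(result, null, 2));
```

Only the first document survives, with both identical `obj1` entries kept. What I want is the server-side equivalent of this.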
A traditional approach would be something like this:
db.objects.aggregate([
    { "$match": {
        "obj1": {
            "$elemMatch": { "a": "a", "b": "b" }
        },
        "obj2.b": "b"
    }},
    { "$unwind": "$obj1" },
    { "$match": {
        "obj1.a": "a",
        "obj1.b": "b"
    }},
    { "$unwind": "$obj2" },
    { "$match": { "obj2.b": "b" }},
    { "$group": {
        "_id": "$_id",
        "obj1": { "$addToSet": "$obj1" },
        "obj2": { "$addToSet": "$obj2" }
    }}
])
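But using $unwind on both arrays multiplies the number of intermediate documents (one for each combination of surviving obj1 element and obj2 element), which uses a lot of memory and slows things down on large arrays. There are also possible problems with $addToSet, since it collapses identical elements that the desired result keeps (the two matching obj1 entries above), and splitting the $group into a separate stage for each array can make things even slower.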
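As a rough illustration of both problems, the sketch below mimics the pipeline's stages in plain JavaScript on the first sample document. It is not how the server executes the pipeline: `unwind` and `addToSet` are hypothetical stand-ins, the `_id` value is assumed, and comparing elements through `JSON.stringify` only approximates how $addToSet decides that two values are equal.

```javascript
// First sample document, copied here so the sketch is self-contained.
// The _id value is an assumption; the sample documents do not show one.
const doc = {
  _id: 1,
  obj1: [
    { a: "a", b: "b" },
    { a: "a", b: "c" },
    { a: "a", b: "b" }
  ],
  obj2: [
    { a: "a", b: "b" },
    { a: "a", b: "c" }
  ]
};

// Hypothetical stand-in for $unwind: one output row per element of `field`.
const unwind = (rows, field) =>
  rows.flatMap((row) => row[field].map((el) => ({ ...row, [field]: el })));

// Hypothetical stand-in for $addToSet: keeps only structurally distinct values.
const addToSet = (values) => {
  const seen = new Set();
  return values.filter((v) => {
    const key = JSON.stringify(v);
    if (seen.has(key)) return false;
    seen.add(key);
    return true;
  });
};

// $unwind obj1, then $match on obj1.a and obj1.b.
const afterObj1 = unwind([doc], "obj1").filter(
  (r) => r.obj1.a === "a" && r.obj1.b === "b"
);

// $unwind obj2: every surviving row is repeated once per obj2 element.
const afterObj2Unwind = unwind(afterObj1, "obj2");

// $match on obj2.b, then $group with $addToSet on both arrays.
const matched = afterObj2Unwind.filter((r) => r.obj2.b === "b");
const grouped = {
  _id: doc._id,
  obj1: addToSet(matched.map((r) => r.obj1)),
  obj2: addToSet(matched.map((r) => r.obj2))
};

console.log(afterObj2Unwind.length); // 4 intermediate rows from one small document
console.log(grouped.obj1.length);    // 1, although two obj1 elements matched
```

Even this tiny document fans out to four intermediate rows, and the grouped `obj1` ends up with one element instead of the two I want.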
So I am looking for a process that is not so intensive but arrives at the same result.