0

please, help I need to do $lookup/join for field pid in 2 level nested array where coll2.string(pid)==coll1.a.p.pid , collection coll1 is pretty big and there is index on "a.p.pid" but not sure how to do lookup on nested array elements and by chance to avoid $unwind and use the index "a.p.pid" ...

coll1:

[
  {
  a:[
     {
      p:[
         {pid:"1" ,date:"2022-01-22"},
         {pid:"4", date:"2022-01-25"}
       ]
     }
   ,{
     p:[
        {pid:"3",date:"2022-01-27"}
      ]
    }
  ]
 }
]

coll2:

 [ 
  {
   pid:1
  }
  ,
 {
  pid:3
  }
 ]

Expected result:

 [
  {pid:1, date:"2022-01-22"}
  {pid:3, date:"2022-01-27"}
 ]

collections sizes:

db.col2.count()
51171548
db.col1.count()
81039496

mongodb version 4.0

2 Answers 2

1

Not sure why do you have to avoid using $unwind

db.col2.aggregate([
  {
    $set: { pid: { $toString: "$pid" } }
  },
  {
    $lookup: {
      from: "col1",
      localField: "pid",
      foreignField: "a.p.pid",
      as: "date",
      let: { pid: "$pid" },
      pipeline: [
        {
          $unwind: "$a"
        },
        {
          $unwind: "$a.p"
        },
        {
          $match: { $expr: { $eq: [ "$a.p.pid", "$$pid" ] } }
        }
      ]
    }
  },
  {
    $set: { date: { $first: "$date.a.p.date" }, pid: { $toInt: "$pid" } }
  }
])

mongoplayground


db.col2.aggregate([
  {
    $set: { pid: { $toString: "$pid" } }
  },
  {
    $lookup: {
      from: "col1",
      localField: "pid",
      foreignField: "a.p.pid",
      as: "date"
    }
  },
  {
    $set: {
      date: {
        $filter: {
          input: "$date",
          as: "d1",
          cond: {
            $gt: [
              {
                $size: {
                  $filter: {
                    input: "$$d1.a",
                    as: "d2",
                    cond: {
                      $gt: [
                        {
                          $size: {
                            $filter: {
                              input: "$$d2.p",
                              as: "d3",
                              cond: { $eq: [ "$$d3.pid", "$pid" ] }
                            }
                          }
                        },
                        0
                      ]
                    }
                  }
                }
              },
              0
            ]
          }
        }
      }
    }
  },
  {
    $set: { date: { $first: "$date.a" } }
  },
  {
    $set: {
      "date": {
        $filter: {
          input: "$date",
          as: "d2",
          cond: {
            $gt: [
              {
                $size: {
                  $filter: {
                    input: "$$d2.p",
                    as: "d3",
                    cond: { $eq: [ "$$d3.pid", "$pid" ] }
                  }
                }
              },
              0
            ]
          }
        }
      }
    }
  },
  {
    $set: { date: { $first: "$date.p" } }
  },
  {
    $set: {
      date: {
        $filter: {
          input: "$date",
          as: "d3",
          cond: { $eq: [ "$$d3.pid", "$pid" ] }
        }
      }
    }
  },
  {
    $set: {
      date: { $first: "$date.date" },
      pid: { $toInt: "$pid" }
    }
  }
])

mongoplayground

Sign up to request clarification or add additional context in comments.

11 Comments

collection is relatively big and initial 2x unwinds on col1 always take alot of time , but your solution provide me the help I needed , thanks alot!
any option to avoid the 2x unwinds or some initial filtering for col1 based on the index "$a.p.pid" is highly appreciated
@R2D2 Since we have localField: "pid", foreignField: "a.p.pid" , I don't think we need to add $match: { $expr: { $eq: [ "$a.p.pid", "$$pid" ] } } in the beginning of the lookup pipeline. It is the same thing right ?
@R2D2 I see. That may be a big problem. I would try another method.
@R2D2 I update my answer by replace $unwind/$unwind/$match with $map/$filter. Not sure it is fast enough. Try it and tell me.
|
0

Here is the final solution implemented in the real case confirmed to work correctly in v.4.0:

 db.col2.aggregate([
 {
 $project: {
  pid: {
    $toString: "$pid"
  }
 }
 },
  {
   $lookup: {
    from: "col1",
    localField: "pid",
    foreignField: "a.p.pid",
    as: "date"
  }
 },
 {
  $project: {
  "date.a.p.date": 1,
  "date.a.p.pid": 1,
  _id: 1,
  pid: 1
 }
 },
 {
  $unwind: "$date"
 },
 {
   $unwind: "$date.a"
 },
 {
   $unwind: "$date.a.p"
 },
 {
 $project: {
   pid: "$date.a.p.pid",
   date: "$date.a.p.date",
   "matched": {
    "$eq": [
      "$pid",
      "$date.a.p.pid"
    ]
  }
 }
 },
  {
   $match: {
     matched: true
  }
 },
  {
   $project: {
    "_id": 0,
    pid: 1,
   date: 1
  }
 }
 ])

@YuTing: Your answer marked as accepted since it seems better for higher versions! Tanks to @YuTing for the help!

playground

Comments

Your Answer

By clicking “Post Your Answer”, you agree to our terms of service and acknowledge you have read our privacy policy.

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.