1

I have two collections. (updated)

I need to join them and return only a few fields. As I understand, I should use aggregate and projection features. I try to find examples, but can't find the right, in my case foreign key is placed in the array in the nested field.

It is not easy for me, I’m new at this, and I couldn’t win against this MongoDB query.

In the code block below I wrote the simplified equivalent models (doc1, doc2) and expected results.

Does anyone have any ideas?

db.doc1.deleteMany({});
db.doc2.deleteMany({});

// tow related documents

db.doc1.insert( [
    { 
    "version" : 123456,
    "doc" : {
      "code":"A1",
      "name":"some document A1",
      "doc2CodeArray":[
          {"code":"B01"},
          {"code":"B02"},
          {"code":"B03"},
          {"code":"B04"},
          {"code":"B05"},
          {"code":"B06"}
          ]
      }
    },
    { 
    "version" : 123457,
    "doc" : {
      "code":"A2",
      "name":"some document A2",
      "doc2CodeArray":[
          {"code":"B07"},
          {"code":"B08"},
          {"code":"B09"},
          {"code":"B10"},
          {"code":"B11"},
          {"code":"B12"}
          ]
      }
    },
    { 
    "version" : 123457,
    "doc" : {
      "code":"A2",
      "name":"some document A2",
      "doc2CodeArray":null
      }
    }   

]);

db.doc2.insert( [
    {
        "version" : 567890,
        "doc" : { 
          "code":"B01", 
          "valueArray":[{"valueType":"int","valueData":"1"}],
          "doc2Type":{"code":"C1"}
        } 
    },
    {"version" : 567890,"doc" : { "code":"B02", "valueArray":[{"valueType":"int","valueData":"2","isDefault":false}],"doc2Type":{"code":"C2","addInfo":"1234567890"}}},
    {"version" : 567890,"doc" : { "code":"B03", "valueArray":[{"valueType":"int","valueData":"3","isDefault":false}],"doc2Type":{"code":"C3","addInfo":"1234567890"}}},
    {"version" : 567890,"doc" : { "code":"B04", "valueArray":[{"valueType":"int","valueData":"4","isDefault":false}],"doc2Type":{"code":"C4","addInfo":"1234567890"}}},
    {"version" : 567890,"doc" : { "code":"B05", "valueArray":[{"valueType":"int","valueData":"5","isDefault":false}],"doc2Type":{"code":"C5","addInfo":"1234567890"}}},
    {"version" : 567890,"doc" : { "code":"B06", "valueArray":[{"valueType":"int","valueData":"6","isDefault":false},
                                                              {"valueType":"str","valueData":"F","isDefault":false}],"doc2Type":{"code":"C6","addInfo":"1234567890"}}},
    
    {"version" : 567890,"doc" : { "code":"B07", "valueArray":[{"valueType":"int","valueData":"1","isDefault":false}],"doc2Type":{"code":"C1","addInfo":"1234567890"}}},
    {"version" : 567890,"doc" : { "code":"B08", "valueArray":[{"valueType":"int","valueData":"2","isDefault":false}],"doc2Type":{"code":"C2","addInfo":"1234567890"}}},
    {"version" : 567890,"doc" : { "code":"B09", "valueArray":[{"valueType":"int","valueData":"3","isDefault":false}],"doc2Type":{"code":"C3","addInfo":"1234567890"}}},
    {"version" : 567890,"doc" : { "code":"B10", "valueArray":[{"valueType":"int","valueData":"4","isDefault":false}],"doc2Type":{"code":"C4","addInfo":"1234567890"}}},
    {"version" : 567890,"doc" : { "code":"B11", "valueArray":[{"valueType":"int","valueData":"5","isDefault":false}],"doc2Type":{"code":"C5","addInfo":"1234567890"}}},
    {"version" : 567890,"doc" : { "code":"B12", "valueArray":[{"valueType":"int","valueData":"6","isDefault":false}],"doc2Type":{"code":"C6","addInfo":"1234567890"}}},    
    ]);

The result that I want

  I need join collections and select only few props
  {
      "doc":{code:"A1"},
      "doc2Items":[
        {"code":"C2", "value":{"value":"3"}},
        {"code":"C3", "value":{"value":"4"}},
      ]
  },
  {
      "doc":{code:"A2"},
      "doc2Items":[
        {"code":"C2", "value":{"value":"2"}},
        {"code":"C3", "value":{"value":"3"}},
        {"code":"C4", "value":{"value":"4"}},
      ]
  }

3 Answers 3

3

https://mongoplayground.net/p/1-1SU8SgbTQ

db.doc1.aggregate([
  {
    $lookup: {
      from: "doc2",
      localField: "doc.doc2CodeArray.code",
      foreignField: "doc.code",
      as: "doc.doc2Items"
    }
  },
  {
    $replaceRoot: {
      newRoot: "$doc"
    }
  },
  {
    $addFields: {
      "doc2Items": {
        $map: {
          input: "$doc2Items",
          in: {
            doc2TypeCode: "$$this.doc.doc2Type.code",
            doc2ValueArray: "$$this.doc.valueArray"
          }
        }
      }
    }
  },
  {
    $unset: [
      "doc2CodeArray"
    ]
  }
])
Sign up to request clarification or add additional context in comments.

5 Comments

Thanks a lot! Due to old mongoDb version (3.6.4) $unset stage is not supprted. My bad. But, I tried to modification qeury, and I had't completely understand how but it hase been worked.
db.doc1.aggregate([ {$lookup: {from: "doc2",localField: "doc.doc2CodeArray.code", foreignField: "doc.code",as: "doc.doc2Items"}}, {$project: {_id: 0,"doc.code": 1,"doc.name": 1,"doc2Items": { $map: {input: "$doc.doc2Items",in: { doc2TypeCode: "$$this.doc.doc2Type.code", doc2ValueArray: "$$this.doc.valueArray"}}},}}])
It works pretty fast. For one document: 98...105ms For all (about 500): 2.7...3.5sec For one doc it's a little bit faster then like I read now (about 150ms), but for all documents (about 500) it's a little bit slower (about 2500ms)
additional question: if I foget about decoration, is it possible to get result faster? what if I to filter by doc2.doc2Items.code field with in "$lookup" ? In really I need only four doc2Items items (instead of 50 item in real data), for my example: doc2Type.code in (C2, C3, C4) only. Is it possibly?
To speed up the $lookup you need and index on doc2.doc.code
2
  1. $lookup - Join doc1 with doc2.

    Pipeline:

    1.1. $match - Matching doc.code (from doc2) is within the doc.doc2CodeArray.code array (from doc1).

    1.2. $project - Decorate output document to be returned in doc2Items.

  2. $project - Decorate output document.

db.doc1.aggregate([
  {
    $lookup: {
      from: "doc2",
      let: {
        doc2CodeArray: "$doc.doc2CodeArray.code"
      },
      pipeline: [
        {
          $match: {
            $expr: {
              $in: [
                "$doc.code",
                "$$doc2CodeArray"
              ]
            }
          }
        },
        {
          $project: {
            _id: 0,
            "doc2TypeCode": "$doc.doc2Type.code",
            "doc2ValueArray": "$doc.valueArray"
          }
        }
      ],
      as: "doc2Items"
    }
  },
  {
    $project: {
      _id: 0,
      "doc1Code": "$doc.code",
      "doc1Name": "$doc.name",
      "doc2Items": "$doc2Items"
    }
  }
])

Sample Mongo Playground

3 Comments

Thanks a lot! I translate to my real data model, and the your solution worked the first time! But, I discovered that in the few docs the doc1.doc2CodeArray field is null, and query return error. And it worked too long time.
To compare, for now I get data in two steps: 1) get doc1.doc2CodeArray 2) get doc2.valueArray + doc2.doc2Type.code In the dev server that costs about 150ms for one document and 2500ms for all (about 500 docs). When I use this query, needs 2 seconds for one document and more than 1020 sec for all. It's too long. I should had to write the more relevant data example. And, I guss, the reason can be in old mongo version - 3.6.4.
Perhaps you can check with $ifNull and assign the empty when null as this demo.
0

Last few days I deep dived to reading. And I have got an accepted result.

I updated my question with example of more relevant data and I has given the my solution in the example below.

db.doc1.aggregate([
    {
        $match: {
            // filter by doc.code prefix
            "doc.code": { '$regex': 'A', '$options': 'i' }, "doc.doc2CodeArray": { $ne: null }
            //"doc.code": "A1"
        }
    },
    {
        $lookup: { // join doc1 & doc2
            from: "doc2",
            localField: "doc.doc2CodeArray.code",
            foreignField: "doc.code",
            as: "doc2Items"
        }
    },
    {
        $project: {
            _id: 0,
            "doc.code": 1,
            "doc2Items": { // items from doc2
                $map: {  // I need not all props from doc2 - do $map
                    "input": {
                        $filter: { // I need items only with 3 codes C2, C3, C4 - do $filter
                            "input": "$doc2Items",
                            "as": "d0",
                            "cond": { $or:[
                                { "$eq": ["$$d0.doc.doc2Type.code", "C2"] },
                                { "$eq": ["$$d0.doc.doc2Type.code", "C3"] },
                                { "$eq": ["$$d0.doc.doc2Type.code", "C4"] },
                            ]}
                        },
                    },
                    "as": "d",
                    "in": {
                        "code": "$$d.doc.doc2Type.code",
                        "value": {
                            $arrayElemAt: [ // I need only first record of nested array
                                {
                                    $map: { // I need not all props of first record of nested array
                                        "input": "$$d.doc.valueArray",
                                        "as": "d2",
                                        "in": {
                                            "value": "$$d2.valueData"
                                        }
                                    }
                                }
                                , 0]
                        },
                    }
                }// map
            }
        }// project
    }
])

Now I read the needed data two time faster then before. I think I began to understand mongo db query philosophy.

Thank everyone for the answers and tips! 'Yong Shun' and 'qtxo' helped me find the correct solution.

Comments

Your Answer

By clicking “Post Your Answer”, you agree to our terms of service and acknowledge you have read our privacy policy.

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.