I have two collections in a many-to-one relationship: several hosts' HTTP services often serve the same site (e.g. because of DNS-level load balancing). I'm trying to build a query that returns the related documents from both collections merged into one.

hosts collection:

{
    "_id" : ObjectId("60aa2485332483cb4f5e7122"),
    "ip" : "1.2.3.4",
    "services" : [
        {
            "proto" : "tcp",
            "port" : "22",
            "status" : "open",
            "reason" : "syn-ack",
            "ttl" : 53
        },
        {
            "proto" : "tcp",
            "port" : "80",
            "status" : "open",
            "reason" : "syn-ack",
            "ttl" : 51,
            "http" : [
                ObjectId("60aa64c67d0bf23ce47c530c")
            ]
        }
    ],
    "version" : 4,
    "last_scanned" : 1621573240.730579
}

https collection:

{
    "_id" : ObjectId("60aa64c67d0bf23ce47c530c"),
    "vhost" : "test.com",
    "paths" : [
        {
            "path" : "/admin",
            "code" : 200
        },
        {
            "path" : "/stuff",
            "code" : 200
        }
    ]
}

I'd like to write a lookup where the output is a combination of these two collections. So far I was able to get the https document into a top-level array in hosts:

db.hosts.aggregate([
  {
    $lookup: {
      from: "https",
      localField: "services.http",
      foreignField: "_id",
      as: "http"
    }
  }
]).pretty()

Which ends up as:

{
    "_id" : ObjectId("60aa2485332483cb4f5e7122"),
    "ip" : "1.2.3.4",
    "services" : [
        {
            "proto" : "tcp",
            "port" : "22",
            "status" : "open",
            "reason" : "syn-ack",
            "ttl" : 53
        },
        {
            "proto" : "tcp",
            "port" : "80",
            "status" : "open",
            "reason" : "syn-ack",
            "ttl" : 51,
            "http" : [
                ObjectId("60aa64c67d0bf23ce47c530c")
            ]
        }
    ],
    "http" : [
        {
            "_id" : ObjectId("60aa64c67d0bf23ce47c530c"),
            "vhost" : "test.com",
            "paths" : [
                {
                    "path" : "/admin",
                    "code" : 200
                },
                {
                    "path" : "/stuff",
                    "code" : 200
                }
            ]
        }
    ],
    "version" : 4,
    "last_scanned" : 1621573240.730579
}

The problem is that I can't move the "http" field to the place where its ObjectId was found by the lookup (services.$.http). I tried modifying the 'as' field of $lookup in various ways, without success.

Is it even possible to point to lower levels of a nested document with 'as'? Any workaround to achieve this?

1 Answer

  • $unwind to deconstruct the services array
  • $lookup against the https collection, writing the result to services.http via the as option
  • $group by _id to reconstruct the services array and carry over the other required fields
db.hosts.aggregate([
  { $unwind: "$services" },
  {
    $lookup: {
      from: "https",
      localField: "services.http",
      foreignField: "_id",
      as: "services.http"
    }
  },
  {
    $group: {
      _id: "$_id",
      ip: { $first: "$ip" },
      services: { $push: "$services" },
      version: { $first: "$version" },
      last_scanned: { $first: "$last_scanned" }
    }
  }
]).pretty()
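
If listing every hosts field in $group is a concern, one possible variant is to keep the whole document with $first: "$$ROOT" and then put the rebuilt services array back with $replaceRoot and $mergeObjects. This is a sketch under assumptions and was not part of the original pipeline: it assumes MongoDB 3.6 or later (for $mergeObjects), and the names pipeline and doc are chosen here purely for illustration.

```javascript
// Sketch only: same $unwind / $lookup as above, but $group no longer
// names each host field. "pipeline" and "doc" are illustrative names.
const pipeline = [
  { $unwind: "$services" },
  {
    $lookup: {
      from: "https",
      localField: "services.http",
      foreignField: "_id",
      as: "services.http"
    }
  },
  {
    $group: {
      _id: "$_id",
      // keep one full copy of the (unwound) host document
      doc: { $first: "$$ROOT" },
      // rebuild the services array from the unwound elements
      services: { $push: "$services" }
    }
  },
  {
    // overwrite the single unwound "services" object in doc
    // with the rebuilt array, and promote the result to the root
    $replaceRoot: {
      newRoot: { $mergeObjects: ["$doc", { services: "$services" }] }
    }
  }
];

// In the mongo shell:
// db.hosts.aggregate(pipeline).pretty()
```

With this shape, fields added to hosts documents later should pass through without changes to the pipeline.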

Second option, without $unwind:

  • $lookup against the https collection, storing the result in a temporary top-level http array
  • $map to iterate over the services array
  • $filter to keep only the looked-up http documents whose _id appears in the current service's http array
  • $ifNull to substitute an empty array [] when a service has no http field
  • $mergeObjects to merge the current services element with its filtered http array
  • the top-level http array is no longer needed, so remove it with $$REMOVE
db.hosts.aggregate([
  {
    $lookup: {
      from: "https",
      localField: "services.http",
      foreignField: "_id",
      as: "http"
    }
  },
  {
    $addFields: {
      services: {
        $map: {
          input: "$services",
          as: "s",
          in: {
            $mergeObjects: [
              "$$s",
              {
                http: {
                  $filter: {
                    input: "$http",
                    cond: {
                      $in: ["$$this._id", { $ifNull: ["$$s.http", []] }]
                    }
                  }
                }
              }
            ]
          }
        }
      },
      http: "$$REMOVE"
    }
  }
])
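
To make the $map / $filter / $mergeObjects step easier to follow, here is a rough in-memory equivalent in plain JavaScript. It is only an illustration of the logic, not how MongoDB executes the pipeline. The function name embedHttp and the sample data are hypothetical, and plain strings stand in for ObjectId values so that they can be compared directly.

```javascript
// Sample data modelled on the question; strings replace ObjectId values.
const host = {
  ip: "1.2.3.4",
  services: [
    { proto: "tcp", port: "22" },
    { proto: "tcp", port: "80", http: ["60aa64c67d0bf23ce47c530c"] }
  ]
};
const httpsDocs = [
  { _id: "60aa64c67d0bf23ce47c530c", vhost: "test.com" }
];

// Hypothetical helper mirroring the second pipeline.
function embedHttp(hostDoc, allHttps) {
  // $lookup: gather every https document referenced by any service
  const wanted = new Set(hostDoc.services.flatMap(s => s.http || []));
  const lookedUp = allHttps.filter(d => wanted.has(d._id));

  // $map over services; for each one, $filter the looked-up documents
  // down to those listed in that service's http array ($ifNull -> []),
  // then $mergeObjects the result over the original service
  const services = hostDoc.services.map(s => ({
    ...s,
    http: lookedUp.filter(d => (s.http || []).includes(d._id))
  }));

  // The temporary top-level array is never added here,
  // which corresponds to http: "$$REMOVE" in the pipeline
  return { ...hostDoc, services };
}

const merged = embedHttp(host, httpsDocs);
console.log(JSON.stringify(merged, null, 2));
```

As in the pipeline, a service with no http field ends up with an empty http array rather than no field at all.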

4 Comments

Thank you, this seems to do what I want. I was on the right track, just did not use group properly. Going back to experiment further...
The only downside I see is that each field in 'hosts' needs to be added to $group, so the document's flexibility depends on the aggregation pipeline. It's not a big problem atm, but is there any way to avoid that?
There is no other option when using $unwind and $group, but there is an option without $unwind. The $unwind approach will cause performance issues when there is a lot of data.
See the second option I have added, which works without $unwind.
