0

I have some documents which contain a nested field planList which is an array of objects. It's length will always be 2 or 3. And the objects always have 3 keys - planName, planType, and planId.

I want to write a query which will return a list of all planNames, along with the most frequent plan that it was present with and it's frequency.

For example, consider the following 4 documents -

{planList: [{planName: a, planType: x, planId: 1},{planName: b, planType: x, planId: 2}]}
{planList: [{planName: a, planType: x, planId: 1},{planName: b, planType: x, planId: 2},{planName: c, planType: y, planId: 3}]}
{planList: [{planName: a, planType: x, planId: 1},{planName: c, planType: y, planId: 3}]}
{planList: [{planName: d, planType: y, planId: 4},{planName: c, planType: y, planId: 3}]}

The response for this data should have the following insights -

plan A was found with plan B and plan C 2 times (draw between plan B and C)
plan B was found with plan A 2 times
plan C was found with plan A 2 times
plan D was found with plan C one time

The query to get to these docs is very simple -

"query": {
    "bool": {
      "must": [
        {
          "match": {
            "event": "comparePlans"
          }
        }
      ]
    }
  }

Can someone please help me with the aggregations needed to get the insight I'm looking for? Thanks.

Edit: The mapping for the above index looks like this -

"planList": {
   "type": "nested",
   "properties": {
     "planId": {
       "type": "keyword"
     },
     "planName": {
       "type": "keyword"
     },
     "planType": {
       "type": "keyword"
     }
   }
 },
"event": {
   "type": "keyword",
   "null_value": "none"
 }
2
  • Can you please provide index mapping as well? Commented Jan 24, 2023 at 11:26
  • @SagarPatel I've updated the question with the index mapping. Commented Jan 25, 2023 at 7:57

1 Answer 1

1

As far as I'm aware, there is no perfect way to do it outside of a scripted aggregation, however, this is close. Pay attention that this aggregation also counts the appearances of a plan name with itself.

If you know that the same plan cannot appear twice in the array, you can just get this answer and then filter out the same plan name from each bucket.

Query:

{
  "size": 0,
  "aggs": {
    "ALL_PLANS": {
      "nested": {
        "path": "planList"
      },
      "aggs": {
        "GROUPED_BY_NAME": {
          "terms": {
            "field": "planList.planName",
            "size": 10
          },
          "aggs": {
            "REVERSE_TO_ROOT": {
              "reverse_nested": {},
              "aggs": {
                "ALL_PLANS": {
                  "nested": {
                    "path": "planList"
                  },
                  "aggs": {
                    "GROUPED_BY_NAME": {
                      "terms": {
                        "field": "planList.planName",
                        "size": 10
                      }
                    }
                  }
                }
              }
            }
          }
        }
      }
    }
  }
}

Response:

{
  "took" : 4,
  "timed_out" : false,
  "_shards" : {
    "total" : 5,
    "successful" : 5,
    "skipped" : 0,
    "failed" : 0
  },
  "hits" : {
    "total" : {
      "value" : 4,
      "relation" : "eq"
    },
    "max_score" : null,
    "hits" : [ ]
  },
  "aggregations" : {
    "ALL_PLANS" : {
      "doc_count" : 9,
      "GROUPED_BY_NAME" : {
        "doc_count_error_upper_bound" : 0,
        "sum_other_doc_count" : 0,
        "buckets" : [
          {
            "key" : "a",
            "doc_count" : 3,
            "REVERSE_TO_ROOT" : {
              "doc_count" : 3,
              "ALL_PLANS" : {
                "doc_count" : 7,
                "GROUPED_BY_NAME" : {
                  "doc_count_error_upper_bound" : 0,
                  "sum_other_doc_count" : 0,
                  "buckets" : [
                    {
                      "key" : "a",
                      "doc_count" : 3
                    },
                    {
                      "key" : "b",
                      "doc_count" : 2
                    },
                    {
                      "key" : "c",
                      "doc_count" : 2
                    }
                  ]
                }
              }
            }
          },
          {
            "key" : "c",
            "doc_count" : 3,
            "REVERSE_TO_ROOT" : {
              "doc_count" : 3,
              "ALL_PLANS" : {
                "doc_count" : 7,
                "GROUPED_BY_NAME" : {
                  "doc_count_error_upper_bound" : 0,
                  "sum_other_doc_count" : 0,
                  "buckets" : [
                    {
                      "key" : "c",
                      "doc_count" : 3
                    },
                    {
                      "key" : "a",
                      "doc_count" : 2
                    },
                    {
                      "key" : "b",
                      "doc_count" : 1
                    },
                    {
                      "key" : "d",
                      "doc_count" : 1
                    }
                  ]
                }
              }
            }
          },
          {
            "key" : "b",
            "doc_count" : 2,
            "REVERSE_TO_ROOT" : {
              "doc_count" : 2,
              "ALL_PLANS" : {
                "doc_count" : 5,
                "GROUPED_BY_NAME" : {
                  "doc_count_error_upper_bound" : 0,
                  "sum_other_doc_count" : 0,
                  "buckets" : [
                    {
                      "key" : "a",
                      "doc_count" : 2
                    },
                    {
                      "key" : "b",
                      "doc_count" : 2
                    },
                    {
                      "key" : "c",
                      "doc_count" : 1
                    }
                  ]
                }
              }
            }
          },
          {
            "key" : "d",
            "doc_count" : 1,
            "REVERSE_TO_ROOT" : {
              "doc_count" : 1,
              "ALL_PLANS" : {
                "doc_count" : 2,
                "GROUPED_BY_NAME" : {
                  "doc_count_error_upper_bound" : 0,
                  "sum_other_doc_count" : 0,
                  "buckets" : [
                    {
                      "key" : "c",
                      "doc_count" : 1
                    },
                    {
                      "key" : "d",
                      "doc_count" : 1
                    }
                  ]
                }
              }
            }
          }
        ]
      }
    }
  }
}

So we can see that "a" appears with "a" 3 times (ignore this), and with "b" and "c" 2 times.

Sign up to request clarification or add additional context in comments.

Comments

Your Answer

By clicking “Post Your Answer”, you agree to our terms of service and acknowledge you have read our privacy policy.

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.