I have the following set of nested sub-aggregations in Elasticsearch (field2 is a sub-aggregation of field1, and field3 is a sub-aggregation of field2). It turns out, however, that the terms aggregation for field3 does not bucket documents that don't have field3.

My understanding is that I have to use a missing aggregation to bucket those documents, in addition to the terms aggregation for field3.

But I am not sure how I can add it to the query below so that both sets of documents are bucketed.

{
  "size": 0,
  "aggregations": {
    "f1": {
      "terms": {
        "field": "field1",
        "size": 0,
        "order": {
          "_count": "asc"
        },
        "include": [
          "123"
        ]
      },
      "aggregations": {
        "field2": {
          "terms": {
            "field": "f2",
            "size": 0,
            "order": {
              "_count": "asc"
            },
            "include": [
              "tr"
            ]
          },
          "aggregations": {
            "field3": {
              "terms": {
                "field": "f3",
                "order": {
                  "_count": "asc"
                },
                "size": 0
              },
              "aggregations": {
                "aggTopHits": {
                  "top_hits": {
                    "size": 1
                  }
                }
              }
            }
          }
        }
      }
    }
  }
}
1 Answer

In version 2.1.2 and later, you can use the missing parameter of the terms aggregation, which lets you specify a default value for documents that are missing that field. (FYI, the missing parameter has been available since 2.0, but a bug prevented it from working in sub-aggregations, which is how you would use it here.)

     ...
     "aggregations": {
        "field3": {
          "terms": {
            "field": "f3",
            "order": {
              "_count": "asc"
            },
            "size": 0,
            "missing": "n/a"     <----- provide a default here
          },
          "aggregations": {
            "aggTopHits": {
              "top_hits": {
                "size": 1
              }
            }
          }
        }
      }

However, if you are working with a cluster older than 2.1.2 (including any pre-2.x cluster), you can use the missing aggregation at the same depth as your field3 aggregation to bucket the documents that are missing "f3", like this:

     ...
     "aggregations": {
        "field3": {
          "terms": {
            "field": "f3",
            "order": {
              "_count": "asc"
            },
            "size": 0
          },
          "aggregations": {
            "aggTopHits": {
              "top_hits": {
                "size": 1
              }
            }
          }
        },
        "missing_field3": {
          "missing" : {
            "field": "f3"
          },
          "aggregations": {
            "aggTopMissingHit": {
              "top_hits": {
                "size": 1
              }
            }
          }
        }
      }
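
Since the question asks where this fits in the original query, here is a sketch of the full request with the missing aggregation added as a sibling of field3. It only combines the question's query with the fragment above; the names missing_field3 and aggTopMissingHit are the arbitrary labels used in this answer, and I have not run this exact request against a live cluster.

```json
{
  "size": 0,
  "aggregations": {
    "f1": {
      "terms": {
        "field": "field1",
        "size": 0,
        "order": { "_count": "asc" },
        "include": ["123"]
      },
      "aggregations": {
        "field2": {
          "terms": {
            "field": "f2",
            "size": 0,
            "order": { "_count": "asc" },
            "include": ["tr"]
          },
          "aggregations": {
            "field3": {
              "terms": {
                "field": "f3",
                "order": { "_count": "asc" },
                "size": 0
              },
              "aggregations": {
                "aggTopHits": { "top_hits": { "size": 1 } }
              }
            },
            "missing_field3": {
              "missing": { "field": "f3" },
              "aggregations": {
                "aggTopMissingHit": { "top_hits": { "size": 1 } }
              }
            }
          }
        }
      }
    }
  }
}
```

In the response, each field2 bucket should then carry two entries: field3 with its list of term buckets, and missing_field3 as a single bucket whose doc_count is the number of documents without f3.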

2 Comments

I am on version 2.1.1, but unfortunately the newer approach doesn't work at an inner level; see this issue: github.com/elastic/elasticsearch/issues/14882
Good call. I was testing on a 2.2 cluster and it worked as expected. I was not aware of the bug in 2.1.1 and prior. I have updated my answer to clarify in which version the missing option works and included the link you provided to the bug details.
