
I have an index with documents of the following structure:

{
  "_id" : "1234567890abcdef",
  ...
  "entities" : [
    {  
      "name" : "beer",
      "evidence_start" : 12,
      "evidence_end" : 16
    },
    {  
      "name" : "water",
      "evidence_start" : 55,
      "evidence_end" : 60
    },
    {  
      "name" : "beer",
      "evidence_start" : 123,
      "evidence_end" : 127
    },
    ...
  ]
}

Here, entities is a field of type nested. I need to count how many documents mention beer. The problem is that the obvious bucket aggregation returns the number of mentions, not the number of documents: if beer is mentioned twice in the same document, it adds 2 to the total. The query I use is:

{
  ...
  "aggs": {
      "entities": {
        "nested": {
          "path": "entities"
        },
        "aggs": {
          "entity_count": {
            "terms": {
              "field": "entities.name",
              "size" : 20
            }
          }
        }
      }
    },
  ...
}
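To make the difference concrete, here is a minimal Python sketch of the two counts on made-up data. The sample documents and variable names are hypothetical; the code only imitates what the aggregation counts and does not call Elasticsearch.

```python
from collections import Counter

# Hypothetical sample documents mirroring the structure above.
docs = [
    {"_id": "a", "entities": [{"name": "beer"}, {"name": "water"}, {"name": "beer"}]},
    {"_id": "b", "entities": [{"name": "beer"}]},
]

# What the terms aggregation inside "nested" counts:
# one hit per nested entity object (i.e. per mention).
mention_counts = Counter(e["name"] for d in docs for e in d["entities"])

# What is actually wanted: one hit per parent document
# that mentions the entity at least once.
document_counts = Counter(
    name for d in docs for name in {e["name"] for e in d["entities"]}
)

print(mention_counts["beer"], document_counts["beer"])
```

With this data, beer has three mentions but appears in only two documents.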

Is there a way to count each document only once per entity, without scripting?

Many thanks in advance.

1 Answer


You simply need to add a reverse_nested aggregation as a sub-aggregation of the terms aggregation, so that each bucket counts the parent ("main") documents instead of the nested documents.

You should try:

{
  ...
  "aggs": {
      "entities": {
        "nested": {
          "path": "entities"
        },
        "aggs": {
          "entity_count": {
            "terms": {
              "field": "entities.name",
              "size" : 20
            },
            "aggs": {
                "main_document_count": {
                    "reverse_nested": {}
                }
            }
          }
        }
      }
    },
  ...
}
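
As a rough illustration of how to read the result, the following Python sketch pulls the per-entity parent-document counts out of a response. The sample_response dict is made up for the example and omits other fields a real response carries; parent_document_counts is a hypothetical helper name. The aggregation names match the query above.

```python
def parent_document_counts(response):
    """Map each entity name to the number of parent documents mentioning it."""
    buckets = response["aggregations"]["entities"]["entity_count"]["buckets"]
    # Each terms bucket has its own doc_count (nested objects, i.e. mentions);
    # the reverse_nested sub-aggregation carries the parent-document count.
    return {b["key"]: b["main_document_count"]["doc_count"] for b in buckets}


# Hypothetical, trimmed response for illustration only.
sample_response = {
    "aggregations": {
        "entities": {
            "doc_count": 4,
            "entity_count": {
                "buckets": [
                    {"key": "beer", "doc_count": 3,
                     "main_document_count": {"doc_count": 2}},
                    {"key": "water", "doc_count": 1,
                     "main_document_count": {"doc_count": 1}},
                ]
            },
        }
    }
}

counts = parent_document_counts(sample_response)
print(counts)
```

The bucket-level doc_count still reports mentions, so both numbers are available from the same request.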