Count nested objects no more than once in each document in Elasticsearch

Question

I have an index with documents of the following structure:

{
  "_id" : "1234567890abcdef",
  ...
  "entities" : [
    {  
      "name" : "beer",
      "evidence_start" : 12,
      "evidence_end" : 16
    },
    {  
      "name" : "water",
      "evidence_start" : 55,
      "evidence_end" : 60
    },
    {  
      "name" : "beer",
      "evidence_start" : 123,
      "evidence_end" : 127
    },
    ...
  ]
}

Here, entities is a field of type nested. I need to count how many documents mention beer. The problem is that the obvious bucket aggregation returns the number of mentions, not the number of documents: if beer is mentioned twice in the same document, it adds 2 to the total. The query I use is:

{
  ...
  "aggs": {
      "entities": {
        "nested": {
          "path": "entities"
        },
        "aggs": {
          "entity_count": {
            "terms": {
              "field": "entities.name",
              "size" : 20
            }
          }
        }
      }
    },
  ...
}

Is there a way of counting only distinct mentions without scripting?

Many thanks in advance.

Pierre Mallet · Accepted Answer · 2019-09-02 11:39:58Z

You simply need to add a reverse_nested aggregation as a sub-aggregation of the terms aggregation, so that each bucket counts the number of main (parent) documents instead of nested documents.

Try this:

{
  ...
  "aggs": {
      "entities": {
        "nested": {
          "path": "entities"
        },
        "aggs": {
          "entity_count": {
            "terms": {
              "field": "entities.name",
              "size" : 20
            },
            "aggs": {
                "main_document_count": {
                    "reverse_nested": {}
                }
            }
          }
        }
      }
    },
  ...
}
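
To make it clearer where the per-document count ends up, here is a minimal Python sketch that reads it out of the aggregation response. The response below is hand-written for illustration (it assumes an index holding only the single example document from the question) and is not real output; the helper name parent_counts is hypothetical. Each terms bucket still carries its own doc_count (the number of nested mentions), while the nested main_document_count.doc_count is the number of parent documents.

```python
# Hypothetical response for an index that contains only the example document
# from the question (beer mentioned twice, water once). Hand-written for
# illustration, not captured from a real cluster.
SAMPLE_RESPONSE = {
    "aggregations": {
        "entities": {
            "doc_count": 3,  # total nested objects visited
            "entity_count": {
                "buckets": [
                    {
                        "key": "beer",
                        "doc_count": 2,  # nested mentions
                        "main_document_count": {"doc_count": 1},  # parent docs
                    },
                    {
                        "key": "water",
                        "doc_count": 1,
                        "main_document_count": {"doc_count": 1},
                    },
                ]
            },
        }
    }
}


def parent_counts(response):
    """Map each entity name to the number of parent documents mentioning it."""
    buckets = response["aggregations"]["entities"]["entity_count"]["buckets"]
    return {b["key"]: b["main_document_count"]["doc_count"] for b in buckets}


if __name__ == "__main__":
    print(parent_counts(SAMPLE_RESPONSE))
```

The aggregation names (entities, entity_count, main_document_count) are the ones used in the query above; if you rename them in the query, the lookup keys change accordingly.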
