Elasticsearch return document ids while doing aggregate query

Question

Is it possible to get an array of Elasticsearch document ids for each bucket when grouping with an aggregation? For example:

Current output

"aggregations": {
        "types": {
            "doc_count_error_upper_bound": 0,
            "sum_other_doc_count": 0,
            "buckets": [
                {
                    "key": "Text Document",
                    "doc_count": 3310
                },
                {
                    "key": "Unknown",
                    "doc_count": 15
                },
                {
                    "key": "Document",
                    "doc_count": 13
                }
            ]
        }
    }

Desired output

"aggregations": {
        "types": {
            "doc_count_error_upper_bound": 0,
            "sum_other_doc_count": 0,
            "buckets": [
                {
                    "key": "Text Document",
                    "doc_count": 3310,
                    "ids":["doc1","doc2", "doc3"....]
                },
                {
                    "key": "Unknown",
                    "doc_count": 15,
                    "ids":["doc11","doc12", "doc13"....]
                },
                {
                    "key": "Document",
                    "doc_count": 13,
                    "ids":["doc21","doc22", "doc23"....]
                }
            ]
        }
    }

I am not sure whether this is possible in Elasticsearch. Here is my aggregation query:

{
    "size": 0,
    "aggs": {
        "types": {
            "terms": {
                "field": "docType",
                "size": 10
            }
        }
    }
}

Elasticsearch version: 6.3.2

jaspreet chahal · Accepted Answer · 2020-05-22 11:56:21Z

3

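You can use a top_hits sub-aggregation, which returns the top matching documents in each bucket (up to its size, not necessarily all of them; as far as I know the size is capped by the index.max_inner_result_window setting, which defaults to 100). With source filtering you can choose which fields are returned for each hit. Every hit always carries its _id, so _source can be disabled entirely when only the ids are needed.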

Query:

{
  "size": 0,
  "aggs": {
    "types": {
      "terms": {
        "field": "docType",
        "size": 10
      },
      "aggs": {
        "docs": {
          "top_hits": {
            "size": 10,
            "_source": false
          }
        }
      }
    }
  }
}
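Because top_hits returns whole hit objects, the ids still have to be pulled out on the client. Below is a minimal Python sketch of that step, assuming the response has already been parsed into a dict. The helper name ids_per_bucket is hypothetical, the two aggregation names are passed in as arguments, and the sample data is hand-written rather than real Elasticsearch output.

```python
def ids_per_bucket(response, terms_agg, hits_agg):
    """Map each terms bucket key to the _id values of its top_hits."""
    buckets = response["aggregations"][terms_agg]["buckets"]
    return {
        bucket["key"]: [hit["_id"] for hit in bucket[hits_agg]["hits"]["hits"]]
        for bucket in buckets
    }


# Hand-written sample shaped like a terms + top_hits response (not real output).
sample = {
    "aggregations": {
        "types": {
            "buckets": [
                {
                    "key": "Unknown",
                    "doc_count": 2,
                    "docs": {"hits": {"hits": [{"_id": "doc11"}, {"_id": "doc12"}]}},
                },
                {
                    "key": "Document",
                    "doc_count": 1,
                    "docs": {"hits": {"hits": [{"_id": "doc21"}]}},
                },
            ]
        }
    }
}

print(ids_per_bucket(sample, "types", "docs"))
```

Each list holds at most as many ids as the top_hits size, so large buckets are truncated.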


2 Comments

Raghu Chahar Over a year ago

Thanks, it is working. Is it possible to hide the document metadata, i.e. "_index", "_type", "_id", "_score" and "_source"? I only need the _id of each document, not the other metadata.

jaspreet chahal Over a year ago

@RaghuChahar Unfortunately no, the metadata cannot be removed.

Pip · Accepted Answer · 2022-01-07 05:34:16Z

0

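For anyone interested, another solution is to build a custom bucket key with a script: a delimited string of values from the document, including the id. It may not be pretty, but you can parse it apart later, and if you only need something minimal like the doc id, it may be worth it. Note that every key is then unique to one document, so the terms aggregation returns one bucket per document and its size limits how many documents come back.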

{
    "size": 0,
    "aggs": {
        "types": {
            "terms": {
                "script": "doc['docType'].value+'::'+doc['_id'].value",
                "size": 10
            }
        }
    }
}
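The "parse it out later" step could look roughly like the Python sketch below. It assumes the bucket keys have the form docType::id as produced by the script above and that the ids themselves never contain the separator. The helper name group_ids is hypothetical and the sample buckets are hand-written.

```python
from collections import defaultdict


def group_ids(buckets, sep="::"):
    """Group document ids by docType from keys shaped like 'docType::id'."""
    grouped = defaultdict(list)
    for bucket in buckets:
        # rpartition splits on the LAST separator, so a docType that itself
        # contains the separator stays intact (ids must not contain it).
        doc_type, _, doc_id = bucket["key"].rpartition(sep)
        grouped[doc_type].append(doc_id)
    return dict(grouped)


# Hand-written sample buckets (not real Elasticsearch output).
sample_buckets = [
    {"key": "Unknown::doc11", "doc_count": 1},
    {"key": "Unknown::doc12", "doc_count": 1},
    {"key": "Document::doc21", "doc_count": 1},
]

print(group_ids(sample_buckets))
```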


Tarzan1163 · Accepted Answer · 2025-02-05 20:12:39Z

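I think the easiest way is to use a scripted_metric aggregation, though it may seem a bit complicated at first. (The state and states variables used below are, as far as I know, available from Elasticsearch 6.4 on; on 6.3.x the equivalent variables are params._agg and params._aggs.)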

"aggs": {
  "types": {
    "terms": {
      "field": "docType",
      "size": 10
    },
    "aggs": {
      "ids": {
        "scripted_metric": {
          "init_script": "state.ids = []",
          "map_script": "state.ids.add(doc['_id'].value)",
          "combine_script": "state",
          "reduce_script": "def result = []; for (state in states) result.addAll(state.ids); return result;"
        }
      }
    }
  }
}

This query should produce a result close to what you are looking for. The collected ids appear under the "value" key of the ids aggregation:

"aggregations" : {
  "types" : {
    "doc_count_error_upper_bound" : 0,
    "sum_other_doc_count" : 0,
    "buckets" : [
      {
        "key": "Text Document",
        "doc_count": 3310,
        "ids": { "value": ["doc1", "doc2", "doc3", ...] }
      },
      {
        "key": "Unknown",
        "doc_count": 15,
        "ids": { "value": ["doc11", "doc12", "doc13"...] }
      },
      {
        "key": "Document",
        "doc_count": 13,
        "ids": { "value": ["doc21", "doc22", "doc23"...] }
      }
    ]
  }
}
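On the client side the buckets can be flattened into a simple mapping. This is a minimal Python sketch, assuming the aggregations are named types and ids as in the query above. Elasticsearch normally wraps a scripted_metric result in a "value" object, so the hypothetical helper ids_by_type accepts both the wrapped and the bare list shape. The sample data is hand-written.

```python
def ids_by_type(response, terms_agg="types", metric_agg="ids"):
    """Map each bucket key to the id list collected by the scripted metric."""
    result = {}
    for bucket in response["aggregations"][terms_agg]["buckets"]:
        ids = bucket[metric_agg]
        # A scripted_metric result is normally wrapped as {"value": [...]};
        # unwrap it, but tolerate a bare list as well.
        if isinstance(ids, dict):
            ids = ids.get("value", [])
        result[bucket["key"]] = list(ids)
    return result


# Hand-written sample (not real Elasticsearch output).
sample_response = {
    "aggregations": {
        "types": {
            "buckets": [
                {"key": "Unknown", "doc_count": 2, "ids": {"value": ["doc11", "doc12"]}},
                {"key": "Document", "doc_count": 1, "ids": {"value": ["doc21"]}},
            ]
        }
    }
}

print(ids_by_type(sample_response))
```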

The four scripts are executed in the following order:

1) init_script
You are given a state object on which you can create any properties you want to use later.
This script is optional.
2) map_script
You have access to the previously initialized state object and to the doc object, which references the current document.
This script runs once for each document in the current bucket, always with the same state object, so it is where you collect the data you need.
3) combine_script
Because the documents may be spread across multiple shards (computers/processes), this script lets you aggregate the data collected from all documents on the current shard before it is passed on to the cross-shard aggregation.
It runs once per shard, after the map_script has processed every document on that shard.
In this case the ids were already collected into the state object by the previous script, so we can return that object right away. Usually this step is used when you want to calculate, e.g., the min or max value of some field: you store only the field values in the map_script and do the calculation here.
4) reduce_script
Finally, this script runs after all shards have returned their data. Its only job is to combine that data and return the result.
You are given a states object, which contains the results of the combine_script from every shard.
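To make the order of execution concrete, here is a small Python simulation of the four phases. It is only a sketch of the control flow, not how Elasticsearch is implemented: the function names mirror the script names, run_scripted_metric is a hypothetical driver, and each shard is modelled as a plain list of documents.

```python
def init_script():
    # One fresh state object per shard.
    return {"ids": []}


def map_script(state, doc):
    # Called once per document; collects into the shared per-shard state.
    state["ids"].append(doc["_id"])


def combine_script(state):
    # Called once per shard after all its documents were mapped.
    return state


def reduce_script(states):
    # Called once with the combine results of all shards.
    result = []
    for state in states:
        result.extend(state["ids"])
    return result


def run_scripted_metric(shards):
    """Hypothetical driver: run init/map/combine per shard, then reduce."""
    states = []
    for docs in shards:
        state = init_script()
        for doc in docs:
            map_script(state, doc)
        states.append(combine_script(state))
    return reduce_script(states)


# Two simulated shards holding the documents of one bucket.
shards = [[{"_id": "doc11"}, {"_id": "doc12"}], [{"_id": "doc13"}]]
print(run_scripted_metric(shards))  # ['doc11', 'doc12', 'doc13']
```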

Hope this helps and that it is not too late to post it. I was struggling with a similar task, and it is interesting that there is still no clear answer to it anywhere.

The official Elasticsearch documentation on the scripted metric aggregation is worth reading if anyone wants to learn more about how it works.
