Elasticsearch - getting aggregated data based on unique values from a field

Question

In my Elasticsearch (7.13) index, I have the following dataset:

maid      site_id    date         hour
m1        1300       2021-06-03   1
m1        1300       2021-06-03   2
m1        1300       2021-06-03   1
m2        1300       2021-06-03   1

I am trying to get a unique count of records (one per maid) for each date and site_id from the above table. The desired result is:

maid      site_id   date        count        
m1        1300      2021-06-03  1
m2        1300      2021-06-03  1
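To make the target concrete, here is a small Python sketch (plain Python, not Elasticsearch code) over the sample rows above. It collapses the rows to one entry per (maid, site_id, date), ignoring hour, and also counts distinct maid values per (site_id, date). The ROWS list and both function names are hypothetical, chosen only for illustration.

```python
from collections import defaultdict

# Sample rows from the table above: (maid, site_id, date, hour)
ROWS = [
    ("m1", 1300, "2021-06-03", 1),
    ("m1", 1300, "2021-06-03", 2),
    ("m1", 1300, "2021-06-03", 1),
    ("m2", 1300, "2021-06-03", 1),
]


def unique_maid_rows(rows):
    """Collapse rows to one entry per (maid, site_id, date); hour is ignored."""
    return sorted({(maid, site_id, date) for maid, site_id, date, _hour in rows})


def distinct_maids_per_site_date(rows):
    """Count distinct maid values for each (site_id, date) pair."""
    maids = defaultdict(set)
    for maid, site_id, date, _hour in rows:
        maids[(site_id, date)].add(maid)
    return {key: len(values) for key, values in maids.items()}


if __name__ == "__main__":
    # Each unique (maid, site_id, date) row gets count 1, as in the desired result
    for maid, site_id, date in unique_maid_rows(ROWS):
        print(maid, site_id, date, 1)
    print(distinct_maids_per_site_date(ROWS))
```

On the sample data this yields the two desired rows, and a distinct-maid count of 2 for site 1300 on 2021-06-03.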

I have millions of maid values for each site_id, and the dates span two years. I am using the following query with a cardinality aggregation on maid, assuming that it will return the unique maids:

GET /r_2332/_search
{
  "size": 0,
  "aggs": {
    "site_id": {
      "terms": {
        "field": "site_id",
        "size": 100,
        "include": [1171, 1048]
      },
      "aggs": {
        "bydate": {
          "range": {
            "field": "date",
            "ranges": [
              {
                "from": "2021-04-08",
                "to": "2021-04-22"
              }
            ]
          },
          "aggs": {
            "rdate": {
              "terms": {
                "field": "date"
              },
              "aggs": {
                "maids": {
                  "cardinality": {
                    "field": "maid"
                  }
                }
              }
            }
          }
        }
      }
    }
  }
}

This still returns the data with all the duplicate values. How do I include the maid field in my query so that the data is filtered on unique maid values?
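One likely reason the response looks like it still contains duplicates: in each rdate bucket, doc_count counts every matching document (duplicates included), while the value of the maids cardinality sub-aggregation is the distinct count. The Python sketch below mimics that terms(site_id) → terms(date) → cardinality(maid) bucket structure on the sample rows. It is an assumption-laden illustration: it skips the include and range filters (the sample rows would not match them), and it counts exactly, whereas Elasticsearch's cardinality aggregation is an approximate, HyperLogLog++-based estimate whose counts are expected to be close to accurate only below precision_threshold (default 3000). The function name is hypothetical.

```python
from collections import defaultdict

# Sample rows from the question: (maid, site_id, date, hour)
ROWS = [
    ("m1", 1300, "2021-06-03", 1),
    ("m1", 1300, "2021-06-03", 2),
    ("m1", 1300, "2021-06-03", 1),
    ("m2", 1300, "2021-06-03", 1),
]


def simulate_nested_aggs(rows):
    """Mimic terms(site_id) -> terms(date) -> cardinality(maid).

    Returns {site_id: {date: {"doc_count": n, "maids": distinct_maids}}}.
    The count here is exact; Elasticsearch's cardinality is an estimate.
    """
    docs = defaultdict(lambda: defaultdict(list))
    for maid, site_id, date, _hour in rows:
        docs[site_id][date].append(maid)
    return {
        site_id: {
            date: {"doc_count": len(maids), "maids": len(set(maids))}
            for date, maids in by_date.items()
        }
        for site_id, by_date in docs.items()
    }


if __name__ == "__main__":
    print(simulate_nested_aggs(ROWS))
```

For site 1300 on 2021-06-03, doc_count is 4 but maids is 2: the unique count lives in the cardinality value, not in doc_count.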

Bhavya · Accepted Answer · 2021-06-12 03:19:07Z

You can use a multi_terms aggregation (available since Elasticsearch 7.12, so it works on 7.13) along with a cardinality aggregation if you want to get unique documents based on site_id and maid:

{
  "size": 0,
  "query": {
    "bool": {
      "filter": [
        {
          "terms": {
            "site_id": [
              "1300",
              "1301"
            ]
          }
        },
        {
          "range": {
            "date": {
              "gte": "2021-06-02",
              "lte": "2021-06-03"
            }
          }
        }
      ]
    }
  },
  "aggs": {
    "group_by": {
      "multi_terms": {
        "terms": [
          {
            "field": "site_id"
          },
          {
            "field": "maid.keyword"
          }
        ]
      },
      "aggs": {
        "type_count": {
          "cardinality": {
            "field": "site_id"
          }
        }
      }
    }
  }
}

The search result will be:

"aggregations": {
    "group_by": {
      "doc_count_error_upper_bound": 0,
      "sum_other_doc_count": 0,
      "buckets": [
        {
          "key": [
            1300,
            "m1"
          ],
          "key_as_string": "1300|m1",
          "doc_count": 3,
          "type_count": {
            "value": 1           // note this
          }
        },
        {
          "key": [
            1300,
            "m2"
          ],
          "key_as_string": "1300|m2",
          "doc_count": 1,
          "type_count": {
            "value": 1            // note this
          }
        }
      ]
    }
}
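The bucket values above can be reproduced with a short Python sketch over the question's sample rows: group by the (site_id, maid) pair, as multi_terms does, then count documents and distinct site_id values per bucket. Because site_id is one of the grouping keys, the cardinality of site_id inside each bucket is always 1, so every bucket corresponds to one unique (site_id, maid) combination. The ROWS data and the function name are illustrative assumptions, and buckets are ordered by doc_count descending to match the result shown.

```python
from collections import defaultdict

# Sample rows from the question: (maid, site_id, date, hour)
ROWS = [
    ("m1", 1300, "2021-06-03", 1),
    ("m1", 1300, "2021-06-03", 2),
    ("m1", 1300, "2021-06-03", 1),
    ("m2", 1300, "2021-06-03", 1),
]


def simulate_multi_terms(rows):
    """Mimic multi_terms(site_id, maid) with a cardinality(site_id) sub-agg."""
    groups = defaultdict(list)
    for maid, site_id, _date, _hour in rows:
        groups[(site_id, maid)].append(site_id)
    buckets = [
        {
            "key": [site_id, maid],
            "key_as_string": f"{site_id}|{maid}",
            "doc_count": len(site_ids),
            "type_count": {"value": len(set(site_ids))},
        }
        for (site_id, maid), site_ids in groups.items()
    ]
    # Order like the result above: highest doc_count first, then by key
    buckets.sort(key=lambda b: (-b["doc_count"], b["key_as_string"]))
    return buckets


if __name__ == "__main__":
    for bucket in simulate_multi_terms(ROWS):
        print(bucket)
```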
