In my Elasticsearch (7.13) index, I have the following dataset:

maid      site_id    date         hour
m1        1300       2021-06-03   1
m1        1300       2021-06-03   2
m1        1300       2021-06-03   1
m2        1300       2021-06-03   1

I am trying to get a unique count of records for each date and site_id from the table above. The desired result is:

maid      site_id   date        count        
m1        1300      2021-06-03  1
m2        1300      2021-06-03  1

I have millions of maid values for each site_id, and the dates span two years. I am using the following query with a cardinality aggregation on maid, assuming that it will return the unique maid values.

GET /r_2332/_search
{
  "size": 0,
  "aggs": {
    "site_id": {
      "terms": {
        "field": "site_id",
        "size": 100,
        "include": [1171, 1048]
      },
      "aggs": {
        "bydate": {
          "range": {
            "field": "date",
            "ranges": [
              {
                "from": "2021-04-08",
                "to": "2021-04-22"
              }
            ]
          },
          "aggs": {
            "rdate": {
              "terms": {
                "field": "date"
              },
              "aggs": {
                "maids": {
                  "cardinality": {
                    "field": "maid"
                  }
                }
              }
            }
          }
        }
      }
    }
  }
}

This still returns data with all the duplicate values. How do I include the maid field in my query so that the results are filtered on unique maid values?

1 Answer

You can use a multi terms aggregation along with a cardinality aggregation if you want to get unique documents based on site_id and maid:

{
  "size": 0,
  "query": {
    "bool": {
      "filter": [
        {
          "terms": {
            "site_id": [
              "1300",
              "1301"
            ]
          }
        },
        {
          "range": {
            "date": {
              "gte": "2021-06-02",
              "lte": "2021-06-03"
            }
          }
        }
      ]
    }
  },
  "aggs": {
    "group_by": {
      "multi_terms": {
        "terms": [
          {
            "field": "site_id"
          },
          {
            "field": "maid.keyword"
          }
        ]
      },
      "aggs": {
        "type_count": {
          "cardinality": {
            "field": "site_id"
          }
        }
      }
    }
  }
}
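
To make the bucket semantics concrete, here is a minimal in-memory sketch in Python. It is not Elasticsearch code; it only mimics, on the four sample rows from the question, what grouping by (site_id, maid) produces, and what a distinct count of maid per (site_id, date) would be. The variable names are illustrative assumptions. Note that the real cardinality aggregation is approximate, whereas this sketch counts exactly.

```python
from collections import Counter, defaultdict

# The four sample rows from the question: (maid, site_id, date, hour).
rows = [
    ("m1", 1300, "2021-06-03", 1),
    ("m1", 1300, "2021-06-03", 2),
    ("m1", 1300, "2021-06-03", 1),
    ("m2", 1300, "2021-06-03", 1),
]

# One bucket per (site_id, maid) pair, like the multi terms grouping;
# the counter value plays the role of doc_count.
doc_counts = Counter((site_id, maid) for maid, site_id, _date, _hour in rows)

# Distinct maids per (site_id, date): the exact figure that a cardinality
# aggregation on maid would approximate for such a bucket.
maids_per_site_date = defaultdict(set)
for maid, site_id, date, _hour in rows:
    maids_per_site_date[(site_id, date)].add(maid)
unique_maids = {key: len(maids) for key, maids in maids_per_site_date.items()}

print(dict(doc_counts))  # {(1300, 'm1'): 3, (1300, 'm2'): 1}
print(unique_maids)      # {(1300, '2021-06-03'): 2}
```

Each distinct (site_id, maid) pair appears once as a key, however many duplicate rows it has; the duplicates only show up in the count.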

The search result will be:

"aggregations": {
    "group_by": {
      "doc_count_error_upper_bound": 0,
      "sum_other_doc_count": 0,
      "buckets": [
        {
          "key": [
            1300,
            "m1"
          ],
          "key_as_string": "1300|m1",
          "doc_count": 3,
          "type_count": {
            "value": 1           // note this
          }
        },
        {
          "key": [
            1300,
            "m2"
          ],
          "key_as_string": "1300|m2",
          "doc_count": 1,
          "type_count": {
            "value": 1            // note this
          }
        }
      ]
    }
}
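
If you need the result as flat rows, one per unique (site_id, maid) pair, the buckets can be unpacked on the client side. The following is a rough sketch in Python, assuming the response has already been parsed into a dict; `unique_pairs` is a hypothetical helper name, and the dict is a trimmed copy of the response shown above.

```python
# Trimmed copy of the response shown above (the "// note this" comments are
# dropped because they are not valid JSON).
response = {
    "aggregations": {
        "group_by": {
            "buckets": [
                {"key": [1300, "m1"], "doc_count": 3, "type_count": {"value": 1}},
                {"key": [1300, "m2"], "doc_count": 1, "type_count": {"value": 1}},
            ]
        }
    }
}


def unique_pairs(resp, agg_name="group_by"):
    """Hypothetical helper: return one (site_id, maid, doc_count) tuple per bucket."""
    buckets = resp["aggregations"][agg_name]["buckets"]
    return [(b["key"][0], b["key"][1], b["doc_count"]) for b in buckets]


for site_id, maid, doc_count in unique_pairs(response):
    print(site_id, maid, doc_count)
```

Each bucket key is a list in the same order as the fields in the multi terms request, so index 0 is site_id and index 1 is maid.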