Elasticsearch aggregation query

Question

I have a collection of documents stored in Elasticsearch. They look like this:

{
  "id": "12312312",
  "timestamp": "2015-11-01T00:00:00.000",
  "unit": {
    "id": "123456",
    "name": "unit-4"
  },
  "samples": [
    {
      "value": 244.05435180062133,
      "aggregation": "M",
      "type": {
        "name": "SomeName1",
        "display": "Some name 1"
      }
    },
    {
      "value": 251.19450064653438,
      "aggregation": "I",
      "type": {
        "name": "SomeName2",
        "display": "Some name 2"
      }
    },
    ...
  ]
}

I would like to run an aggregation query that returns the number of distinct unit.id values per bucket of samples.value. The query should be filtered on samples.type.name and samples.aggregation. So far I've produced this:

{
  "query": {
    "bool": {
      "must": [{
        "range": {
          "timestamp": {
            "gte": "2015-11-01T00:00:00.000",
            "lte": "2015-11-30T23:59:59.999",
            "format": "date_hour_minute_second_fraction"
          }
        }
      }, {
        "nested": {
          "path": "samples",
          "query": {
            "bool": {
              "must": [{
                "match": {
                  "samples.type.name": "SomeName1"
                }
              }]
            }
          }
        }
      }]
    }
  },
  "aggs": {
    "0": {
      "nested": {
        "path": "samples"
      },
      "aggs": {
        "1": {
          "histogram": {
            "field": "samples.value",
            "interval": 10
          }
        }
      }
    }
  }
}

I'm sending it to http://localhost:9200/dc/sample/_search?search_type=count&pretty, but this returns counts of the nested documents in the samples array. What I need is the number of distinct unit.id values per bucket.

Can you guys help me please?

Edit: added mapping

{
  "dc" : {
    "mappings" : {
      "sample" : {
        "properties" : {
          "unit" : {
            "properties" : {
              "name" : {
                "type" : "string"
              }
            }
          },
          "samples" : {
            "type" : "nested",
            "properties" : {
              "aggregation" : {
                "type" : "string"
              },
              "type" : {
                "properties" : {
                  "display" : {
                    "type" : "string"
                  },
                  "name" : {
                    "type" : "string"
                  }
                }
              },
              "value" : {
                "type" : "double"
              }
            }
          },
          "timestamp" : {
            "type" : "date",
            "format" : "strict_date_optional_time||epoch_millis"
          }
        }
      }
    }
  }
}

Edit: I'll try to rephrase. I want the count of units per bucket defined by "histogram_samples_value". That means the sum of these counts should equal the total number of units. To test it, I wrote a query that filters on a single unit (many documents with different sample values): all but one of the "histogram_samples_value" buckets should contain count = 0, and one bucket should contain count = 1.
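
To make the requirement concrete, here is a minimal pure-Python sketch of the result being asked for, computed in memory rather than by Elasticsearch. The function name units_per_bucket and the three example documents are hypothetical and only follow the document shape shown above. Note that a unit whose matching samples fall into several buckets is counted once in each of them.

```python
import math
from collections import defaultdict


def units_per_bucket(docs, type_name, aggregation, interval=10):
    """Count distinct unit ids per histogram bucket of samples.value."""
    buckets = defaultdict(set)
    for doc in docs:
        for sample in doc["samples"]:
            # Keep only samples matching both filter criteria.
            if (sample["type"]["name"] == type_name
                    and sample["aggregation"] == aggregation):
                # Same bucketing rule as a histogram with this interval.
                key = math.floor(sample["value"] / interval) * interval
                buckets[key].add(doc["unit"]["id"])
    return {key: len(units) for key, units in sorted(buckets.items())}


# Hypothetical documents: two for unit 123456, one for unit 654321.
docs = [
    {"unit": {"id": "123456"}, "samples": [
        {"value": 244.05, "aggregation": "M", "type": {"name": "SomeName1"}},
        {"value": 251.19, "aggregation": "I", "type": {"name": "SomeName2"}},
    ]},
    {"unit": {"id": "123456"}, "samples": [
        {"value": 247.30, "aggregation": "M", "type": {"name": "SomeName1"}},
    ]},
    {"unit": {"id": "654321"}, "samples": [
        {"value": 251.20, "aggregation": "M", "type": {"name": "SomeName1"}},
    ]},
]

result = units_per_bucket(docs, "SomeName1", "M")
print(result)  # {240: 1, 250: 1}
```

The bucket starting at 240 holds two matching samples but only one unit, which is the difference between counting nested documents and counting distinct unit.id values.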

Mapping added. If necessary, I can probably even change the document structure; I'm expecting to have up to 100 million documents in that index.
– viliam, Commented Dec 16, 2015 at 17:43
This seems better. Maybe I'm misunderstanding your question, but why do you use a histogram aggregation? Your requirements do not seem to need it at all. Also, could you add a minimal example of the expected output?
– Evaldas Buinauskas, Commented Dec 17, 2015 at 9:08

Sloan Ahrens · Accepted Answer · 2015-12-16 17:50:12Z

I think you can get what you want with the reverse nested aggregation, like this:

POST /test_index/_search
{
   "size": 0,
   "aggs": {
      "nested_samples": {
         "nested": {
            "path": "samples"
         },
         "aggs": {
            "histogram_samples_value": {
               "histogram": {
                  "field": "samples.value",
                  "interval": 10
               },
               "aggs": {
                  "reverse_nested_doc": {
                     "reverse_nested": {},
                     "aggs": {
                        "terms_unit_id": {
                           "terms": {
                              "field": "unit.id"
                           }
                        }
                     }
                  }
               }
            }
         }
      }
   }
}

Here is some code I used to test it:

http://sense.qbox.io/gist/e93dbddbbc4a841af5d9ce687a543a2914457d31

2 Comments

viliam Over a year ago

Hey, thanks for the sample, but this query returns histogram buckets filled with buckets based on the "unit.id" term, and they are the same for every "histogram" bucket. It's not precisely what I wanted. I'll try to rephrase: I want the count of units per bucket defined by "histogram_samples_value". That means the sum of these counts should equal the total number of units. To test it, I wrote a query that filters only one sample: all but one of the "histogram_samples_value" buckets should contain count = 0, and one bucket should contain count = 1.

viliam Over a year ago

I probably have to use filter aggregation with cardinality aggregation somehow.
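
One way that idea might look, as an untested sketch rather than a confirmed solution: keep the nested and histogram aggregations from the answer, put a filter aggregation in front of the histogram so only the wanted samples are bucketed, and replace the terms aggregation with a cardinality aggregation on unit.id after reverse_nested. The function name and the aggregation names matching_samples and distinct_units are made up for illustration. The sketch also assumes unit.id is indexed as a single token, and the cardinality aggregation returns approximate counts.

```python
import json


def build_units_per_bucket_query(type_name, aggregation, interval=10):
    """Build a search body counting distinct unit.id per samples.value bucket."""
    # Applied to each nested sample, so only matching samples are bucketed.
    sample_filter = {"bool": {"must": [
        {"match": {"samples.type.name": type_name}},
        {"match": {"samples.aggregation": aggregation}},
    ]}}
    # Step back to the parent document and count distinct units there.
    per_bucket = {
        "reverse_nested_doc": {
            "reverse_nested": {},
            "aggs": {
                "distinct_units": {"cardinality": {"field": "unit.id"}},
            },
        },
    }
    histogram = {
        "histogram": {"field": "samples.value", "interval": interval},
        "aggs": per_bucket,
    }
    return {
        "size": 0,  # return aggregation results only, no hits
        "aggs": {
            "nested_samples": {
                "nested": {"path": "samples"},
                "aggs": {
                    "matching_samples": {
                        "filter": sample_filter,
                        "aggs": {"histogram_samples_value": histogram},
                    },
                },
            },
        },
    }


body = build_units_per_bucket_query("SomeName1", "M")
print(json.dumps(body, indent=2))
```

The printed JSON can be sent as the request body of a _search call. The timestamp range from the question could be added under a top-level "query" key. A unit with matching samples in several buckets is still counted once per bucket, so the per-bucket counts only sum to the total number of units when each unit lands in a single bucket.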
