I have a collection of documents stored in Elasticsearch. They look like this:

{
  "id": "12312312",
  "timestamp": "2015-11-01T00:00:00.000",
  "unit": {
    "id": "123456",
    "name": "unit-4"
  },
  "samples": [
    {
      "value": 244.05435180062133,
      "aggregation": "M",
      "type": {
        "name": "SomeName1",
        "display": "Some name 1"
      }
    },
    {
      "value": 251.19450064653438,
      "aggregation": "I",
      "type": {
        "name": "SomeName2",
        "display": "Some name 2"
      }
    },
    ...
  ]
}

I would like to run an aggregation query that returns the number of distinct unit.id values per bucket of samples.value. The query should be restricted by samples.type.name and samples.aggregation. So far I have produced this:

{
  "query": {
    "bool": {
      "must": [{
        "range": {
          "timestamp": {
            "gte": "2015-11-01T00:00:00.000",
            "lte": "2015-11-30T23:59:59.999",
            "format": "date_hour_minute_second_fraction"
          }
        }
      }, {
        "nested": {
          "path": "samples",
          "query": {
            "bool": {
              "must": [{
                "match": {
                  "samples.type.name": "SomeName1"
                }
              }]
            }
          }
        }
      }]
    }
  },
  "aggs": {
    "0": {
      "nested": {
        "path": "samples"
      },
      "aggs": {
        "1": {
          "histogram": {
            "field": "samples.value",
            "interval": 10
          }
        }
      }
    }
  }
}

I'm sending it to http://localhost:9200/dc/sample/_search?search_type=count&pretty. However, this returns the number of nested documents in the samples array per bucket, whereas I need the number of distinct unit.id values per bucket.

Can you guys help me please?

Edit: added mapping

{
  "dc" : {
    "mappings" : {
      "sample" : {
        "properties" : {
          "unit" : {
            "properties" : {
              "name" : {
                "type" : "string"
              }
            }
          },
          "samples" : {
            "type" : "nested",
            "properties" : {
              "aggregation" : {
                "type" : "string"
              },
              "type" : {
                "properties" : {
                  "display" : {
                    "type" : "string"
                  },
                  "name" : {
                    "type" : "string"
                  }
                }
              },
              "value" : {
                "type" : "double"
              }
            }
          },
          "timestamp" : {
            "type" : "date",
            "format" : "strict_date_optional_time||epoch_millis"
          }
        }
      }
    }
  }
}

Edit: I'll try to rephrase it. I want to get the count of units per bucket defined by "histogram_samples_value". That means the sum of these counts should be the total number of units. To test it, I wrote a query that filters to only one unit (many documents with different sample values): all but one of the "histogram_samples_value" buckets should contain count = 0, and one bucket should contain count = 1.
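To make the requested semantics concrete, here is a minimal local sketch in plain Python (no Elasticsearch involved) of "distinct units per value bucket". The function name `units_per_bucket` and the extra sample documents are hypothetical and only for illustration; the bucket key is computed as floor(value / interval) * interval, which is how a histogram with interval 10 and no offset groups values.

```python
import math
from collections import defaultdict

def units_per_bucket(docs, type_name, aggregation, interval=10):
    """Count distinct unit ids per histogram bucket of samples.value."""
    buckets = defaultdict(set)
    for doc in docs:
        for sample in doc["samples"]:
            # Keep only samples that match both filter criteria.
            if sample["type"]["name"] != type_name:
                continue
            if sample["aggregation"] != aggregation:
                continue
            key = math.floor(sample["value"] / interval) * interval
            # A set ensures each unit is counted once per bucket.
            buckets[key].add(doc["unit"]["id"])
    return {key: len(units) for key, units in sorted(buckets.items())}

def _sample(value, aggregation, name):
    return {"value": value, "aggregation": aggregation, "type": {"name": name}}

# Hypothetical documents: the first mirrors the example above, the rest are made up.
SAMPLE_DOCS = [
    {"unit": {"id": "123456"}, "samples": [
        _sample(244.05435180062133, "M", "SomeName1"),
        _sample(251.19450064653438, "I", "SomeName2"),
    ]},
    {"unit": {"id": "123456"}, "samples": [_sample(248.0, "M", "SomeName1")]},
    {"unit": {"id": "654321"}, "samples": [_sample(241.5, "M", "SomeName1")]},
    {"unit": {"id": "123456"}, "samples": [_sample(251.5, "M", "SomeName1")]},
]

print(units_per_bucket(SAMPLE_DOCS, "SomeName1", "M"))  # → {240: 2, 250: 1}
```

Note that in this made-up data unit 123456 has matching samples in two intervals, so it is counted once in each bucket. The per-bucket counts only add up to the total number of units when every unit's matching values stay within a single interval.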

  • Could you add your mappings? That'd make things easier. Commented Dec 16, 2015 at 15:16
  • Mapping added, if necessary I can probably even change document structure - I'm expecting to have up to 100 mil. documents in that index. Commented Dec 16, 2015 at 17:43
  • This seems better. Maybe I'm misunderstanding your question, but why do you use histogram aggregation? Your requirements do not seem to need it at all. Also, could you add minimum expected output? Commented Dec 17, 2015 at 9:08

1 Answer

I think you can get what you want with the reverse nested aggregation, like this:

POST /test_index/_search
{
   "size": 0,
   "aggs": {
      "nested_samples": {
         "nested": {
            "path": "samples"
         },
         "aggs": {
            "histogram_samples_value": {
               "histogram": {
                  "field": "samples.value",
                  "interval": 10
               },
               "aggs": {
                  "reverse_nested_doc": {
                     "reverse_nested": {},
                     "aggs": {
                        "terms_unit_id": {
                           "terms": {
                              "field": "unit.id"
                           }
                        }
                     }
                  }
               }
            }
         }
      }
   }
}

Here is some code I used to test it:

http://sense.qbox.io/gist/e93dbddbbc4a841af5d9ce687a543a2914457d31

2 Comments

Hey, thanks for the sample, but this query returns histogram buckets filled with sub-buckets based on the "unit.id" term, and they are the same for every "histogram" bucket. It's not precisely what I wanted. I'll try to rephrase it: I want to get the count of units per bucket defined by "histogram_samples_value". That means the sum of these counts should be the total number of units. To test it, I wrote a query that filters only one sample: all but one of the "histogram_samples_value" buckets should contain count = 0, and one bucket should contain count = 1.
I probably have to use filter aggregation with cardinality aggregation somehow.
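
One way the filter-plus-cardinality idea could be sketched is below. It is not a confirmed solution from this thread. It builds a request body in Python that nests a filter aggregation inside the nested aggregation, keeps the histogram on samples.value, and replaces the terms aggregation with a cardinality aggregation on unit.id under reverse_nested. The function name and the aggregation names "matching_samples" and "distinct_units" are hypothetical. It assumes Elasticsearch 2.x, where a filter aggregation accepts any query, and it assumes unit.id is indexed even though it does not appear in the mapping shown.

```python
import json

def build_units_per_bucket_query(type_name, aggregation, interval=10):
    """Build a search body that counts distinct unit.id per samples.value bucket."""
    return {
        "size": 0,
        "aggs": {
            "nested_samples": {
                "nested": {"path": "samples"},
                "aggs": {
                    # Hypothetical name: keeps only the nested samples of interest.
                    "matching_samples": {
                        "filter": {
                            "bool": {
                                "must": [
                                    {"match": {"samples.type.name": type_name}},
                                    {"match": {"samples.aggregation": aggregation}},
                                ]
                            }
                        },
                        "aggs": {
                            "histogram_samples_value": {
                                "histogram": {
                                    "field": "samples.value",
                                    "interval": interval,
                                },
                                "aggs": {
                                    # Step back to the parent documents, then
                                    # count distinct units instead of listing them.
                                    "reverse_nested_doc": {
                                        "reverse_nested": {},
                                        "aggs": {
                                            "distinct_units": {
                                                "cardinality": {"field": "unit.id"}
                                            }
                                        },
                                    }
                                },
                            }
                        },
                    }
                },
            }
        },
    }

# Print the body so it can be pasted into Sense or sent with any HTTP client.
print(json.dumps(build_units_per_bucket_query("SomeName1", "M"), indent=2))
```

The timestamp range from the question could be added as a top-level "query" next to "aggs". Keep in mind that the cardinality aggregation returns an approximate count, which may matter on an index of the size mentioned in the comments.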
