29

How can I use a filter in connection with an aggregation in Elasticsearch?

The official documentation gives only trivial examples for filters and for aggregations, and no formal description of the query DSL (compare it, for example, with the PostgreSQL documentation).

By trial and error I arrived at the following query, which Elasticsearch accepts (no parsing errors) but which ignores the given filters:

{
  "filter": {
    "and": [
      {
        "term": {
          "_type": "logs"
        }
      },
      {
        "term": {
          "dc": "eu-west-12"
        }
      },
      {
        "term": {
          "status": "204"
        }
      },
      {
        "range": {
          "@timestamp": {
            "from": 1398169707,
            "to": 1400761707
          }
        }
      }
    ]
  },
  "size": 0,
  "aggs": {
    "time_histo": {
      "date_histogram": {
        "field": "@timestamp",
        "interval": "1h"
      },
      "aggs": {
        "name": {
          "percentiles": {
            "field": "upstream_response_time",
            "percents": [
              98.0
            ]
          }
        }
      }
    }
  }
}

Some people suggest using a query instead of a filter, but the official documentation generally recommends the opposite for filtering on exact values. Another issue with a query: filters offer an and, whereas a query does not appear to.

Can somebody point me to documentation, a blog or a book that describes writing non-trivial queries: at least an aggregation plus multiple filters?

4 Answers

37

I ended up using a filter aggregation rather than a filtered query, so I now have three nested aggs elements.

I also use a bool filter instead of and, as recommended by @alex-brasetvik; the reasons are explained in http://www.elasticsearch.org/blog/all-about-elasticsearch-filter-bitsets/

My final implementation:

{
  "aggs": {
    "filtered": {
      "filter": {
        "bool": {
          "must": [
            {
              "term": {
                "_type": "logs"
              }
            },
            {
              "term": {
                "dc": "eu-west-12"
              }
            },
            {
              "term": {
                "status": "204"
              }
            },
            {
              "range": {
                "@timestamp": {
                  "from": 1398176502000,
                  "to": 1400768502000
                }
              }
            }
          ]
        }
      },
      "aggs": {
        "time_histo": {
          "date_histogram": {
            "field": "@timestamp",
            "interval": "1h"
          },
          "aggs": {
            "name": {
              "percentiles": {
                "field": "upstream_response_time",
                "percents": [
                  98.0
                ]
              }
            }
          }
        }
      }
    }
  },
  "size": 0
}
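
Because the histogram is nested inside the filter aggregation, the buckets in the response are nested the same way, under the names chosen in the request. A minimal Python sketch of reading them back, assuming the response has already been parsed into a dict and that the percentiles aggregation returns its default keyed form (a values object keyed by percent, such as "98.0"). The helper name and the sample numbers are made up for illustration:

```python
def p98_per_bucket(response, filter_name="filtered",
                   histo_name="time_histo", pct_name="name"):
    """Return (bucket key, 98th percentile) pairs from a parsed search response."""
    # The filter aggregation wraps the histogram, so descend one level per "aggs".
    buckets = response["aggregations"][filter_name][histo_name]["buckets"]
    return [(b["key"], b[pct_name]["values"].get("98.0")) for b in buckets]


# Hypothetical response fragment; the values are invented, not real output.
sample = {
    "aggregations": {
        "filtered": {
            "doc_count": 3,
            "time_histo": {
                "buckets": [
                    {"key": 1398175200000, "doc_count": 2,
                     "name": {"values": {"98.0": 0.42}}},
                    {"key": 1398178800000, "doc_count": 1,
                     "name": {"values": {"98.0": 0.17}}},
                ]
            },
        }
    }
}

print(p98_per_bucket(sample))
```

If the outer aggregation is renamed, pass the new name as filter_name; the rest of the path stays the same.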

2 Comments

You're possibly my favourite person right now. I have been battling with this for hours.
In this solution the top-level aggregation is named "filtered". That is only a name and should not be confused with elastic.co/guide/en/elasticsearch/reference/current/…, so please use some other name (e.g. "aggresults"); the results appear under that name in the response. Please check the reference elastic.co/guide/en/elasticsearch/reference/master/… and the answer stackoverflow.com/a/24823895/565525.

8

Put your filter in a filtered-query.

The top-level filter is for filtering search hits only, and not facets/aggregations. It was renamed to post_filter in 1.0 due to this quite common confusion.

Also, you might want to look into this post on why you often want to use bool and not and/or: http://www.elasticsearch.org/blog/all-about-elasticsearch-filter-bitsets/
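
A minimal sketch of this suggestion in Python, building the request body as a plain dict. It assumes Elasticsearch 1.x for the filtered query; that query type was deprecated in 2.0 and removed in 5.0, where a bool query with a filter clause plays the same role. The function name and the legacy_filtered flag are illustrative and not part of any client library, and the clauses and aggregation are adapted from the question:

```python
import json


def build_search_body(filters, aggs, legacy_filtered=True):
    """Wrap filter clauses in the query so they restrict the aggregations too."""
    if legacy_filtered:
        # Elasticsearch 1.x: filtered query with a bool filter instead of and.
        query = {"filtered": {"filter": {"bool": {"must": filters}}}}
    else:
        # Elasticsearch 2.0 and later: bool query, clauses in filter context.
        query = {"bool": {"filter": filters}}
    return {"query": query, "size": 0, "aggs": aggs}


filters = [
    {"term": {"dc": "eu-west-12"}},
    {"term": {"status": "204"}},
    {"range": {"@timestamp": {"gte": 1398176502000, "lte": 1400768502000}}},
]
aggs = {
    "time_histo": {
        "date_histogram": {"field": "@timestamp", "interval": "1h"},
        "aggs": {
            "name": {
                "percentiles": {"field": "upstream_response_time", "percents": [98.0]}
            }
        },
    }
}

print(json.dumps(build_search_body(filters, aggs), indent=2))
```

Unlike a top-level filter (post_filter), clauses placed inside the query narrow the document set before the aggregations run, which is why the histogram only sees matching documents.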

4

Building on @geekQ's answer: to filter on strings that contain spaces, across multiple fields, use match_phrase clauses as below:

{
  "aggs": {
    "aggresults": {
      "filter": {
        "bool": {
          "must": [
            {
              "match_phrase": {
                "term_1": "some text with space 1"
              }
            },
            {
              "match_phrase": {
                "term_2": "some text with also space 2"
              }
            }
          ]
        }
      },
      "aggs": {
        "all_term_3s": {
          "terms": {
            "field": "term_3.keyword",
            "size": 10000,
            "order": {
              "_term": "asc"
            }
          }
        }
      }
    }
  },
  "size": 0
}

3

Just for reference, as of version 7.2, I used something like the following to apply multiple filters to an aggregation:

POST movies/_search?size=0
{
  "size": 0,
  "aggs": {
    "test": {
      "filter": {
        "bool": {
          "must": {
            "term": {
              "genre": "action"
            }
          },
          "filter": {
            "range": {
              "year": {
                "gte": 1800,
                "lte": 3000
              }
            }
          }
        }
      },
      "aggs": {
        "year_hist": {
          "histogram": {
            "field": "year",
            "interval": 50
          }
        }
      }
    }
  }
}

2 Comments

Is there a way to enter two terms in the filter? Inside must.term I want to add something like "term":{"genre":"action", "country":"USA"}.
@makewhite It seems that should clauses, with minimum_should_match configured appropriately, could help.
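
On the question of two terms raised in the comment above: a single term clause addresses one field, so the usual way to require both values is one term clause per field inside the bool filter. A minimal Python sketch that builds such a filter aggregation follows. The helper name is illustrative, the country field is taken from the comment rather than from a known mapping, and the should variant with minimum_should_match is only needed when some, rather than all, of the clauses must match:

```python
def term_filter_agg(required, sub_aggs, min_match=None):
    """Build a filter aggregation from a {field: value} dict of exact terms."""
    clauses = [{"term": {field: value}} for field, value in required.items()]
    if min_match is None:
        # All clauses must match (logical AND), evaluated in filter context.
        bool_body = {"filter": clauses}
    else:
        # At least min_match of the clauses must match.
        bool_body = {"should": clauses, "minimum_should_match": min_match}
    return {"filter": {"bool": bool_body}, "aggs": sub_aggs}


year_hist = {"year_hist": {"histogram": {"field": "year", "interval": 50}}}
body = {
    "size": 0,
    "aggs": {"test": term_filter_agg({"genre": "action", "country": "USA"}, year_hist)},
}
print(body["aggs"]["test"]["filter"])
```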
