I'm using the Elasticsearch API, and my documents have the following schema:

{
  name: "",
  born_year: "",
  born_month: "",
  born_day: "",
  book_type: "",
  price: <some number>,
  country: ""
}

What I need now is the document count per name for people born before 1995 (born_year + born_month + born_day < "20051220"). How can I achieve this?

I tried this:

{
  "query": {
    "query_string": {
      "query": "country:\"SL\""
    }
  },
  "size": 0,
  "aggs": {
    "total": {
      "terms": {
        "field": "name"
      }
    }
  }
}

But I have no idea how to add a filter for the birthday.

  • Your schema is not ideal, you're missing a real date field. You'll need a script query to reconstruct the date in order to compare it and depending on your document base, it might make the whole thing slow. Can you change the schema? Commented Dec 28, 2016 at 7:01
  • @Val, It is not possible. Commented Dec 28, 2016 at 8:31
  • Which version of ES are you using? How are you sending your documents into ES? Commented Dec 28, 2016 at 8:37
  • @Val, the version is 5.0. The documents are already there and I have no authority to add or modify the documents in ES. I have to work with the available documents. Commented Dec 28, 2016 at 8:51
  • It's like asking someone to swim with the hands tied in the back ;-) Are you allowed to reindex the documents into another index? Commented Dec 28, 2016 at 9:22

1 Answer

As mentioned by @Val, what you really need is a real date field, which you could easily add by concatenating these three fields at indexing time. As for filtering on a date range, there are two ways to do it, and they return different result sets; which level to filter at is your choice.

You mentioned querying on the country field, but not at which level you want to apply the date range filter, so I will give queries for both cases.

Document schema, assuming you create a date field:

{
    name:"",
    born_year:"",
    born_month:"",
    born_day:"",
    book_type:"",
    price:<some number>,
    country:"",
    date : ""
  }

Case 1) The date range filter is applied to the name aggregation only. Here the document (hit) count is not affected by the date range filter.

{
    "query": {
        "query_string": {
            "query": "country:\"SL\""
        }
    },
    "aggs": {
        "total": {
            "filter": {
                "range": {
                    "date": {
                        "gte": "your_date_min",
                        "lte": "your_date_max"
                    }
                }
            },
            "aggs": {
                "NAME": {
                    "terms": {
                        "field": "name",
                        "size": 10
                    }
                }
            }
        }
    }
}

Case 2) Here both the document count and the aggregation are filtered by date range, because the date range filter is added at the query level.

{
    "query": {
        "bool": {
            "must": [
                {
                    "query_string": {
                        "query": "country:\"SL\""
                    }
                },
                {
                    "range": {
                        "date": {
                            "gte": "your_date_min",
                            "lte": "your_date_max"
                        }
                    }
                }
            ]
        }
    },
    "aggs": {
        "total": {
            "terms": {
                "field": "name",
                "size": 10
            }
        }
    }
}
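
The difference between the two cases can be illustrated outside Elasticsearch. The following is a minimal Python sketch, not Elasticsearch code: the sample documents, the ISO-formatted `date` values and the function names `case_1` and `case_2` are all made up for illustration. It only mimics how the hit count and the per-name buckets behave in each case.

```python
from collections import Counter

# Hypothetical sample documents; "date" stands in for the assumed real date field.
DOCS = [
    {"name": "ann", "country": "SL", "date": "1990-03-01"},
    {"name": "ann", "country": "SL", "date": "1999-07-15"},
    {"name": "bob", "country": "SL", "date": "1985-11-30"},
    {"name": "bob", "country": "US", "date": "1980-01-01"},
]


def in_range(doc, date_min, date_max):
    # ISO yyyy-MM-dd strings sort chronologically, so plain comparison works here.
    return date_min <= doc["date"] <= date_max


def case_1(docs, date_min, date_max):
    """Date filter inside the aggregation: hits ignore it, buckets honour it."""
    hits = [d for d in docs if d["country"] == "SL"]
    buckets = Counter(d["name"] for d in hits if in_range(d, date_min, date_max))
    return len(hits), dict(buckets)


def case_2(docs, date_min, date_max):
    """Date filter at query level: both hits and buckets honour it."""
    hits = [d for d in docs
            if d["country"] == "SL" and in_range(d, date_min, date_max)]
    buckets = Counter(d["name"] for d in hits)
    return len(hits), dict(buckets)


print(case_1(DOCS, "1900-01-01", "1994-12-31"))  # (3, {'ann': 1, 'bob': 1})
print(case_2(DOCS, "1900-01-01", "1994-12-31"))  # (2, {'ann': 1, 'bob': 1})
```

In this sample the buckets are identical in both cases; only the total hit count differs, which is the distinction that matters when you also read the hits.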

Adding the filter inside the aggregation therefore affects only the aggregation counts.

Edit - Approach 1) With a Groovy script, concatenate the strings, parse the result to an integer, and then compare it with your input date.

{
    "query": {
        "bool": {
            "must": [
                { "match_all": {} }
            ],
            "filter": {
                "script": {
                    "script": {
                        "inline": "(doc['born_year'].value + doc['born_month'].value + doc['born_day'].value).toInteger() > param1",
                        "params": {
                            "param1": 19911122
                        }
                    }
                }
            }
        }
    }
}

Make sure that single-digit days (or months) are indexed zero-padded, e.g. 6 as 06; otherwise the concatenated number has fewer digits and will not compare correctly.
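
To see why the padding matters, here is a small Python sketch of the same concatenate-and-parse idea. The function names `birth_key` and `padded_birth_key` are hypothetical and the values are examples only; this mirrors the arithmetic, not the Elasticsearch script itself.

```python
def birth_key(year, month, day):
    """Concatenate the three string fields exactly as stored and parse as an integer."""
    return int(year + month + day)


def padded_birth_key(year, month, day):
    """Same idea, but zero-pad month and day to two digits first."""
    return int(year + month.zfill(2) + day.zfill(2))


threshold = 19910701  # 1 July 1991

# 5 August 1991 is after the threshold, but the unpadded key says otherwise.
print(birth_key("1991", "8", "5"))                     # 199185
print(birth_key("1991", "8", "5") > threshold)         # False (wrong)
print(padded_birth_key("1991", "8", "5"))              # 19910805
print(padded_birth_key("1991", "8", "5") > threshold)  # True (correct)
```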

Approach 2) Parse the string into an actual date (preferred)

{
    "query": {
        "bool": {
            "must": [
                { "match_all": {} }
            ],
            "filter": {
                "script": {
                    "script": {
                        "inline": "Date.parse('dd-MM-yyyy', doc['born_day'].value + '-' + doc['born_month'].value + '-' + doc['born_year'].value) > Date.parse('dd-MM-yyyy', param1)",
                        "params": {
                            "param1": "04-05-1991"
                        }
                    }
                }
            }
        }
    }
}

The second approach is much better, because you don't have to worry about keeping each string field (year, month, day) in a form that later parses to a proper integer for comparison.
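
The date-parsing idea can be sketched in Python as follows. This is an illustration under assumptions rather than the Elasticsearch script: `born_after` is a hypothetical helper, and `%d-%m-%Y` is Python's equivalent of the day-month-year pattern. It also shows why the comparison should be made on parsed dates and not on day-first strings.

```python
from datetime import datetime

FMT = "%d-%m-%Y"  # day-month-year, matching the 'dd-MM-yyyy' layout


def born_after(born_day, born_month, born_year, threshold):
    """Parse the three string fields into a date and compare it with the threshold date."""
    born = datetime.strptime(f"{born_day}-{born_month}-{born_year}", FMT)
    return born > datetime.strptime(threshold, FMT)


# 3 June 1995 is after 4 May 1991; unpadded day and month parse fine.
print(born_after("3", "6", "1995", "04-05-1991"))  # True

# Comparing day-first strings gives the wrong answer for the same dates,
# because "03" sorts before "04" regardless of the year.
print("03-06-1995" > "04-05-1991")  # False
```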

4 Comments

Adding a date field is not possible. Thanks.
OK, then you can concatenate born_year, born_month and born_day in a Groovy/Scala script filter, parse the result to an integer, and then compare it with your date range.
How can I do that? Could you please modify the answer?
I have edited my answer. You have to enable inline Groovy scripts for Elasticsearch; by default, inline scripting is turned off for security reasons. Alternatively, you can add script files to the server and execute them in the query. elastic.co/guide/en/elasticsearch/reference/2.3/…