I have an index that contains documents structured as follows:
{
  "year": 2020,
  "month": 10,
  "day": 5,
  "some_other_data": { ... }
}
The ID of each document is constructed from the date plus some additional data from the some_other_data object, like this: _id: "20201005_some_other_unique_data". There is no explicit _timestamp on the documents.
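Because the ID starts with a zero-padded yyyyMMdd prefix, its lexicographic order matches chronological order, and the same ordering can be expressed as a single integer, year * 10000 + month * 100 + day. A minimal TypeScript sketch, assuming every ID begins with exactly eight digits followed by an underscore; dateKey, parseIdPrefix and Ymd are hypothetical names used only for illustration:

```typescript
// Hypothetical helpers illustrating the ID layout described above.
// Assumption: every _id starts with an 8-digit yyyyMMdd prefix followed by "_".

interface Ymd {
  year: number;
  month: number;
  day: number;
}

// Sortable integer key, e.g. 2020-10-05 -> 20201005.
function dateKey({ year, month, day }: Ymd): number {
  return year * 10000 + month * 100 + day;
}

// Extract year/month/day from an _id such as "20201005_some_other_unique_data".
function parseIdPrefix(id: string): Ymd {
  const match = /^(\d{4})(\d{2})(\d{2})_/.exec(id);
  if (match === null) {
    throw new Error(`ID does not start with a yyyyMMdd_ prefix: ${id}`);
  }
  return {
    year: Number(match[1]),
    month: Number(match[2]),
    day: Number(match[3]),
  };
}

const parsed = parseIdPrefix("20201005_some_other_unique_data");
console.log(parsed, dateKey(parsed)); // { year: 2020, month: 10, day: 5 } 20201005
```

Because month and day are always below 100, comparing two keys numerically gives the same answer as comparing the dates, which is also why sorting on the zero-padded ID prefix yields chronological order.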
I can easily get the most recent additions with the following query:
{
  "query": {
    "match_all": {}
  },
  "sort": [
    {"_uid": "desc"}
  ]
}
Now, the question is: how do I get documents whose date falls between day A and day B, where A is, for instance, 2020-07-12 and B is 2020-09-11? You can assume that the input dates can be integers, strings, or anything else, as I can manipulate them beforehand.
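One way to deal with the flexible input is to normalize every bound to year/month/day on the client before building any query. A minimal TypeScript sketch, assuming bounds arrive as "yyyy-MM-dd" strings, as JavaScript Date objects read in UTC, or as [year, month, day] tuples; toYmd and DateInput are hypothetical names, and no range checking of month or day is done:

```typescript
// Hypothetical normalizer for the date bounds.
// Assumption: inputs are "yyyy-MM-dd" strings, Date objects (read in UTC),
// or [year, month, day] tuples with a 1-based month.

interface Ymd {
  year: number;
  month: number;
  day: number;
}

type DateInput = string | Date | [number, number, number];

function toYmd(input: DateInput): Ymd {
  if (Array.isArray(input)) {
    const [year, month, day] = input;
    return { year, month, day };
  }
  if (input instanceof Date) {
    return {
      year: input.getUTCFullYear(),
      month: input.getUTCMonth() + 1, // getUTCMonth() is zero-based
      day: input.getUTCDate(),
    };
  }
  const match = /^(\d{4})-(\d{2})-(\d{2})$/.exec(input);
  if (match === null) {
    throw new Error(`Expected yyyy-MM-dd, got: ${input}`);
  }
  return { year: Number(match[1]), month: Number(match[2]), day: Number(match[3]) };
}

console.log(toYmd("2020-07-12")); // { year: 2020, month: 7, day: 12 }
console.log(toYmd(new Date(Date.UTC(2020, 8, 11)))); // { year: 2020, month: 9, day: 11 }
```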
Edit: as requested, here is a sample result for the following query:
{
  "size": 4,
  "query": {
    "match": {
      "month": 7
    }
  },
  "sort": [
    {"_uid": "asc"}
  ]
}
The response:
{
  "took": 3,
  "timed_out": false,
  "_shards": {
    "total": 5,
    "successful": 5,
    "skipped": 0,
    "failed": 0
  },
  "hits": {
    "total": 1609,
    "max_score": null,
    "hits": [
      {
        "_index": "my_index",
        "_type": "nested",
        "_id": "20200703_andromeda_cryptic",
        "_score": null,
        "_source": {
          "year": 2020,
          "month": 7,
          "day": 3,
          "yara": {
            "strain": "Andromeda"
          },
          "parent_yara": {
            "strain": "CrypticMut"
          }
        },
        "sort": [
          "nested#20200703_andromeda_cryptic"
        ]
      },
      {
        "_index": "my_index",
        "_type": "nested",
        "_id": "20200703_betabot_boaxxe",
        "_score": null,
        "_source": {
          "year": 2020,
          "month": 7,
          "day": 3,
          "yara": {
            "strain": "BetaBot"
          },
          "parent_yara": {
            "strain": "Boaxxe"
          }
        },
        "sort": [
          "nested#20200703_betabot_boaxxe"
        ]
      },
      {
        "_index": "my_index",
        "_type": "nested",
        "_id": "20200703_darkcomet_zorex",
        "_score": null,
        "_source": {
          "year": 2020,
          "month": 7,
          "day": 3,
          "yara": {
            "strain": "DarkComet"
          },
          "parent_yara": {
            "strain": "Zorex"
          }
        },
        "sort": [
          "nested#20200703_darkcomet_zorex"
        ]
      },
      {
        "_index": "my_index",
        "_type": "nested",
        "_id": "20200703_darktrack_fake_template",
        "_score": null,
        "_source": {
          "year": 2020,
          "month": 7,
          "day": 3,
          "yara": {
            "strain": "Darktrack"
          },
          "parent_yara": {
            "strain": "CrypticFakeTempl"
          }
        },
        "sort": [
          "nested#20200703_darktrack_fake_template"
        ]
      }
    ]
  }
}
The query above returns every document that matches the month, so basically anything added in July of any year. What I want to achieve, if at all possible, is to get all documents inserted after one date and before another.
Unfortunately, I cannot migrate the data so that it has a timestamp or other easily sortable fields. Essentially, I need logic that says: give me all documents inserted after July 1st and before August 2nd. The problem is that there are plenty of edge cases, such as the start and end dates falling in different years, different months, and so on.
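As an alternative to scripting, the year/month edge cases can be handled by comparing (year, month, day) lexicographically, which plain bool, term and range queries can express: a document is on or after A if its year is greater than A's, or the year is equal and the month is greater, or both are equal and the day is greater than or equal (and symmetrically for the upper bound B). A minimal TypeScript sketch that only builds the request body, assuming year, month and day are mapped as numeric fields; buildRangeQuery, boundClause and Ymd are hypothetical names, and the output has not been verified against a live cluster:

```typescript
// Hypothetical builder for an inclusive date-range query over separate
// year/month/day numeric fields, using no scripts.

interface Ymd {
  year: number;
  month: number;
  day: number;
}

type Query = Record<string, unknown>;

const key = (d: Ymd): number => d.year * 10000 + d.month * 100 + d.day;

// "from": (year, month, day) >= bound; "to": (year, month, day) <= bound.
function boundClause(bound: Ymd, side: "from" | "to"): Query {
  const strict = side === "from" ? "gt" : "lt";
  const inclusive = side === "from" ? "gte" : "lte";
  return {
    bool: {
      should: [
        // A different year decides on its own.
        { range: { year: { [strict]: bound.year } } },
        // Same year: a different month decides.
        {
          bool: {
            filter: [
              { term: { year: bound.year } },
              { range: { month: { [strict]: bound.month } } },
            ],
          },
        },
        // Same year and month: compare the day, inclusively.
        {
          bool: {
            filter: [
              { term: { year: bound.year } },
              { term: { month: bound.month } },
              { range: { day: { [inclusive]: bound.day } } },
            ],
          },
        },
      ],
      minimum_should_match: 1,
    },
  };
}

// Bounds may be passed in either order; both ends are inclusive.
function buildRangeQuery(a: Ymd, b: Ymd, size: number): Query {
  const [from, to]: [Ymd, Ymd] = key(a) <= key(b) ? [a, b] : [b, a];
  return {
    size,
    query: {
      bool: {
        filter: [boundClause(from, "from"), boundClause(to, "to")],
      },
    },
  };
}

console.log(JSON.stringify(buildRangeQuery(
  { year: 2020, month: 7, day: 12 },
  { year: 2020, month: 9, day: 11 },
  100,
), null, 2));
```

The same sort clause used in the queries above can be added to the returned object if ordering matters. Non-script clauses like these are generally cheaper for Elasticsearch to evaluate than a script filter that runs per document, though that is a general guideline rather than something measured on this index.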
Edit: I have solved it using Painless scripting, as suggested by Briomkez, with small changes to the script itself:
function getQueryForRange(dateFrom: string, dateTo: string, querySize: number) {
  // Painless script: parse both "yyyy-MM-dd" bounds, build a date from the
  // document's year/month/day fields, and keep it if it lies in the range.
  const script = `
    DateTimeFormatter formatter = new DateTimeFormatterBuilder().appendPattern("yyyy-MM-dd")
      .parseDefaulting(ChronoField.NANO_OF_DAY, 0)
      .toFormatter()
      .withZone(ZoneId.of("Z"));
    ZonedDateTime l = ZonedDateTime.parse(params.l, formatter);
    ZonedDateTime h = ZonedDateTime.parse(params.h, formatter);
    ZonedDateTime x = ZonedDateTime.of(doc['year'].value.intValue(), doc['month'].value.intValue(), doc['day'].value.intValue(), 0, 0, 0, 0, ZoneId.of('Z'));
    // Order the bounds so they can be passed in either order.
    ZonedDateTime first = l.isAfter(h) ? h : l;
    ZonedDateTime last = first.equals(l) ? h : l;
    // Both ends of the range are inclusive.
    return (x.isAfter(first) || x.equals(first)) && (x.equals(last) || x.isBefore(last));
  `;

  return {
    size: querySize,
    query: {
      bool: {
        filter: {
          script: {
            script: {
              source: script,
              lang: "painless",
              params: {
                l: dateFrom,
                h: dateTo,
              },
            },
          },
        },
      },
    },
    sort: [{ _uid: "asc" }],
  };
}
With these changes, the query works well on my version of Elasticsearch (7.2), and the order of the dates is not important.
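A possible simplification of the script, sketched under the assumption that year, month and day are numeric doc values: since year * 10000 + month * 100 + day orders dates the same way the dates themselves do, the bounds can be converted to integers and ordered on the client, so Painless only compares two longs instead of building ZonedDateTime objects. getQueryForRangeByKey and toKey are hypothetical names; this variant keeps the original's "yyyy-MM-dd" inputs but has not been tested against a live cluster or benchmarked against the original:

```typescript
// Hypothetical variant of getQueryForRange that compares integer keys
// (yyyyMMdd as a number) instead of ZonedDateTime objects.
// Assumption: dateFrom/dateTo are "yyyy-MM-dd" strings, as in the original.

function toKey(date: string): number {
  const match = /^(\d{4})-(\d{2})-(\d{2})$/.exec(date);
  if (match === null) {
    throw new Error(`Expected yyyy-MM-dd, got: ${date}`);
  }
  return Number(match[1]) * 10000 + Number(match[2]) * 100 + Number(match[3]);
}

function getQueryForRangeByKey(dateFrom: string, dateTo: string, querySize: number) {
  // Order the bounds on the client so they may be passed in either order.
  const lo = Math.min(toKey(dateFrom), toKey(dateTo));
  const hi = Math.max(toKey(dateFrom), toKey(dateTo));

  // Painless: build the same yyyyMMdd key from the document and compare inclusively.
  const script = `
    long key = doc['year'].value * 10000L + doc['month'].value * 100L + doc['day'].value;
    return key >= params.lo && key <= params.hi;
  `;

  return {
    size: querySize,
    query: {
      bool: {
        filter: {
          script: {
            script: { source: script, lang: "painless", params: { lo, hi } },
          },
        },
      },
    },
  };
}

console.log(getQueryForRangeByKey("2020-09-11", "2020-07-12", 100).query.bool.filter.script.script.params);
// { lo: 20200712, hi: 20200911 }
```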