I'm currently looping through 29,000 documents; for each one I add a sub-document to a nested field and update it. To keep the amount of data manageable, I'm breaking the loop into batches of 10,000 and using the ES size and from options to control where each iteration starts. Once the first 10,000 are updated, I run another query to fetch the next 10,000, and so on. The problem is that every time I get to the second batch, a handful of docs in it were already processed in the first 10,000, and by the third batch every document returned has already been processed, when it should be fetching docs in the 20,000 to 29,000 range. A rough sketch of what I'm doing is below.
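This is a minimal sketch of the pattern, assuming the Python elasticsearch client; the index name, query, nested field name ("events"), and sub-document contents are placeholders, not my real code:

```python
from elasticsearch import Elasticsearch

es = Elasticsearch("http://localhost:9200")  # placeholder connection details

INDEX = "my-index"   # placeholder index name
BATCH_SIZE = 10_000
TOTAL = 29_000

for start in range(0, TOTAL, BATCH_SIZE):
    # Fetch the next page with size/from pagination
    resp = es.search(
        index=INDEX,
        query={"match_all": {}},  # real query omitted
        size=BATCH_SIZE,
        from_=start,
    )
    for hit in resp["hits"]["hits"]:
        # Append a sub-document to the nested field, then update the doc
        es.update(
            index=INDEX,
            id=hit["_id"],
            script={
                "source": "ctx._source.events.add(params.subdoc)",
                "params": {"subdoc": {"type": "touched"}},
            },
        )
```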
It seems like I'm hitting some sort of race condition, since sorting or querying by version number changes nothing. I've also tried flushing and refreshing the index between queries and still no luck.
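For reference, this is roughly what I tried between batches (same hypothetical client and index name as in the sketch above):

```python
from elasticsearch import Elasticsearch

es = Elasticsearch("http://localhost:9200")  # same placeholder cluster as above

# Force a refresh and a flush on the index before issuing the next
# size/from query -- it makes no difference to the results I get back.
es.indices.refresh(index="my-index")
es.indices.flush(index="my-index")
```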
Has anyone had a similar issue?