I'm currently looping through 29,000 documents; for each one I add a sub-document to a nested field and update it. To keep the amount of data manageable, I'm breaking the loop into batches of 10,000 and using the ES size and from options to control where each iteration starts. Once the first 10,000 are updated, I run another query to fetch the next 10,000, and so on. The problem is that every time I get to the second batch, a handful of docs in it were already processed in the first 10,000, and by the third batch every document returned has already been processed, when it should be fetching docs in the 20,000 to 29,000 range. A rough sketch of what I'm doing is below.
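This is a minimal sketch of the pattern, assuming the Python elasticsearch client; the index name, query, nested field name ("events"), and sub-document contents are placeholders, not my real code:

```python
from elasticsearch import Elasticsearch

es = Elasticsearch("http://localhost:9200")  # placeholder connection details

INDEX = "my-index"   # placeholder index name
BATCH_SIZE = 10_000
TOTAL = 29_000

for start in range(0, TOTAL, BATCH_SIZE):
    # Fetch the next page with size/from pagination
    resp = es.search(
        index=INDEX,
        query={"match_all": {}},  # real query omitted
        size=BATCH_SIZE,
        from_=start,
    )
    for hit in resp["hits"]["hits"]:
        # Append a sub-document to the nested field, then update the doc
        es.update(
            index=INDEX,
            id=hit["_id"],
            script={
                "source": "ctx._source.events.add(params.subdoc)",
                "params": {"subdoc": {"type": "touched"}},
            },
        )
```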
It seems like I'm hitting some sort of race condition, since sorting or querying by version number changes nothing. I've also tried flushing and refreshing the index between queries and still no luck.
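For reference, this is roughly what I tried between batches (same hypothetical client and index name as in the sketch above):

```python
from elasticsearch import Elasticsearch

es = Elasticsearch("http://localhost:9200")  # same placeholder cluster as above

# Force a refresh and a flush on the index before issuing the next
# size/from query -- it makes no difference to the results I get back.
es.indices.refresh(index="my-index")
es.indices.flush(index="my-index")
```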
Has anyone had a similar issue?