I am a little confused by the results. I have a simple query to get the most recently added document (sorted by created date or timestamp):

query = {
    "query": {"match_all": {}},
    "sort": [
        {"created_date": "desc"}
    ],
    "size": 1
}

When I use the helpers.scan() abstraction over the Scroll API, I get a different hit each time (the results are inconsistent). My Elasticsearch cluster is static (no new documents are being added), so the inconsistency is strange: I have sorted all entries and asked for only the first hit (size 1) in my query. What am I missing here?

1 Answer

For future reference, for anyone who stumbles upon this: the documentation on the Elasticsearch website may not make this clear, but the Python client is well documented. From the documentation for helpers.scan():

By default scan does not return results in any pre-determined order. To have a standard order in the returned documents (either by score or explicit sort definition) when scrolling, use preserve_order=True. This may be an expensive operation and will negate the performance benefits of using scan

So, for use cases like this one, where only the single top-sorted hit is needed, it is better to use search() than scan().
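As a rough sketch of that suggestion, the query from the question can be passed to search() instead of scan(). The helper name latest_document and the index name "my-index" are made up for illustration, and the body= keyword is an assumption that matches older releases of the Python client; newer releases prefer passing query=, sort= and size= as separate keyword arguments.

```python
def latest_document(client, index, sort_field="created_date"):
    """Return the newest hit in `index`, or None if nothing matches.

    `client` is assumed to be an elasticsearch.Elasticsearch instance
    (or any object exposing a compatible search() method).
    """
    body = {
        "query": {"match_all": {}},
        "sort": [{sort_field: "desc"}],  # newest first
        "size": 1,                       # only the top hit is needed
    }
    # search() honours the sort order, unlike scan() without preserve_order=True.
    response = client.search(index=index, body=body)
    hits = response["hits"]["hits"]
    return hits[0] if hits else None


# Hypothetical usage, assuming a reachable cluster and an index named "my-index":
#   from elasticsearch import Elasticsearch
#   es = Elasticsearch("http://localhost:9200")
#   newest = latest_document(es, "my-index")
```

Because search() returns a single sorted page, repeated calls against a static index should return the same hit, provided created_date values are not tied.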
