
I am brand new to using Elasticsearch and I'm having an issue getting all results back when I run an Elasticsearch query through my Python script. My goal is to query an index ("my_index" below), take those results, and put them into a pandas DataFrame which goes through a Django app and eventually ends up in a Word document.

My code is:

from elasticsearch import Elasticsearch

es = Elasticsearch()
logs_index = "my_index"
# my_query is built elsewhere from the user's search input
logs = es.search(index=logs_index, body=my_query)

and it tells me I have 72 hits, but then when I do:

df = logs['hits']['hits']
len(df)

It says the length is only 10. I saw someone had a similar issue on this question but their solution did not work for me.

from elasticsearch import Elasticsearch
from elasticsearch_dsl import Search
es = Elasticsearch()
logs_index = "my_index"
search = Search(using=es)
total = search.count()
search = search[0:total]
logs = es.search(index=logs_index, body=my_query)
len(logs['hits']['hits'])

The len function still says I only have 10 results. What am I doing wrong, or what else can I do to get all 72 results back?

ETA: I am aware that I can just add "size": 10000 to my query body to stop it from truncating to 10 results, but since the user will be entering their own search query, I need a way to do this outside the query body itself.

2 Comments

  • Can you please clarify your last edit? I'm not sure what the search query has to do with the size parameter. Are you referring to the problem of not knowing how many results the query will return versus a static size you might define? Commented Dec 11, 2018 at 18:16
  • Since it's your first post, please read this so you know how to react to answers: stackoverflow.com/help/someone-answers Commented Dec 11, 2018 at 18:29

4 Answers


You need to pass a size parameter to your es.search() call.

Please read the API Docs

size – Number of hits to return (default: 10)

An example:

es.search(index=logs_index, body=my_query, size=1000)

Please note that this is not an optimal way to retrieve all documents in an index, or the results of a query that returns many documents. For that you should use a scroll operation, which the client exposes through the scan() helper, also documented in the API docs.

You can also read about it in the Elasticsearch documentation.
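For reference, a minimal sketch of the scan() helper, assuming the same es client, index name, and my_query dict from the question:

from elasticsearch import Elasticsearch, helpers

es = Elasticsearch()

# scan() wraps the scroll API and yields every matching hit,
# regardless of the 10-hit default page size.
all_hits = list(helpers.scan(es, index="my_index", query=my_query))
len(all_hits)  # all 72 hits, not just the first 10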


4 Comments

I thought the size could only be within my_query, thank you for clarifying! I know it's not the best practice, but I need it to just work for now and I can look into scroll later. Thank you!
If for some reason you need to implement a basic (and not advisable) client-side scroll, you can also use the from parameter, which defines the offset into the results and effectively lets you paginate.
@AlexandreJuma is it possible to add from? I am trying to add it and Python gives me a SyntaxError, probably because from is a reserved keyword in Python.
@thakurinbox, for compatibility with the Python ecosystem we use from_ instead of from and doc_type instead of type as parameter names
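A minimal sketch of that kind of client-side pagination with from_ and size, assuming the same es client and my_query from the question (not advisable for large result sets, and capped by index.max_result_window, 10,000 by default):

page_size = 10
offset = 0
hits = []
while True:
    # from_ is mapped to the "from" request parameter by the Python client
    page = es.search(index="my_index", body=my_query, from_=offset, size=page_size)
    batch = page['hits']['hits']
    if not batch:
        break  # no more results
    hits.extend(batch)
    offset += page_size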

It is also possible to use the elasticsearch_dsl (link) library:

from elasticsearch import Elasticsearch
from elasticsearch_dsl import Search
import pandas as pd

client = Elasticsearch()
s = Search(using=client, index="my_index")

df = pd.DataFrame([hit.to_dict() for hit in s.scan()])

The secret here is s.scan(), which handles pagination and queries the entire index.

Note that the example above will return the entire index since it was not passed any query. To create a query with elasticsearch_dsl check this link.
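As a hedged illustration, attaching a query before scanning might look like this (the message field and the search term are placeholders, not from the question):

from elasticsearch import Elasticsearch
from elasticsearch_dsl import Search
import pandas as pd

client = Elasticsearch()

# Hypothetical filter: only scan hits whose "message" field matches "error".
s = Search(using=client, index="my_index").query("match", message="error")
df = pd.DataFrame([hit.to_dict() for hit in s.scan()])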

Comments


Either set the size explicitly (if the number of documents is relatively small), or use the scan function to get a cursor-like iterator over a large number of documents.

Scan

Comments


This Python script shows how to paginate an Elasticsearch query with search_after by issuing repeated queries, and then exports the results to CSV.

from elasticsearch import Elasticsearch, RequestsHttpConnection, helpers
from requests_aws4auth import AWS4Auth
import pandas as pd

access_key = 'xxxxx'
secret_key = 'xxxxx'
region_name = 'xxxx'
AWSSEARCHURI = 'xxxxxx'
awsauth = AWS4Auth(access_key, secret_key, region_name, 'es')

es = Elasticsearch(
    hosts=[{'host': AWSSEARCHURI, 'port': 443}],
    http_auth=awsauth,
    use_ssl=True,
    verify_certs=True,
    connection_class=RequestsHttpConnection
)

datas = []
timestamp = None

# Pull up to 161 pages of 10,000 hits each, using search_after to resume
# from the last @timestamp of the previous page.
for page in range(161):
    print('timestamp:', timestamp)
    body = {
        "query": {
            "bool": {
                "filter": [
                    {
                        "terms": {
                            "organization_industries.keyword": [
                                "Crypto",
                                "crypto",
                                "Crypto Industry",
                                "crypto industry"
                            ]
                        }
                    }
                ]
            }
        },
        "sort": [
            {"@timestamp": "desc"}
        ],
        "_source": [
            "contact_id",
            "person_name",
            "person_email",
            "type",
            "@timestamp"
        ]
    }
    # After the first page, continue from where the previous page left off.
    if timestamp is not None:
        body["search_after"] = [timestamp]

    res = es.search(body=body, size=10000, request_timeout=110)
    data = res['hits']['hits']
    if not data:
        break  # no more results
    timestamp = data[-1]['_source']['@timestamp']
    datas.extend(data)

csv_data = [dta['_source'] for dta in datas]

cs = pd.DataFrame(csv_data)
cs.to_csv('data_extractor_crypto_industry_06_10_2021.csv', index=False)

1 Comment

elasticsearchPythonSearcher\main.py:112: DeprecationWarning: The 'body' parameter is deprecated for the 'search' API and will be removed in a future version. Instead use API parameters directly. See github.com/elastic/elasticsearch-py/issues/1698 for more information res = es.search(body=body, size=10000, request_timeout=110)
