ElasticSearch query to pandas dataframe

Question

I have a query:

s = Search(using=client, index='myindex', doc_type='mytype')
s.query = Q('bool', must=[Q('match', BusinessUnit=bunit),
                          Q('range', **dicdate)])

res = s.execute()

It returns 627033 lines. I want to convert this dictionary into a dataframe with 627033 lines.

Can you give more information about the output of the Elasticsearch query? If it is simply a dictionary, the question is really about converting a dictionary to a dataframe. There are many answers on this, for example stackoverflow.com/questions/34589332/…
– Nelson Dinh, Commented Sep 28, 2017 at 15:11
Actually, it is not the dictionary format that I am asking about. The query always returns only 10 elements, and I want all of them.
– Náthali, Commented Sep 28, 2017 at 16:55

Phil B · Accepted Answer · 2019-05-11 20:54:25Z

3

If your request is likely to return more than 10,000 documents from Elasticsearch, you will need to use the scrolling function of Elasticsearch. Documentation and examples for this function are rather difficult to find, so I will provide you with a full, working example:

import pandas as pd
from elasticsearch import Elasticsearch
import elasticsearch.helpers


es = Elasticsearch('127.0.0.1',
        http_auth=('my_username', 'my_password'),
        port=9200)

body={"query": {"match_all": {}}}
results = elasticsearch.helpers.scan(es, query=body, index="my_index")
df = pd.DataFrame.from_dict([document['_source'] for document in results])

Simply edit the values that start with "my_" to match your own setup.
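The example above scans with match_all. To scan with the filter from the question instead, the same bool query can be written as a plain dict and passed as the body. This is only a minimal sketch: the helper name build_query_body, the field name "Date", and the sample values are hypothetical, and it assumes the question's dicdate has the usual range shape of {"field": {"gte": ..., "lte": ...}}. Only BusinessUnit is taken from the question.

```python
def build_query_body(bunit, date_range):
    """Return a search body equivalent to the bool query in the question.

    date_range is assumed to look like {"Date": {"gte": ..., "lte": ...}};
    the field name "Date" is a placeholder, not taken from the question.
    """
    return {
        "query": {
            "bool": {
                "must": [
                    {"match": {"BusinessUnit": bunit}},
                    {"range": date_range},
                ]
            }
        }
    }


# Hypothetical sample values, for illustration only.
body = build_query_body("sales", {"Date": {"gte": "2017-01-01", "lte": "2017-09-28"}})

# With a live cluster, this body would replace the match_all query:
# results = elasticsearch.helpers.scan(es, query=body, index="my_index")
# df = pd.DataFrame([document["_source"] for document in results])
```

The last two lines are left as comments because they need a running cluster.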


a mark · Accepted Answer · 2017-12-12 19:09:22Z

2

Based on your comment I think what you're looking for is size:

es.search(index="my-index", doc_type="mydocs", body="your search", size="1000")

I'm not sure if this will work for 627,033 lines -- you might need scroll for that.

https://www.elastic.co/guide/en/elasticsearch/reference/current/search-request-scroll.html
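For reference, a manual scroll loop might look roughly like the sketch below. It assumes a client object exposing search, scroll, and clear_scroll with the keyword arguments used by the 7.x Python client; newer client versions may prefer other parameter names. The function name scroll_hits and its defaults are hypothetical.

```python
def scroll_hits(client, index, body, page_size=1000, keep_alive="2m"):
    """Yield every hit for a query by paging through the scroll API."""
    # The first request opens the scroll context and returns the first page.
    page = client.search(index=index, body=body, scroll=keep_alive, size=page_size)
    scroll_id = page["_scroll_id"]
    try:
        # An empty page means all matching documents have been returned.
        while page["hits"]["hits"]:
            for hit in page["hits"]["hits"]:
                yield hit
            page = client.scroll(scroll_id=scroll_id, scroll=keep_alive)
            scroll_id = page["_scroll_id"]
    finally:
        # Free the server-side scroll context, even if the caller stops early.
        client.clear_scroll(scroll_id=scroll_id)
```

Each yielded hit is a dict with the document under "_source", so the output can be collected into a list of dicts before building a dataframe. The elasticsearch.helpers.scan function wraps the same idea.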


Ardent Coder · Accepted Answer · 2021-08-26 17:50:10Z

0

I found the solution by Phil B a good template for my situation. However, all results are returned as lists, rather than atomic data types. To get around this, I added the following helper function and code:

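def flat_data(val):
  # Each value under 'fields' arrives as a list; return its first element.
  if isinstance(val, list):
    return val[0]
  else:
    return val

# Build one flat dict per hit from its 'fields' entry.
df = pd.DataFrame.from_dict([{k: flat_data(v) for (k, v) in document['fields'].items()}
                            for document in results])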
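A variation on this idea, sketched with made-up hits so it can be run without a cluster: it unwraps only single-element lists, so a multi-valued field keeps all of its values instead of being cut down to the first one. The names unwrap, hits_to_rows, and sample_hits are hypothetical, as is the "tags" field.

```python
def unwrap(value):
    """Return the sole element of a one-item list; leave anything else as is."""
    if isinstance(value, list) and len(value) == 1:
        return value[0]
    return value


def hits_to_rows(hits, key="fields"):
    """Turn raw hits into flat dicts, one per document."""
    return [{name: unwrap(value) for name, value in hit.get(key, {}).items()}
            for hit in hits]


# Hypothetical hits, shaped like responses to a search that requests "fields".
sample_hits = [
    {"_id": "1", "fields": {"BusinessUnit": ["sales"], "tags": ["a", "b"]}},
    {"_id": "2", "fields": {"BusinessUnit": ["ops"], "tags": ["c"]}},
]
rows = hits_to_rows(sample_hits)

# rows can then be handed to pandas: df = pd.DataFrame(rows)
```

One trade-off: a column such as "tags" ends up holding a mix of lists and scalars, so whether to unwrap at all depends on how the column will be used afterwards.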

answered Aug 26, 2021 at 17:08 by DSJ529
edited Aug 26, 2021 at 17:50 by Ardent Coder
