2

I have a query:

s = Search(using=client, index='myindex', doc_type='mytype')
s.query = Q('bool', must=[Q('match', BusinessUnit=bunit),
                          Q('range', **dicdate)])

res = s.execute()

This returns 627033 hits. I want to convert the result into a DataFrame with 627033 rows.

2
  • Can you give more information about the output of the Elasticsearch query? If it is simply a dictionary, the question is really about converting a dictionary to a DataFrame, and there are many answers on that, for example stackoverflow.com/questions/34589332/… Commented Sep 28, 2017 at 15:11
  • Actually, the dictionary format is not the issue: the query always returns only 10 elements, and I want all of them. Commented Sep 28, 2017 at 16:55

3 Answers

3

If your request is likely to return more than 10,000 documents from Elasticsearch, you will need to use the scrolling function of Elasticsearch. Documentation and examples for this function are rather difficult to find, so I will provide you with a full, working example:

import pandas as pd
from elasticsearch import Elasticsearch
import elasticsearch.helpers


es = Elasticsearch('127.0.0.1',
        http_auth=('my_username', 'my_password'),
        port=9200)

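# Match every document; replace this with your own query body.
body = {"query": {"match_all": {}}}
# scan() wraps the scroll API and yields every hit lazily, not just the first page.
results = elasticsearch.helpers.scan(es, query=body, index="my_index")
# Each hit keeps the stored document under '_source'.
df = pd.DataFrame([document['_source'] for document in results])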

Simply edit the values that start with "my_" to match your own setup.
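If the document `_id` is also wanted as a column, one option is to turn the hits into plain row dictionaries before building the frame. This is only a minimal sketch: `hits_to_rows` is a hypothetical helper name, and the sample hits are made-up data shaped like what `elasticsearch.helpers.scan` yields.

```python
def hits_to_rows(hits):
    """Yield one flat dict per hit: the `_source` fields plus the hit's `_id`."""
    for hit in hits:
        row = dict(hit.get("_source", {}))  # copy so the original hit is not mutated
        row["_id"] = hit.get("_id")
        yield row


# Made-up hits, shaped like the dictionaries that helpers.scan yields.
sample_hits = [
    {"_id": "a1", "_source": {"BusinessUnit": "north", "amount": 10}},
    {"_id": "a2", "_source": {"BusinessUnit": "south", "amount": 25}},
]
rows = list(hits_to_rows(sample_hits))
```

With pandas imported as `pd`, `pd.DataFrame(list(hits_to_rows(results)))` should then give one row per document, with `_id` as an extra column.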


2

Based on your comment I think what you're looking for is size:

es.search(index="my-index", doc_type="mydocs", body="your search", size=1000)

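This will not work as-is for 627,033 hits: by default Elasticsearch rejects requests where from + size exceeds 10,000 (the index.max_result_window setting), so you will need scroll for that.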

https://www.elastic.co/guide/en/elasticsearch/reference/current/search-request-scroll.html
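For reference, a manual scroll loop might look roughly like the sketch below. It assumes an `elasticsearch-py` client that accepts `body=` and `scroll=` keyword arguments on `search` (as the clients from that period do); `scroll_all`, `page_size`, and `keep_alive` are hypothetical names chosen for illustration.

```python
def scroll_all(es, index, body, page_size=1000, keep_alive="2m"):
    """Yield every hit for a query by paging through the scroll API."""
    # The first request opens the scroll context and returns the first page.
    page = es.search(index=index, body=body, scroll=keep_alive, size=page_size)
    scroll_id = page["_scroll_id"]
    try:
        # An empty page means the result set is exhausted.
        while page["hits"]["hits"]:
            for hit in page["hits"]["hits"]:
                yield hit
            # Fetch the next page and renew the keep-alive.
            page = es.scroll(scroll_id=scroll_id, scroll=keep_alive)
            scroll_id = page["_scroll_id"]
    finally:
        # Release the server-side scroll context once we are done.
        es.clear_scroll(scroll_id=scroll_id)
```

Something like `pd.DataFrame([hit["_source"] for hit in scroll_all(es, "my-index", {"query": {"match_all": {}}})])` would then collect all pages into one frame. `elasticsearch.helpers.scan` does essentially this paging for you.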

0

I found the solution by Phil B a good template for my situation. However, all results are returned as lists, rather than atomic data types. To get around this, I added the following helper function and code:

def flat_data(val):
  # Field values come back wrapped in lists; unwrap to the first element.
  if isinstance(val, list):
    return val[0]
  else:
    return val
df = pd.DataFrame.from_dict([{k:flat_data(v) for (k,v) in document['fields'].items()}
                            for document in results])
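One caveat with taking `val[0]`: it raises on an empty list and silently drops extra values from multi-valued fields. A more defensive variant, offered only as a sketch, could unwrap just the one-element lists. `unwrap_single` is a hypothetical name, and the sample hit is made-up data.

```python
def unwrap_single(val):
    """Unwrap one-element lists; map empty lists to None; leave the rest untouched."""
    if isinstance(val, list):
        if len(val) == 1:
            return val[0]
        if not val:
            return None
    return val


# Made-up hit with a single-valued, a multi-valued, and an empty field.
sample_hit = {"fields": {"BusinessUnit": ["north"], "tags": ["a", "b"], "note": []}}
row = {k: unwrap_single(v) for k, v in sample_hit["fields"].items()}
```

Multi-valued fields stay as lists here, so a column can end up holding a mix of scalars and lists; whether that is acceptable depends on the data.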
