
I am trying to use the Python Elasticsearch library to read Elasticsearch documents and place them in a Spark DataFrame. I am able to connect and query using the scan helper function, since the query will generate about 2M documents (rows in my dataframe). The issue I am running into is getting the query results into a Spark DataFrame.

This code produces a generator:

result = elasticsearch.helpers.scan(es, index=index, doc_type='_doc', query=query)

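For reference, here is roughly how I set up the client and the scan call (the host, index name, and query below are placeholders, not my real values):

from elasticsearch import Elasticsearch
from elasticsearch.helpers import scan

# Placeholder connection and query -- not my real host/index/query
es = Elasticsearch(["http://localhost:9200"])
query = {"query": {"match_all": {}}}

# scan() returns a generator that pages through all matching documents
result = scan(es, index="my-index", doc_type="_doc", query=query)
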
I was trying to use a for loop to collect the generated data into a dictionary:

data = {}
for item in result:
  data.append((item['_source']['someField'], item['_source']['someField']))
return data

but I run into errors, as I do not think a dictionary can be appended to in this way.
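
I assume what I actually need is a list of tuples rather than a dictionary, something like this (the field names are placeholders):

data = []
for item in result:
    # collect the two fields I care about as one tuple per document
    data.append((item['_source']['someField'], item['_source']['otherField']))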

Is there a better way to collect this generated data into a Spark DataFrame? Note: I am working on the Databricks platform, if that helps.
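
For what it's worth, the direction I was considering is passing the collected list to spark.createDataFrame, roughly like this (the column names are placeholders, and I'm not sure this is the right approach for ~2M rows, which is why I'm asking):

# On Databricks the SparkSession is already available as `spark`
df = spark.createDataFrame(data, schema=["someField", "otherField"])
df.show(5)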
