I have a bunch of JSON files (100), which are named merged_file 1.json, merged_file 2.json, and so on.

How do I index all these files into Elasticsearch using Python (elasticsearch_dsl)?

I am using this code, but it doesn't seem to work:

from elasticsearch_dsl import Elasticsearch
import json
import os
import sys

es = Elasticsearch()

json_docs =[]

directory = sys.argv[1]

for filename in os.listdir(directory):
    if filename.endswith('.json'):
        with open(filename,'r') as open_file:
            json_docs.append(json.load(open_file))

es.bulk("index_name", "type_name", json_docs)

The JSON looks like this:

{"one":["some data"],"two":["some other data"],"three":["other data"]}

What can I do to fix this?

  • Can you show what json_docs looks like? Commented May 15, 2017 at 13:52
  • You're missing the command (action) line that the bulk API expects before each document. See here for more details. Commented May 15, 2017 at 13:53
  • @BhargaviSri - Added Commented May 15, 2017 at 15:06
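
To illustrate the comment about the missing line: the raw bulk API expects newline-delimited JSON in which every document is preceded by an action line. Here is a minimal sketch of that layout, assuming the index name `index_name` from the question; `to_bulk_body` is a hypothetical helper written for illustration, not part of any library.

```python
import json

def to_bulk_body(docs, index_name):
    """Build a newline-delimited bulk body: an action line, then the document."""
    lines = []
    for doc in docs:
        # Action line: tells Elasticsearch what to do with the line that follows.
        lines.append(json.dumps({"index": {"_index": index_name}}))
        # Source line: the document itself.
        lines.append(json.dumps(doc))
    # A bulk body has to end with a newline.
    return "\n".join(lines) + "\n"

docs = [{"one": ["some data"], "two": ["some other data"], "three": ["other data"]}]
print(to_bulk_body(docs, "index_name"))
```

Passing a plain list of documents to `es.bulk`, as in the question, skips these action lines, which is probably why the call fails.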

1 Answer

For this task you should be using elasticsearch-py (pip install elasticsearch):

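from elasticsearch import Elasticsearch, helpers
import json
import os
import sys

es = Elasticsearch()

def load_json(directory):
    """Use a generator so the files are not all loaded into memory at once."""
    for filename in os.listdir(directory):
        if filename.endswith('.json'):
            # os.listdir returns bare names, so join them with the directory
            with open(os.path.join(directory, filename), 'r') as open_file:
                yield json.load(open_file)

# doc_type applies to older Elasticsearch versions; mapping types were removed in 8.x
helpers.bulk(es, load_json(sys.argv[1]), index='my-index', doc_type='my-type')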

3 Comments

How do I get the IDs of the JSON documents that are indexed?
If you care about the IDs (Elasticsearch will otherwise generate random ones for you), just include an _id field in your JSON, either directly or by putting the filename there, for example.
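
A minimal sketch of the filename-as-ID idea, assuming each file holds a single JSON object. `load_json_with_ids` is a hypothetical name; `_id` and `_source` are the action keys that `helpers.bulk` recognizes.

```python
import json
import os

def load_json_with_ids(directory):
    """Yield one bulk action per .json file, using the file name as the document ID."""
    for filename in sorted(os.listdir(directory)):
        if filename.endswith('.json'):
            with open(os.path.join(directory, filename), 'r') as open_file:
                doc = json.load(open_file)
            # "_id" becomes the document ID; "_source" carries the document body.
            yield {"_id": os.path.splitext(filename)[0], "_source": doc}
```

It would be passed to the helper in the same way as before, e.g. `helpers.bulk(es, load_json_with_ids(sys.argv[1]), index='my-index')`.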
This throws an error in the action parameter of bulk:

~\Anaconda3\lib\site-packages\elasticsearch\helpers\actions.py in expand_action(data)
     25     # make sure we don't alter the action
     26     data = data.copy()
---> 27     op_type = data.pop("_op_type", "index")
     28     action = {op_type: {}}
     29     for key in (
TypeError: pop() takes at most 1 argument (2 given)
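
One likely cause of that TypeError (an assumption, since the failing file is not shown) is that a file's top level is a JSON array: `list.pop` accepts only one argument, whereas `dict.pop` accepts a default. A minimal sketch that yields each element of such an array as its own document; `load_json_flat` is a hypothetical name.

```python
import json
import os

def load_json_flat(directory):
    """Yield dict documents, flattening files whose top level is a JSON array."""
    for filename in sorted(os.listdir(directory)):
        if not filename.endswith('.json'):
            continue
        with open(os.path.join(directory, filename), 'r') as open_file:
            data = json.load(open_file)
        if isinstance(data, list):
            # The file holds several documents: yield them one at a time.
            for doc in data:
                yield doc
        else:
            yield data
```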