Bulk Index data in Elasticsearch with sequential IDs

Question

I am using this code to bulk index all my data in Elasticsearch using Python:

from elasticsearch import Elasticsearch, helpers
import json
import os
import sys

es = Elasticsearch()   

def load_json(directory):
    for filename in os.listdir(directory):
        if filename.endswith('.json'):
            with open(os.path.join(directory, filename), 'r') as open_file:
                yield json.load(open_file)

helpers.bulk(es, load_json(sys.argv[1]), index='v1_resume', doc_type='candidate')

I know that if no ID is given, ES generates a 20-character ID by itself, but I want the documents to be indexed with IDs starting from 1 and running up to the number of documents.

How can I achieve this?

Why don't you pass the ID inside your schema in load_json and build your objects with whatever ID you like?
– mrkzq, Commented May 16, 2017 at 12:16

mrkzq · Accepted Answer · 2017-05-16 12:34:37Z

In Elasticsearch, if you don't pick an ID for your document, one is created for you automatically. From the Elastic docs:

Autogenerated IDs are 20 character long, URL-safe, Base64-encoded GUID 
strings. These GUIDs are generated from a modified FlakeID scheme which 
allows multiple nodes to be generating unique IDs in parallel with 
essentially zero chance of collision.

If you would like custom IDs, you need to build them yourself, using syntax like this:

[
    {'_id': 1,
     '_index': 'index-name',
     '_type': 'document',
     '_source': {
         'title': 'Hello World!',
         'body': '...'}
    },
    {'_id': 2,
     '_index': 'index-name',
     '_type': 'document',
     '_source': {
         'title': 'Hello World!',
         'body': '...'}
    }
]

helpers.bulk(es, load_json(sys.argv[1]))

Since you are declaring the type and index inside each action, you don't have to pass them to the helpers.bulk() method. You need to change 'load_json' so that it produces dicts like the ones above, which are then saved to ES (see the Python Elasticsearch client docs).
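
As a rough sketch of that change, the generator from the question could wrap each loaded file in an action dict and number the actions with enumerate. The name load_json_actions is hypothetical, the default index and type names are taken from the question, and sorting the filenames is my own assumption so that the numbering is the same on every run:

```python
import json
import os


def load_json_actions(directory, index="v1_resume", doc_type="candidate", start=1):
    """Yield one bulk action per .json file, with _id counting up from `start`."""
    # Assumption: sort the names so the ID assignment is reproducible between runs.
    filenames = sorted(f for f in os.listdir(directory) if f.endswith(".json"))
    for doc_id, filename in enumerate(filenames, start=start):
        # Join with the directory so the files are found from any working directory.
        with open(os.path.join(directory, filename), "r") as open_file:
            yield {
                "_id": doc_id,
                "_index": index,
                "_type": doc_type,
                "_source": json.load(open_file),
            }
```

It would then be passed to the helper in place of the original generator, e.g. helpers.bulk(es, load_json_actions(sys.argv[1])), with no index or doc_type arguments. Note that '_type' matches the Elasticsearch version in the question; newer versions have removed mapping types, so that key would probably need to be dropped there.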
