I am using elasticsearch-py for Elasticsearch operations.

I am trying to use elasticsearch.helpers.bulk to create or update multiple records.

from elasticsearch import Elasticsearch
from elasticsearch import helpers
es = Elasticsearch()

data = [
    {
        "_index": "customer",
        "_type": "external",
        "_op_type": "create",
        "_id": 3,
        "doc" : {"name": "test"}
    },
    {
        "_index": "customer",
        "_type": "external",
        "_op_type": "create",
        "_id": 4,
        "doc" : {"name": "test"}
    },
    {
        "_index": "customer",
        "_type": "external",
        "_op_type": "create",
        "_id": 5,
        "doc" : {"name": "test"}
    },
    {
        "_index": "customer",
        "_type": "external",
        "_op_type": "create",
        "_id": 6,
        "doc" : {"name": "test"}
    },
]


print(helpers.bulk(es, data))

Is there any way to perform this operation?

Currently we can only set _op_type to create or update. If we use update and the record does not exist, it raises an error:

Traceback (most recent call last):
  File "/tmp/test.py", line 37, in <module>
    print helpers.bulk(es, data)
  File "/local/lib/python2.7/site-packages/elasticsearch/helpers/__init__.py", line 182, in bulk
    for ok, item in streaming_bulk(client, actions, **kwargs):
  File "/local/lib/python2.7/site-packages/elasticsearch/helpers/__init__.py", line 155, in streaming_bulk
    raise BulkIndexError('%i document(s) failed to index.' % len(errors), errors)
elasticsearch.helpers.BulkIndexError: ('4 document(s) failed to index.', [{u'update': {u'status': 404, u'_type': u'external', u'_id': u'3', u'error': u'DocumentMissingException[[customer][-1] [external][3]: document missing]', u'_index': u'customer'}}, {u'update': {u'status': 404, u'_type': u'external', u'_id': u'4', u'error': u'DocumentMissingException[[customer][-1] [external][4]: document missing]', u'_index': u'customer'}}, {u'update': {u'status': 404, u'_type': u'external', u'_id': u'5', u'error': u'DocumentMissingException[[customer][-1] [external][5]: document missing]', u'_index': u'customer'}}, {u'update': {u'status': 404, u'_type': u'external', u'_id': u'6', u'error': u'DocumentMissingException[[customer][-1] [external][6]: document missing]', u'_index': u'customer'}}])
  • Have you tried using index as the op_type instead of create and update? Commented Aug 21, 2015 at 6:16
  • @Val, as per the helpers.bulk documentation, we have to give index. I also tried your solution; it gives a validation error: elasticsearch.exceptions.TransportError: TransportError(500, u'ActionRequestValidationException[Validation Failed: 1: no requests added;]') Commented Aug 21, 2015 at 6:22
  • That's weird... Are you sure you have "_op_type": "index"? Commented Aug 21, 2015 at 6:29
  • You can check the docs for this method: elasticsearch-py.readthedocs.org/en/master/… Commented Aug 21, 2015 at 6:30
  • Also, have you tried without specifying _op_type at all? I think it will default to index by itself. Commented Aug 21, 2015 at 6:32

2 Answers

According to the _bulk endpoint documentation, you can and should use the index action for this, provided your documents always have the same identifiers.

create is useful when creating documents for the first time (it fails if a document with the same ID already exists), and update is meant more for partial and/or scripted updates.
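If you do want to stay with update and avoid the 404 errors from the question, an upsert is one option. This is only a minimal sketch: it assumes the `doc_as_upsert` flag of the Elasticsearch update API (not something the question uses) and relies on helpers.bulk passing the non-metadata keys of each action through as the update body. `build_upsert_actions` and `records` are hypothetical names for illustration.

```python
def build_upsert_actions(records, index="customer", doc_type="external"):
    """Yield one bulk `update` action per (doc_id, fields) pair."""
    for doc_id, fields in records:
        yield {
            "_op_type": "update",
            "_index": index,
            "_type": doc_type,      # mapping type, as in the question
            "_id": doc_id,
            "doc": fields,          # partial document to merge into the existing one
            "doc_as_upsert": True,  # assumption: insert `doc` as a new document if the ID is missing
        }


records = [(3, {"name": "test"}), (4, {"name": "test"})]
actions = list(build_upsert_actions(records))

# Sending the actions needs a reachable node, so it is left commented out:
# from elasticsearch import Elasticsearch, helpers
# helpers.bulk(Elasticsearch(), actions)
```

Unlike index, this merges the given fields into an existing document instead of replacing the whole document.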

You can also omit _op_type entirely; index is then used by default.
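As a rough sketch of what that looks like with helpers.bulk: the actions below carry no _op_type, so they should be sent as index operations, which create a missing document and overwrite an existing one. `build_index_actions` and `records` are hypothetical names. Note that, as far as I know, with index the document body is whatever is left after the metadata keys are removed (or the value of `_source` if present), so a `doc` wrapper, which update requires, would end up stored as a literal field named doc.

```python
def build_index_actions(records, index="customer", doc_type="external"):
    """Yield one bulk action per (doc_id, source) pair, with no _op_type set."""
    for doc_id, source in records:
        yield {
            "_index": index,
            "_type": doc_type,  # mapping type, as in the question
            "_id": doc_id,
            "_source": source,  # the document body itself, not wrapped in "doc"
        }


records = [(3, {"name": "test"}), (4, {"name": "test"})]
actions = list(build_index_actions(records))

# Sending the actions needs a reachable node, so it is left commented out:
# from elasticsearch import Elasticsearch, helpers
# helpers.bulk(Elasticsearch(), actions)
```

Running the same actions twice should simply re-index the documents rather than fail, which is the create-or-update behaviour asked for.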

I tried the solution suggested by @Val and it works like a charm.

from elasticsearch import Elasticsearch
from elasticsearch import helpers
es = Elasticsearch()

data = [
    {
        "_index": "customer",
        "_type": "external",
        "_id": 3,
        "doc" : {"name": "test"}
    },
    {
        "_index": "customer",
        "_type": "external",
        "_id": 4,
        "doc" : {"name": "test"}
    },
    {
        "_index": "customer",
        "_type": "external",
        "_id": 5,
        "doc" : {"name": "test"}
    },
    {
        "_index": "customer",
        "_type": "external",
        "_id": 6,
        "doc" : {"name": "test"}
    },
]


print(helpers.bulk(es, data))
