
I'm trying to reindex using the Elasticsearch Python client, via https://elasticsearch-py.readthedocs.org/en/master/helpers.html#elasticsearch.helpers.reindex. But I keep getting the following exception: elasticsearch.exceptions.ConnectionTimeout: ConnectionTimeout caused by - ReadTimeout

The stack trace of the error is:

Traceback (most recent call last):
  File "~/es_test.py", line 33, in <module>
    main()
  File "~/es_test.py", line 30, in main
    target_index='users-2')
  File "~/ENV/lib/python2.7/site-packages/elasticsearch/helpers/__init__.py", line 306, in reindex
    chunk_size=chunk_size, **kwargs)
  File "~/ENV/lib/python2.7/site-packages/elasticsearch/helpers/__init__.py", line 182, in bulk
    for ok, item in streaming_bulk(client, actions, **kwargs):
  File "~/ENV/lib/python2.7/site-packages/elasticsearch/helpers/__init__.py", line 124, in streaming_bulk
    raise e
elasticsearch.exceptions.ConnectionTimeout: ConnectionTimeout caused by - ReadTimeout(HTTPSConnectionPool(host='myhost', port=9243): Read timed out. (read timeout=10))

Is there any way to prevent this exception besides increasing the timeout?

EDIT: Python code

from requests.auth import HTTPBasicAuth  # needed for http_auth below
from elasticsearch import Elasticsearch, RequestsHttpConnection, helpers

es = Elasticsearch(connection_class=RequestsHttpConnection,
                   host='myhost',
                   port=9243,
                   http_auth=HTTPBasicAuth(username, password),
                   use_ssl=True,
                   verify_certs=True,
                   timeout=600)
helpers.reindex(es, source_index=old_index, target_index=new_index)
  • Can you show your Python code? Commented Jul 23, 2015 at 3:20
  • @Val I included my code Commented Jul 25, 2015 at 17:53
  • Can you try adding the chunk_size parameter (maybe with a value of 100) to the reindex call? Commented Jul 26, 2015 at 3:22
  • Use the chunk_size and you should be fine. I have been able to reindex millions of documents using a simple reindex call. Example: helpers.reindex(es, source_index=old_index, target_index=new_index, chunk_size=1000) Commented Apr 25, 2017 at 16:07

2 Answers


I had been suffering from this issue for a couple of days. Changing the request_timeout parameter to 30 (i.e. 30 seconds) didn't work. In the end I had to edit the streaming_bulk and reindex helpers inside elasticsearch-py.

Change the chunk_size parameter from its default of 500 (i.e. 500 documents per batch) to a smaller number of documents per batch. I changed mine to 50, which worked fine for me: no more read timeout errors.

def streaming_bulk(client, actions, chunk_size=50, raise_on_error=True, expand_action_callback=expand_action, raise_on_exception=True, **kwargs):

def reindex(client, source_index, target_index, query=None, target_client=None, chunk_size=50, scroll='5m', scan_kwargs={}, bulk_kwargs={}):
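
Note that, as the signature above shows, chunk_size can also be passed directly to helpers.reindex, so editing the library source should not be strictly necessary. A minimal sketch, assuming the es client, old_index, and new_index from the question:

# Pass a smaller chunk_size per call instead of patching the helper's default.
helpers.reindex(es, source_index=old_index, target_index=new_index, chunk_size=50)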


1 Comment

  • Where can I find elasticsearch.py?

It may be happening because of an OutOfMemoryError for Java heap space, which means you are not giving Elasticsearch enough memory for what you want to do. Look in /var/log/elasticsearch to see whether there is any exception like that.

https://github.com/elastic/elasticsearch/issues/2636
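
To corroborate heap pressure from the client side, here is a minimal sketch, assuming the es object from the question; nodes.stats(metric='jvm') reports per-node JVM memory statistics, including heap_used_percent:

stats = es.nodes.stats(metric='jvm')  # per-node JVM stats
for node_id, node in stats['nodes'].items():
    # Sustained values near 100 suggest the node lacks heap for the reindex.
    print(node.get('name', node_id), node['jvm']['mem']['heap_used_percent'])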
