
I'm trying to reindex using the Elasticsearch Python client, via https://elasticsearch-py.readthedocs.org/en/master/helpers.html#elasticsearch.helpers.reindex. But I keep getting the following exception: elasticsearch.exceptions.ConnectionTimeout: ConnectionTimeout caused by - ReadTimeout

The stack trace of the error is:

Traceback (most recent call last):
  File "~/es_test.py", line 33, in <module>
    main()
  File "~/es_test.py", line 30, in main
    target_index='users-2')
  File "~/ENV/lib/python2.7/site-packages/elasticsearch/helpers/__init__.py", line 306, in reindex
    chunk_size=chunk_size, **kwargs)
  File "~/ENV/lib/python2.7/site-packages/elasticsearch/helpers/__init__.py", line 182, in bulk
    for ok, item in streaming_bulk(client, actions, **kwargs):
  File "~/ENV/lib/python2.7/site-packages/elasticsearch/helpers/__init__.py", line 124, in streaming_bulk
    raise e
elasticsearch.exceptions.ConnectionTimeout: ConnectionTimeout caused by - ReadTimeout(HTTPSConnectionPool(host='myhost', port=9243): Read timed out. (read timeout=10))

Is there any way to prevent this exception besides increasing the timeout?

EDIT: Python code

from requests.auth import HTTPBasicAuth  # needed for http_auth below
from elasticsearch import Elasticsearch, RequestsHttpConnection, helpers

es = Elasticsearch(connection_class=RequestsHttpConnection,
                   host='myhost',
                   port=9243,
                   http_auth=HTTPBasicAuth(username, password),
                   use_ssl=True,
                   verify_certs=True,
                   timeout=600)
helpers.reindex(es, source_index=old_index, target_index=new_index)
  • Can you show your Python code? Commented Jul 23, 2015 at 3:20
  • @Val I included my code Commented Jul 25, 2015 at 17:53
  • Can you try adding the chunk_size parameter (maybe with a value of 100) to the reindex call? Commented Jul 26, 2015 at 3:22
  • Use the chunk_size and you should be fine. I have been able to reindex millions of documents using a simple reindex call. Example: helpers.reindex(es, source_index=old_index, target_index=new_index, chunk_size=1000) Commented Apr 25, 2017 at 16:07

2 Answers


I had been suffering from this issue for a couple of days. Changing the request_timeout parameter to 30 (i.e. 30 seconds) didn't work. In the end I had to edit the streaming_bulk and reindex helpers inside elasticsearch-py.

Change the chunk_size parameter from its default of 500 (i.e. 500 documents per batch) to a smaller number of documents per batch. I changed mine to 50, which worked fine for me: no more read timeout errors.

def streaming_bulk(client, actions, chunk_size=50, raise_on_error=True, expand_action_callback=expand_action, raise_on_exception=True, **kwargs):

def reindex(client, source_index, target_index, query=None, target_client=None, chunk_size=50, scroll='5m', scan_kwargs={}, bulk_kwargs={}):
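
Note that, as the signature above shows, chunk_size can also be passed directly to helpers.reindex, so editing the library source should not be strictly necessary. A minimal sketch, assuming the es client, old_index, and new_index from the question:

# Pass a smaller chunk_size per call instead of patching the helper's default.
helpers.reindex(es, source_index=old_index, target_index=new_index, chunk_size=50)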


1 Comment

  • Where can I find elasticsearch.py?

It may be happening because of an OutOfMemoryError for Java heap space, which means you are not giving Elasticsearch enough memory for what you want to do. Look in /var/log/elasticsearch to see whether there is any exception like that.

https://github.com/elastic/elasticsearch/issues/2636
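
To corroborate heap pressure from the client side, here is a minimal sketch, assuming the es object from the question; nodes.stats(metric='jvm') reports per-node JVM memory statistics, including heap_used_percent:

stats = es.nodes.stats(metric='jvm')  # per-node JVM stats
for node_id, node in stats['nodes'].items():
    # Sustained values near 100 suggest the node lacks heap for the reindex.
    print(node.get('name', node_id), node['jvm']['mem']['heap_used_percent'])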
