Bulk indexing raises read timeout error

Fool_LeoTao · May 17, 2015, 5:08am

When I use the bulk API to index with the Python client, it works at first. But soon a read timeout error is raised, like the following:

bulk_index start processing...
1 chunk bulk index spend: 20.0
2 chunk bulk index spend: 17.0
3 chunk bulk index spend: 17.0
4 chunk bulk index spend: 18.0
5 chunk bulk index spend: 18.0
6 chunk bulk index spend: 21.0
7 chunk bulk index spend: 19.0
8 chunk bulk index spend: 20.0
Traceback (most recent call last):
  File "es_index.py", line 54, in <module>
    bulk_index()
  File "es_index.py", line 19, in _
    rv = func(*args, **kwargs)
  File "es_index.py", line 48, in bulk_index
    chunk_size=100000, timeout=30)
  File "../es/wrappers.py", line 81, in bulk
    for chunk_len, errors in streaming_bulk_index(client, actions, **kwargs):
  File "../es/wrappers.py", line 58, in streaming_bulk_index
    raise e
elasticsearch.exceptions.ConnectionTimeout: ConnectionTimeout caused by - ReadTimeoutError(HTTPConnectionPool(host=u'219.224.135.97', port=9200): Read timed out. (read timeout=10))

I don't understand:

A read timeout seems like a problem concerning queries, so why is a read timeout error raised during bulk indexing?
I use es-1.5.2 and only set the following in elasticsearch.yml; everything else is left at the defaults. By the way, ES_HEAP_SIZE is set to 5g.

index.number_of_shards: 5
index.number_of_replicas: 0
index.store.type: mmapfs
indices.memory.index_buffer_size: 30%
index.translog.flush_threshold_ops: 50000
refresh_interval: 60s

My Python code is simple, like this:

es = Elasticsearch()

def bulk_index():
    actions = doc_generator()
    res = bulk(es, actions, index='test', doc_type='test',
               expand_action_callback=expand_action,
               chunk_size=100000, timeout=30)
    print 'res: ', res

jprante · May 17, 2015, 8:05am

You get read timeouts from the server because the client is misbehaving. Cluster power, chunk size, timeout length and API use are not harmonized.

You do not let the indexing finish within 30 seconds; one reason is that the chunk is too large.
You do not evaluate the bulk responses before continuing.

Use a smaller chunk_size, such as 1000, and, most importantly for convenient API usage, use https://elasticsearch-py.readthedocs.org/en/master/helpers.html#elasticsearch.helpers.bulk to evaluate the number of successfully indexed documents before you continue.
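
A minimal sketch of that advice, assuming the stock elasticsearch.helpers.bulk helper rather than the custom wrapper shown in the traceback. The names index_and_check and bulk_fn are hypothetical, introduced only so the snippet can run without a cluster; doc_generator is the generator from the original post.

```python
def index_and_check(client, actions, bulk_fn, chunk_size=1000):
    """Index actions in small chunks and report what succeeded.

    bulk_fn is expected to be elasticsearch.helpers.bulk; it is passed in
    as a parameter only so this sketch can run without a live cluster.
    """
    # With raise_on_error=False the helper collects per-document failures
    # in a list instead of raising on the first failed document.
    success, errors = bulk_fn(
        client, actions, chunk_size=chunk_size, raise_on_error=False
    )
    print("indexed %d documents, %d errors" % (success, len(errors)))
    return success, errors


# Usage against a real cluster (not executed here):
#   from elasticsearch import Elasticsearch
#   from elasticsearch.helpers import bulk
#   success, errors = index_and_check(Elasticsearch(), doc_generator(), bulk)
```

Checking the returned count and error list after each call is what "evaluating the bulk responses" amounts to here; a chunk of 1000 documents is also much more likely to complete within the client's read timeout than a chunk of 100000.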

spuder · May 31, 2015, 4:39am

Possibly related https://github.com/logstash-plugins/logstash-output-elasticsearch/issues/141#issuecomment-107113994

fool_01 · January 21, 2016, 4:47am

I faced the same issue, and it was finally resolved by using the request_timeout parameter instead of timeout.

So the call should look like this: helpers.bulk(es, actions, chunk_size=some_value, request_timeout=some_value)
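
Some hedged context on why this helps: in the elasticsearch-py releases of that era (before 8.x), as far as the client documentation goes, a timeout keyword on an API call is sent to Elasticsearch as a query-string parameter, while request_timeout sets the client-side HTTP read timeout for that one call, which otherwise defaults to 10 seconds. That would be consistent with the "read timeout=10" in the traceback even though timeout=30 was passed. Below is a small sketch of the call; bulk_with_request_timeout and bulk_fn are hypothetical names, and the values 1000 and 60 are only placeholders.

```python
def bulk_with_request_timeout(client, actions, bulk_fn,
                              chunk_size=1000, request_timeout=60):
    """Run a bulk index with an explicit client-side read timeout.

    bulk_fn is expected to be elasticsearch.helpers.bulk, which forwards
    extra keyword arguments such as request_timeout to client.bulk().
    It is a parameter here only so the sketch runs without a cluster.
    """
    return bulk_fn(
        client,
        actions,
        chunk_size=chunk_size,            # documents per bulk request
        request_timeout=request_timeout,  # seconds to wait for each response
    )


# Usage against a real cluster (not executed here):
#   from elasticsearch import Elasticsearch, helpers
#   es = Elasticsearch()
#   success, errors = bulk_with_request_timeout(es, actions, helpers.bulk)
```

Newer client versions handle per-request options differently, so check the documentation for the version you have installed.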
