Bulk indexing raises read timeout error


(Leo Tao) #1

When I use the bulk API to index with the Python client, it works at first. But soon a read timeout error is raised, like the following:

bulk_index start processing...
1 chunk bulk index spend: 20.0
2 chunk bulk index spend: 17.0
3 chunk bulk index spend: 17.0
4 chunk bulk index spend: 18.0
5 chunk bulk index spend: 18.0
6 chunk bulk index spend: 21.0
7 chunk bulk index spend: 19.0
8 chunk bulk index spend: 20.0
Traceback (most recent call last):
  File "es_index.py", line 54, in <module>
    bulk_index()
  File "es_index.py", line 19, in _
    rv = func(*args, **kwargs)
  File "es_index.py", line 48, in bulk_index
    chunk_size=100000, timeout=30)
  File "../es/wrappers.py", line 81, in bulk
    for chunk_len, errors in streaming_bulk_index(client, actions, **kwargs):
  File "../es/wrappers.py", line 58, in streaming_bulk_index
    raise e
elasticsearch.exceptions.ConnectionTimeout: ConnectionTimeout caused by - ReadTimeoutError(HTTPConnectionPool(host=u'219.224.135.97', port=9200): Read timed out. (read timeout=10))

I don't understand:

  1. A read timeout seems like a problem related to queries, so why is a read timeout error raised during bulk indexing?
  2. I use es-1.5.2 and only set the following in elasticsearch.yml; everything else uses the defaults. By the way, ES_HEAP_SIZE is set to 5g.
index.number_of_shards: 5
index.number_of_replicas: 0
index.store.type: mmapfs
indices.memory.index_buffer_size: 30%
index.translog.flush_threshold_ops: 50000
refresh_interval: 60s

My Python code is simple:

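from elasticsearch import Elasticsearch

es = Elasticsearch()

def bulk_index():
    # doc_generator() yields the actions to index; bulk here is the
    # wrapper from es/wrappers.py that appears in the traceback above
    actions = doc_generator()
    res = bulk(es, actions, index='test', doc_type='test',
               expand_action_callback=expand_action,
               chunk_size=100000, timeout=30)
    print 'res: ', res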

(Jörg Prante) #2

You get read timeouts from the server because the client is misbehaving. Cluster power, chunk size, timeout length, and API usage are not matched to each other.

  1. You do not let the indexing finish within 30 seconds; one reason is that the chunk is too large.

  2. You do not evaluate the bulk responses before continuing.

Use a smaller chunk_size, like 1000, and, most important for convenient API usage, use https://elasticsearch-py.readthedocs.org/en/master/helpers.html#elasticsearch.helpers.bulk to evaluate the number of successfully indexed documents before you continue.
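
The two points above can be sketched roughly as follows. This is a minimal illustration, not the poster's wrapper: it assumes the streaming_bulk helper from elasticsearch.helpers, which yields (ok, item) pairs, and the function names tally_bulk_results and index_in_small_chunks are hypothetical.

```python
def tally_bulk_results(results):
    """Count successes and collect failures from (ok, item) pairs,
    the shape yielded by elasticsearch.helpers.streaming_bulk."""
    succeeded = 0
    failed = []
    for ok, item in results:
        if ok:
            succeeded += 1
        else:
            failed.append(item)
    return succeeded, failed


def index_in_small_chunks(client, actions, chunk_size=1000):
    """Index with a modest chunk size and inspect every response."""
    # Imported lazily so tally_bulk_results works without the client installed.
    from elasticsearch.helpers import streaming_bulk

    # raise_on_error=False makes failed items come back as (False, item)
    # instead of raising, so they can be counted and reported.
    results = streaming_bulk(client, actions, chunk_size=chunk_size,
                             raise_on_error=False)
    return tally_bulk_results(results)
```

A caller would check the returned failure list before sending more data, for example by stopping or logging when it is not empty.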


(Spuder) #3

Possibly related: https://github.com/logstash-plugins/logstash-output-elasticsearch/issues/141#issuecomment-107113994


(Dwarakgovind Parthiban) #4

I faced the same issue, and it was finally resolved by using the request_timeout parameter instead of timeout.

So the call should look like this: helpers.bulk(es, actions, chunk_size=some_value, request_timeout=some_value)
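
A likely explanation for why this helps: in the Python client, extra keyword arguments given to helpers.bulk are forwarded to the underlying bulk request. There, timeout is sent to Elasticsearch as the operation timeout, while request_timeout controls how long the client itself waits for the HTTP response. That would match the trace in the first post, which still reports "read timeout=10" even though timeout=30 was passed. A minimal sketch of the call, where bulk_with_client_timeout and its bulk_fn hook are hypothetical names used only for illustration:

```python
def bulk_with_client_timeout(es, actions, chunk_size=1000,
                             request_timeout=60, bulk_fn=None):
    """Run a bulk helper with an explicit client-side read timeout (seconds).

    bulk_fn is a hypothetical hook for swapping in a stand-in during tests;
    by default elasticsearch.helpers.bulk is used.
    """
    if bulk_fn is None:
        # Imported lazily so this module loads without the client installed.
        from elasticsearch.helpers import bulk as bulk_fn

    # request_timeout: how long the Python client waits for the response.
    # It is distinct from `timeout`, which is passed on to Elasticsearch.
    return bulk_fn(es, actions, chunk_size=chunk_size,
                   request_timeout=request_timeout)
```

With a real client this would be called as bulk_with_client_timeout(es, actions, request_timeout=60). Newer releases of the client have changed how per-request options are passed, so check the documentation for the installed version.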

