When using the bulk API to index with the Python client, everything is fine at first, but soon a read timeout error is raised, like the following:
```
bulk_index start processing...
1 chunk bulk index spend: 20.0
2 chunk bulk index spend: 17.0
3 chunk bulk index spend: 17.0
4 chunk bulk index spend: 18.0
5 chunk bulk index spend: 18.0
6 chunk bulk index spend: 21.0
7 chunk bulk index spend: 19.0
8 chunk bulk index spend: 20.0
Traceback (most recent call last):
  File "es_index.py", line 54, in <module>
    bulk_index()
  File "es_index.py", line 19, in _
    rv = func(*args, **kwargs)
  File "es_index.py", line 48, in bulk_index
    chunk_size=100000, timeout=30)
  File "../es/wrappers.py", line 81, in bulk
    for chunk_len, errors in streaming_bulk_index(client, actions, **kwargs):
  File "../es/wrappers.py", line 58, in streaming_bulk_index
    raise e
elasticsearch.exceptions.ConnectionTimeout: ConnectionTimeout caused by - ReadTimeoutError(HTTPConnectionPool(host=u'126.96.36.199', port=9200): Read timed out. (read timeout=10))
```
I don't understand:

- A read timeout sounds like a problem with queries, so why is a read timeout error raised while bulk indexing?
- I use es-1.5.2 and set only the following in elasticsearch.yml, leaving everything else at its defaults. By the way, `ES_HEAP_SIZE` is set to 5g.
```yaml
index.number_of_shards: 5
index.number_of_replicas: 0
index.store.type: mmapfs
indices.memory.index_buffer_size: 30%
index.translog.flush_threshold_ops: 50000
refresh_interval: 60s
```
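For reference, the heap size is not part of elasticsearch.yml; in the 1.x releases the startup script reads it from the environment and uses it for both `-Xms` and `-Xmx`. A minimal sketch, assuming the stock `bin/elasticsearch` launcher:

```shell
# ES_HEAP_SIZE is picked up by bin/elasticsearch.in.sh in 1.x;
# it sets the JVM min and max heap to the same value.
export ES_HEAP_SIZE=5g
./bin/elasticsearch -d   # start the node as a daemon
```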
My Python code is as simple as this:
```python
es = Elasticsearch()

def bulk_index():
    actions = doc_generator()
    res = bulk(es, actions, index='test', doc_type='test',
               expand_action_callback=expand_action,
               chunk_size=100000, timeout=30)
    print 'res: ', res
```