When using the bulk API to index with the Python client, it works fine at the beginning, but soon a read-timeout error is raised, like the following:
bulk_index start processing...
1 chunk bulk index spend: 20.0
2 chunk bulk index spend: 17.0
3 chunk bulk index spend: 17.0
4 chunk bulk index spend: 18.0
5 chunk bulk index spend: 18.0
6 chunk bulk index spend: 21.0
7 chunk bulk index spend: 19.0
8 chunk bulk index spend: 20.0
Traceback (most recent call last):
  File "es_index.py", line 54, in <module>
    bulk_index()
  File "es_index.py", line 19, in _
    rv = func(*args, **kwargs)
  File "es_index.py", line 48, in bulk_index
    chunk_size=100000, timeout=30)
  File "../es/wrappers.py", line 81, in bulk
    for chunk_len, errors in streaming_bulk_index(client, actions, **kwargs):
  File "../es/wrappers.py", line 58, in streaming_bulk_index
    raise e
elasticsearch.exceptions.ConnectionTimeout: ConnectionTimeout caused by - ReadTimeoutError(HTTPConnectionPool(host=u'219.224.135.97', port=9200): Read timed out. (read timeout=10))
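
(For context: the "read timeout=10" in the error matches elasticsearch-py's client-side default, which is set on the connection rather than per bulk call. A minimal sketch, assuming the standard client constructor; the host value is taken from the error above and is only illustrative:)

from elasticsearch import Elasticsearch

# timeout here is the transport-level (urllib3) read timeout applied to every
# request; unless it is raised, requests still fail after the default 10s
es = Elasticsearch(['219.224.135.97:9200'], timeout=10)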
I don't understand:
- A read timeout seems like a query-side problem, so why is a read-timeout error raised during bulk indexing?
- I use es-1.5.2 and only set the options below in elasticsearch.yml, so everything else is left at its defaults (a sketch of how to check the effective settings follows the config). By the way, ES_HEAP_SIZE is set to 5g.
index.number_of_shards: 5
index.number_of_replicas: 0
index.store.type: mmapfs
indices.memory.index_buffer_size: 30%
index.translog.flush_threshold_ops: 50000
refresh_interval: 60s
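
In case it helps, the settings that actually took effect on the index can be checked through the client (a minimal sketch, assuming the standard indices API of elasticsearch-py; the index name 'test' is from my code below):

from elasticsearch import Elasticsearch

es = Elasticsearch()
# returns the per-index settings as stored by the cluster
print es.indices.get_settings(index='test')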
My Python code is as simple as this:
from elasticsearch import Elasticsearch

es = Elasticsearch()

def bulk_index():
    actions = doc_generator()
    res = bulk(es, actions, index='test', doc_type='test',
               expand_action_callback=expand_action,
               chunk_size=100000, timeout=30)
    print 'res: ', res
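
For comparison, here is a sketch of the same call with the client-side timeout raised explicitly, assuming the standard elasticsearch.helpers.bulk rather than my wrapper (the chunk_size value is only illustrative). Unlike timeout, which is forwarded to Elasticsearch as a URL parameter, request_timeout is consumed by the client itself and raises the urllib3 read timeout for each chunked _bulk request:

from elasticsearch import Elasticsearch
from elasticsearch.helpers import bulk

es = Elasticsearch()

def bulk_index():
    actions = doc_generator()  # same generator of action dicts as above
    # request_timeout raises the per-request read timeout on the client side;
    # smaller chunks also help keep each _bulk request under the timeout
    res = bulk(es, actions, index='test', doc_type='test',
               chunk_size=5000, request_timeout=60)
    print 'res: ', res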