Elasticsearch cluster overloaded

Hi,

Our Elasticsearch cluster gets overloaded from time to time.
We see thread_pool rejections on Elasticsearch, and Logstash logs error messages like the one below:

[logstash.outputs.elasticsearch] retrying failed action with response code: 429 ({"type"=>"es_rejected_execution_exception", "reason"=>"rejected execution of org.elasticsearch.transport.TransportService$7@57757508 on EsThreadPoolExecutor[bulk, queue capacity = 500, org.elasticsearch.common.util.concurrent.EsThreadPoolExecutor@127e2b61[Running, pool size = 32, active threads = 32, queued tasks = 507, completed tasks = 575009786]]"}) 
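
In case it helps anyone looking at the same problem, we keep an eye on the rejections with the cat thread pool API, roughly like this (localhost:9200 stands in for one of our nodes):

    # bulk thread pool usage and rejections per node
    curl -s 'localhost:9200/_cat/thread_pool/bulk?v&h=node_name,active,queue,rejected,completed'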

(Increasing thread_pool.bulk.queue_size to 500 helped a bit.)
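
For reference, that is a static node setting, so it has to go into elasticsearch.yml on every data node and needs a restart to take effect; roughly:

    # elasticsearch.yml (per data node, restart required)
    thread_pool.bulk.queue_size: 500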

Our cluster holds 26 TB of data across 8 hot, 5 warm and 10 cold nodes. We have ~2000 indices with 6800 primary shards, each replicated once. We are running Elasticsearch 5.5, and each Elasticsearch instance has 30 GB of memory.
The hot nodes hold at most 80 shards each.
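
To see how shards and disk usage are spread over the nodes we use the cat APIs, something like:

    # shards and disk used per node
    curl -s 'localhost:9200/_cat/allocation?v'
    # per-index shard counts and sizes
    curl -s 'localhost:9200/_cat/indices?v&h=index,pri,rep,docs.count,store.size'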

Looking at our servers, CPU load and I/O are quite low: CPU is around 15% and I/O around 30%, with peaks of 50%. File descriptor limits (ulimits) are also fine.

We are wondering why Elasticsearch isn't using more of these resources if it is overloaded, and whether there are Elasticsearch settings that would improve performance and get rid of those errors.
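
So far we have only looked at OS-level metrics; to see what the Elasticsearch threads themselves are spending their time on, we could also sample the hot threads API:

    # sample the busiest threads on each node
    curl -s 'localhost:9200/_nodes/hot_threads?threads=5'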

Most of the load seems to come from indexing, so we increased indices.memory.index_buffer_size to 30%.
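
For completeness, that is also a static node setting, so it lives in elasticsearch.yml and needs a node restart:

    # elasticsearch.yml (per node, restart required)
    indices.memory.index_buffer_size: 30%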

Any tips would be appreciated.
Cheers,
Felix

Lots of indices and shards, indeed... Each bulk request is split into one queued task per shard it touches, so with thousands of shards the bulk queue can overflow even while CPU and disk look almost idle. This might be worth reading: https://www.elastic.co/blog/why-am-i-seeing-bulk-rejections-in-my-elasticsearch-cluster
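
If the shard count turns out to be the main driver, one option on 5.x is the shrink API, which reduces the number of primaries of an index that is no longer being written to. A rough sketch (the index and node names are made up):

    # 1) move a copy of every shard of the source index onto one node and block writes
    curl -s -XPUT 'localhost:9200/logs-2017.08.01/_settings' -H 'Content-Type: application/json' -d '
    {
      "index.routing.allocation.require._name": "warm-node-1",
      "index.blocks.write": true
    }'
    # 2) shrink into a new index with a single primary shard
    curl -s -XPOST 'localhost:9200/logs-2017.08.01/_shrink/logs-2017.08.01-shrunk' -H 'Content-Type: application/json' -d '
    {
      "settings": {
        "index.number_of_shards": 1,
        "index.number_of_replicas": 1
      }
    }'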

This topic was automatically closed 28 days after the last reply. New replies are no longer allowed.