Number of active threads for the bulk thread_pool equals the number of shards being written to, not the number of bulk requests


My understanding was that active threads in the thread_pool = number of concurrent bulk requests.
But it seems it is actually equal to the number of shards being written to.
We have a single bulk request that writes to multiple indices, and it appears to exceed active threads + queue size and finally hit rejections, even though we only ever have one bulk request in flight (see the sketch below).
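To make the scenario concrete, here is a minimal sketch using the Python elasticsearch client; the host, index names, and documents are assumptions for illustration only. The point is that one logical bulk call whose items target several indices still fans out into one write task per target shard on the data nodes.

```python
# A minimal sketch of the scenario above: one bulk request whose items
# target several indices, so it fans out to many shards.
# Host, index names, and document contents are hypothetical.
from elasticsearch import Elasticsearch, helpers

client = Elasticsearch("http://localhost:9200")  # assumed local cluster

# One logical bulk call, but the actions span three different indices;
# the coordinating node splits this into one task per target shard.
actions = [
    {"_index": f"logs-app-{i % 3}", "_source": {"message": f"event {i}"}}
    for i in range(1000)
]

# helpers.bulk sends the whole list as a single _bulk request (or a few,
# depending on chunk_size), which is still "one bulk request in parallel"
# from the client's point of view.
helpers.bulk(client, actions, chunk_size=1000)
```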

Is my observation wrong, or does Elasticsearch actually work like this?

It seems that is how Elasticsearch works, and it is well documented: https://www.elastic.co/blog/why-am-i-seeing-bulk-rejections-in-my-elasticsearch-cluster
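One way to confirm this on a live cluster is to watch the per-node active, queue, and rejected counts of the relevant thread pool. A sketch with the Python client, host assumed; depending on the version the pool handling bulk indexing is called "bulk" (older) or "write" (newer):

```python
# Watch bulk/write thread pool activity and rejections per node via the
# _cat API. The host is an assumption; columns are standard _cat columns.
from elasticsearch import Elasticsearch

client = Elasticsearch("http://localhost:9200")  # assumed local cluster

stats = client.cat.thread_pool(
    thread_pool_patterns="write,bulk",           # covers newer and older pool names
    h="node_name,name,active,queue,rejected",
    v=True,
)
print(stats)
```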

But what should be done in a case where a single bulk writes to multiple indices? The number of shards involved can be very large.

What is the use case? How many indices and shards do you have? What is your sharding strategy, given that you end up writing to so many shards in a single bulk request?

If you have a very large number of shards in your cluster, you may also benefit from this blog post.

Well, let's assume I have 50 indices with 5 shards each. Inside my bulk request, each item can go to a different index, which means that the number of active threads = concurrency * (number of shards being written to), which in the worst case could be 1 * 250 if I am writing to all shards.
Some of the index requests are then bound to fail (see the rough numbers below).
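For a rough sense of the numbers, a back-of-envelope sketch; the node count and queue size here are assumptions for illustration, and the real values depend on your version and configuration:

```python
# Back-of-envelope illustration of the worst case described above.
# data_nodes and write_queue_size are assumptions for the example only;
# check your cluster's actual settings before drawing conclusions.
indices = 50
shards_per_index = 5
data_nodes = 3                 # assumed
write_queue_size = 200         # assumed; varies by version and configuration

shard_level_tasks = indices * shards_per_index      # 250 tasks from one bulk
tasks_per_node = shard_level_tasks / data_nodes     # ~83 if evenly spread

# If shards are concentrated on fewer nodes, the per-node count climbs
# toward the full 250 and can overwhelm active threads + queue.
print(f"Shard-level tasks from one bulk request: {shard_level_tasks}")
print(f"Roughly {tasks_per_node:.0f} tasks per data node "
      f"vs. a queue of {write_queue_size} (plus the active threads)")
```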

If all shards are on a single node, that could end up exceeding the queue size. Reducing the number of shards per index would help. Do you really need 5 shards per index?
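If fewer shards per index is an option, new indices can simply be created with a lower shard count; a sketch assuming a recent elasticsearch-py client (older clients pass these settings via `body=`), with a hypothetical index name. Note that existing indices would need a shrink or reindex to change their shard count.

```python
# Create a new index with a single primary shard instead of five.
# Host and index name are hypothetical; adjust replicas to your needs.
from elasticsearch import Elasticsearch

client = Elasticsearch("http://localhost:9200")  # assumed local cluster

client.indices.create(
    index="logs-app-0",
    settings={"index": {"number_of_shards": 1, "number_of_replicas": 1}},
)
```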
