Number of active threads in the bulk thread_pool equals the number of shards being written to, not the number of bulk requests
My understanding was that the number of active threads in the bulk thread_pool equals the number of concurrent bulk requests.
But it seems to equal the number of shards being written to.
We write to multiple indices in a single bulk request, and it seems to exceed active threads + queue size and eventually hits rejections, even though we only ever have 1 bulk request in flight.
Is my observation wrong, or is this how Elasticsearch works?
What is the use case? How many indices and shards do you have? What is your sharding strategy, given that you end up writing to so many shards in a single bulk request?
If you have a very large number of shards in your cluster, you may also benefit from this blog post.
Well, let's assume I have 50 indices with 5 shards each. Each action inside my bulk request can go to a different index, which means the number of active threads = concurrency multiplied by (number of shards written to), which could be 1 * 250 in the worst case, where I am writing to all shards.
Some of the index requests will fail for sure.
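To make the arithmetic above concrete, here is a minimal sketch in Python. The function name is made up for illustration, and it assumes the worst case described above: every bulk request touches every primary shard, and each targeted shard becomes one shard-level task.

```python
def worst_case_shard_tasks(concurrent_bulks: int, indices: int, shards_per_index: int) -> int:
    """Upper bound on shard-level bulk tasks (hypothetical helper).

    Assumes every concurrent bulk request contains at least one action
    for every primary shard of every index.
    """
    return concurrent_bulks * indices * shards_per_index


# 1 bulk request in flight, 50 indices, 5 shards each
print(worst_case_shard_tasks(1, 50, 5))  # 250
```

So a single bulk request can produce up to 250 shard-level tasks under these assumptions, even with a concurrency of 1.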
If all shards are on a single node, that could end up exceeding the queue size. Reducing the number of shards per index would improve this. Do you need 5 shards per index?
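A rough way to see why fewer shards per index helps is sketched below in Python. This is a simplified model, not how Elasticsearch schedules work internally: it assumes all shard-level tasks land on one node at the same instant and none finishes before the queue fills. The thread count (8) and queue size (50) are placeholder values, so check your own node's bulk thread pool settings.

```python
def rejected_tasks(shard_tasks_on_node: int, threads: int, queue_size: int) -> int:
    """Tasks that fit neither on a thread nor in the queue (hypothetical helper).

    Simplified model: all tasks arrive at once and none completes
    before the queue is full.
    """
    capacity = threads + queue_size
    return max(0, shard_tasks_on_node - capacity)


THREADS = 8      # assumed bulk pool size, not taken from the thread
QUEUE_SIZE = 50  # assumed queue size, not taken from the thread

# 50 indices x 5 shards, all on one node
print(rejected_tasks(50 * 5, THREADS, QUEUE_SIZE))  # 192
# 50 indices x 1 shard, all on one node
print(rejected_tasks(50 * 1, THREADS, QUEUE_SIZE))  # 0
```

With these assumed numbers, dropping from 5 shards to 1 shard per index brings the worst case under the node's capacity.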