My team is seeing a strange problem: the bulk thread pool queue grows significantly above the configured bulk queue size. Everything I can find indicates that when a thread pool queue is full, it rejects requests, so I would expect the queue never to exceed the defined queue size.
Even with default queue sizes, we have seen the bulk queue grow to 2,000, 3,000, and even 4,000 during heavy indexing. There are significant numbers of bulk rejections when this happens, but I would not expect the bulk queue to ever rise above the defined queue size. This only happens with the bulk queue; the search queue always respects the search queue size. Has anybody seen this before, or does anyone know what is causing it?
We are using the node stats API to view the bulk queue. Our cluster is running version 1.7.3 of Elasticsearch.
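For reference, here is a rough sketch of how the bulk queue depth and rejection count can be read from the node stats API. The host and port (`http://localhost:9200`) and the helper names `fetch_node_stats` and `bulk_pool_summary` are assumptions for illustration, not something from our actual tooling. It relies on the `queue` and `rejected` fields reported under `thread_pool.bulk` for each node.

```python
import json
from urllib.request import urlopen


def bulk_pool_summary(node_stats):
    """Return {node_name: (queue, rejected)} for the bulk thread pool."""
    summary = {}
    for node_id, node in node_stats.get("nodes", {}).items():
        bulk = node.get("thread_pool", {}).get("bulk", {})
        # Fall back to the node id if the response carries no node name.
        name = node.get("name", node_id)
        summary[name] = (bulk.get("queue", 0), bulk.get("rejected", 0))
    return summary


def fetch_node_stats(base_url="http://localhost:9200"):
    # Assumed address; point this at any node in your cluster.
    with urlopen(base_url + "/_nodes/stats/thread_pool") as resp:
        return json.loads(resp.read().decode("utf-8"))
```

Calling `bulk_pool_summary(fetch_node_stats())` against a live cluster gives one `(queue, rejected)` pair per node, which is how a queue value above the configured size would show up.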
This is by design. Once a bulk operation has been accepted on a primary shard, Elasticsearch must also apply that operation on the replica shards. Therefore, a bulk operation that succeeded on a primary shard is inserted into the bulk queue for the replica shards regardless of whether that queue is full. The only time a bulk request is rejected because the bulk queue is full is when the operation is received on the primary shard.
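To make the behavior described above concrete, here is a toy model in Python. It is only a sketch of the idea, not Elasticsearch's actual implementation: the class `BulkQueueModel`, its method names, and the capacity of 50 are all made up for illustration. Primary-side operations are rejected when the queue is full, while replica-side operations are always enqueued.

```python
from collections import deque


class BulkQueueModel:
    """Toy model of a bounded queue with a forced-insert path for replicas."""

    def __init__(self, capacity):
        self.capacity = capacity
        self.items = deque()
        self.rejected = 0

    def offer_primary(self, op):
        # Primary-side operations respect the configured capacity.
        if len(self.items) >= self.capacity:
            self.rejected += 1
            return False
        self.items.append(op)
        return True

    def force_replica(self, op):
        # Replica-side operations are enqueued even when the queue is full.
        self.items.append(op)


queue = BulkQueueModel(capacity=50)  # illustrative capacity
for i in range(60):
    queue.offer_primary(("primary", i))   # 50 accepted, 10 rejected
for i in range(200):
    queue.force_replica(("replica", i))   # all 200 enqueued
print(len(queue.items), queue.rejected)   # prints "250 10"
```

In this model the queue ends up at five times its nominal capacity while rejections are still being counted, which matches the pattern in the question: a bulk queue well above the configured size alongside bulk rejections.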