I understand in general why this exception happens, and I catch it on the client so I can resubmit the rejected documents. What I'm unsure about is the following.
In a multi-node cluster, what specifically does this exception imply?
Let's assume a hypothetical cluster with nodes A, B, and C and a single index with 6 shards (2 shards per node).
When a bulk request to node A results in an EsRejectedExecutionException...
- Does this imply node A is simply unable to distribute the documents from the bulk requests fast enough to the appropriate nodes for indexing?
If so, I would think a solution would be smaller batch sizes, so that a batch previously sent entirely to node A would instead be split between node A and another node (B or C).
I suspect this would also imply that the documents rejected in the bulk execution against node A could immediately be reissued in another bulk request to node B or C.
or
- Does this imply node A itself may have an indexing backlog that's impacting node A's ability to service bulk requests?
If so, perhaps the resources on the node are insufficient, or the indices being written to have too few shards (so we're not parallelizing the work sufficiently)?
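To make the first option concrete, here is a minimal sketch of what I mean by smaller batches spread across nodes, with a rejected batch reissued to a different node. Everything here is hypothetical and for illustration only: the helper names (chunk, assign_batches, retry_target), the node labels, and the batch size of 3 are my own assumptions, and no actual Elasticsearch client call is shown.

```python
from itertools import cycle

def chunk(docs, size):
    """Split docs into consecutive batches of at most `size` documents."""
    return [docs[i:i + size] for i in range(0, len(docs), size)]

def assign_batches(docs, nodes, size):
    """Pair each small batch with a node, rotating through the nodes in order."""
    return list(zip(cycle(nodes), chunk(docs, size)))

def retry_target(rejecting_node, nodes):
    """Pick the node that follows the rejecting one, wrapping around at the end."""
    i = nodes.index(rejecting_node)
    return nodes[(i + 1) % len(nodes)]

# Hypothetical cluster and documents, matching the A/B/C example above.
nodes = ["A", "B", "C"]
docs = [{"id": n} for n in range(7)]

# Each entry is (node to send to, batch of documents).
plan = assign_batches(docs, nodes, size=3)
```

Whether reissuing to node B or C actually helps depends on which of the two explanations is correct, which is exactly what I'm trying to find out.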
Based on various monitoring points, I have good reason to believe my cluster has sufficient resources (I'm not pushing any particular memory, CPU, or disk IOPS limits), so I suspect the issue must lie in my index configuration (number of shards) or batch size (perhaps simply too large for a single node to process at once?).
My clusters are currently still on ES 1.7.1, although I suspect that may not be relevant to this topic.
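For reference, this is roughly the kind of client-side handling I mean when I say I resubmit the rejected documents. It is a minimal sketch, not my production code. It assumes the bulk response has already been parsed into a dict, that its items come back in the same order as the submitted actions, and that a rejected item carries status 429, which is what I see on 1.7.1 as far as I can tell. The function name rejected_actions and the sample response (including the abbreviated error string) are made up for illustration.

```python
def rejected_actions(actions, bulk_response):
    """Return the submitted actions whose bulk item came back with status 429."""
    retry = []
    for action, item in zip(actions, bulk_response["items"]):
        # Each item is a single-key dict such as {"index": {...}}.
        result = next(iter(item.values()))
        if result.get("status") == 429:
            retry.append(action)
    return retry

# Hypothetical actions and a hand-written response, for illustration only.
actions = [{"_id": "1"}, {"_id": "2"}, {"_id": "3"}]
response = {
    "took": 5,
    "errors": True,
    "items": [
        {"index": {"_id": "1", "status": 201}},
        {"index": {"_id": "2", "status": 429,
                   "error": "EsRejectedExecutionException[...]"}},
        {"index": {"_id": "3", "status": 201}},
    ],
}

# Only the second action was rejected and needs to be resubmitted.
to_retry = rejected_actions(actions, response)
```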
Any input on the above would be greatly appreciated.