We have a number of periodic indexing tasks that bulk-index a few million (say 10 million) documents daily, partitioned into pages of 5K documents (we have 200 tasks, each handling a page of 5K documents).
When many pages (more than 80) are pushed to Elasticsearch for indexing simultaneously, the Elasticsearch index queue gets flooded, and search and indexing performance degrades cluster-wide. Increasing the index queue_size provides some improvement, but it seems like a band-aid.
Unfortunately, our task queue does not allow us to throttle indexing tasks on the application side.
So is there a way to throttle indexing tasks in Elasticsearch itself (without dropping the index requests, of course)?
Latency is not a big problem in our case; it is acceptable for indexing to take longer.
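
To make the "slow down, but do not drop" requirement concrete, here is a rough sketch of the behaviour we are after, written as if a single task could do it on its own. It is only an illustration: `send_bulk` is a hypothetical helper (not an Elasticsearch client API) that is assumed to send the documents in one bulk request and return the ones that were rejected because the queue was full, and the retry count and delays are made-up values.

```python
import time
from typing import Callable, List


def index_page_with_backoff(
    send_bulk: Callable[[List[dict]], List[dict]],  # hypothetical: returns the rejected docs
    page: List[dict],
    max_retries: int = 8,        # assumed value, not a recommendation
    base_delay: float = 1.0,     # seconds, assumed
    max_delay: float = 60.0,     # seconds, assumed
    sleep: Callable[[float], None] = time.sleep,
) -> int:
    """Index one page, retrying rejected documents instead of dropping them.

    Returns the number of bulk requests that were sent.
    """
    pending = list(page)
    delay = base_delay
    attempts = 0
    while pending:
        attempts += 1
        # Only the documents that were rejected last time are sent again.
        pending = send_bulk(pending)
        if not pending:
            break
        if attempts > max_retries:
            raise RuntimeError(
                "%d documents still rejected after %d attempts" % (len(pending), attempts)
            )
        # Wait before retrying; double the wait each time, up to max_delay.
        sleep(delay)
        delay = min(delay * 2, max_delay)
    return attempts
```

With this, a page simply takes longer when the cluster is busy, which is acceptable for us. What we are looking for is a way to get the same effect from Elasticsearch itself.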
We have around 10 nodes in the cluster (Elasticsearch v2), each acting as a master-eligible data node.
Any ideas or suggestions?