We have a number of periodic indexing tasks that bulk-index a few (let's say 10) million documents daily, partitioned into pages of 5K documents (we have 200 tasks, each handling one 5K-document page).
When many pages (more than 80) are pushed to Elasticsearch for indexing simultaneously, Elasticsearch's index queue gets flooded and both search and indexing performance degrade cluster-wide. Increasing the index queue_size gives some improvement, but it seems like a band-aid.
Unfortunately our task queue does not let us throttle the indexing tasks on the application side.
So, is there a way within Elasticsearch to throttle indexing tasks (without dropping the index requests, of course)?
Latency is not a big problem in our case; indexing taking longer is acceptable.
We have around 10 nodes in the cluster (Elasticsearch v2), each acting as a master-eligible data node.
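For illustration, each task does something along these lines (a minimal sketch with the Python elasticsearch client; the endpoint, index name, doc type, and document shape are all assumptions):

```python
from elasticsearch import Elasticsearch, helpers

es = Elasticsearch(["http://es-node-1:9200"])  # assumed endpoint

def index_page(page_of_docs, index_name="daily_docs", doc_type="doc"):
    """Send one 5K-document page to Elasticsearch as a bulk request."""
    actions = (
        {
            "_op_type": "index",
            "_index": index_name,
            "_type": doc_type,   # mapping types still exist in Elasticsearch 2.x
            "_id": doc["id"],    # assumes each document carries its own id
            "_source": doc,
        }
        for doc in page_of_docs
    )
    # helpers.bulk splits the actions into chunks (500 per request by default)
    # and returns (number indexed, list of per-item errors).
    return helpers.bulk(es, actions, raise_on_error=False)
```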
Queues are useful for handling variable load. When they fill up, the subsequent rejections are a form of backpressure that clients should use to throttle themselves. If for some reason throttling is not acceptable, it means the cluster is underprovisioned for the load and needs additional capacity.
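For example, a client can treat those 429 rejections as its signal to slow down: retry the rejected items with exponential backoff instead of piling on more requests. Here is a minimal sketch along those lines with the Python elasticsearch client; the endpoint, index name, retry count, and delays are assumptions to be tuned:

```python
import time

from elasticsearch import Elasticsearch

es = Elasticsearch(["http://es-node-1:9200"])  # assumed endpoint

def index_page_with_backoff(docs, index_name="daily_docs", doc_type="doc",
                            max_retries=5, initial_delay=2.0):
    """Index one page, re-sending only rejected items, with exponential backoff."""
    pending = list(docs)
    delay = initial_delay
    for _ in range(max_retries + 1):
        # Build the bulk body for whatever is still pending.
        body = []
        for doc in pending:
            body.append({"index": {"_index": index_name, "_type": doc_type,
                                   "_id": doc["id"]}})
            body.append(doc)
        resp = es.bulk(body=body)

        # Status 429 on an item means the thread-pool queue was full; that is
        # the cluster's backpressure signal, not a fatal error. (A real client
        # would also inspect non-429 item failures.)
        rejected = [doc for doc, item in zip(pending, resp["items"])
                    if item["index"].get("status") == 429]
        if not rejected:
            return
        pending = rejected
        time.sleep(delay)   # back off before retrying the rejected items
        delay *= 2
    raise RuntimeError("%d documents still rejected after %d retries"
                       % (len(pending), max_retries))
```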
Let’s think through, though, what it would mean for Elasticsearch to throttle itself without the clients backing off. It means Elasticsearch has to buffer all of these requests. Eventually it will run out of capacity to do that and will have to start rejecting requests, so all we have done is move the backpressure problem. You might say: hold on, it won’t run out of capacity if I put a giant disk behind it and let Elasticsearch spill the requests to disk. But then we have just reinvented a persistent queue, and we already have a solution for that: Logstash.
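To make that concrete, here is a toy sketch in Python (nothing Elasticsearch-specific, just an arbitrary bounded buffer) showing that any finite buffer only relocates the rejection point:

```python
import queue

# An arbitrary finite capacity; however large we make it, it is still finite.
pending_requests = queue.Queue(maxsize=100)

def accept_request(request):
    """Buffer a request instead of indexing it immediately."""
    try:
        pending_requests.put_nowait(request)
    except queue.Full:
        # The buffer is exhausted, so we are back to rejecting requests:
        # the backpressure has moved, not disappeared.
        raise RuntimeError("buffer full, request rejected")
```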
So here’s where I am on this: if you’re overwhelming Elasticsearch, you either need to make your cluster bigger or apply throttling on the client side.
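If your tasks share a worker process, one way to apply that throttling client side is to cap the number of pages in flight with a semaphore (a sketch; the limit of 10 concurrent bulk requests is an assumption to tune):

```python
import threading

from elasticsearch import Elasticsearch, helpers

es = Elasticsearch(["http://es-node-1:9200"])   # assumed endpoint
in_flight = threading.BoundedSemaphore(10)      # at most 10 concurrent bulk pages

def index_page_throttled(actions):
    """Block until a slot frees up, then send the page as one bulk request."""
    with in_flight:  # tasks wait here instead of flooding the cluster's queue
        helpers.bulk(es, actions)
```

Since you said longer indexing times are acceptable, having tasks block like this simply trades latency for cluster stability.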