Is there a way for Elasticsearch to throttle indexing or search requests?
I would take issue with "All we have done is move the backpressure problem."
We should let the client app decide whether or not to employ ES throttling and handle Server Busy errors.
Applying throttling on the client side is not always practical:
(a) Some apps cannot implement throttling on their side but do know how to retry on errors.
(b) Some apps are distributed and cannot calculate the total number of requests they send to the cluster, so they cannot maintain a proper rate limit.
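As an illustration of case (a), here is a minimal sketch of a client that does no throttling of its own but retries when the server reports it is busy. `ServerBusy`, `call_with_retry` and `send` are hypothetical names chosen for this example; a rejection is assumed to reach the client as an exception (for instance, one raised on an HTTP 429 response).

```python
import random
import time


class ServerBusy(Exception):
    """Hypothetical error standing in for a 'server busy' rejection."""


def call_with_retry(send, max_attempts=5, base_delay=0.1, max_delay=5.0,
                    sleep=time.sleep):
    """Call send() and retry with jittered exponential backoff on ServerBusy.

    `send` is any zero-argument callable that performs one request.
    """
    for attempt in range(max_attempts):
        try:
            return send()
        except ServerBusy:
            if attempt == max_attempts - 1:
                raise  # out of attempts: surface the rejection to the caller
            # Double the upper bound on each attempt, capped at max_delay.
            delay = min(max_delay, base_delay * 2 ** attempt)
            # Random jitter keeps many clients from retrying in lockstep.
            sleep(random.uniform(0, delay))
```

The jitter matters for case (b) as well: distributed clients that cannot coordinate a shared rate can at least avoid retrying at the same instant.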
Making the cluster bigger won't prevent occasional request peaks.
It looks like it would be useful to have that feature on the ECE side, and move the backpressure problem...to the entity that knows how to handle it.
So, what would you recommend to keep a cluster safe against peaks when multiple apps are indexing into and searching it at random times?
Yes, Elasticsearch will start to reject indexing and search requests if overwhelmed. But this is what the linked discussion says, so perhaps we are not speaking the same language. Can you clarify what you mean by "throttling" if not "applying backpressure by rejecting requests when over capacity"?
In my experience with Elasticsearch 6.0, once it got overwhelmed,
(1) GC blew up
(2) it started to report "failed shard"
By throttling I meant explicit settings for the maximum number of searches and indexing requests per second, rather than some internal calculation of "over capacity" and "backpressure". It looks like the cluster has a tendency to overestimate its capabilities and accept requests which it cannot handle. What I want is to set a limit of 10 searches/s and never come to backpressure.
Estimating capacity accurately is tricky, and something that the team is actively working on improving. One big improvement coming in 7.0 is the real-memory-usage circuit breaker, and this kind of feature is our preferred direction for further improvements in this area.
Elasticsearch is set up to avoid coordination as much as possible, and this means it's not really feasible to apply a system-wide rate limiter to searches as you describe. Moreover, there's no way to know whether any given rate is the right level at any given time. A rate of 10 searches per second could already be enough to take the cluster over capacity if the searches are slow enough, or it could be woefully underutilising your resources, and it could even be both at different times.
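The point that a fixed rate can be too high when searches are slow can be made concrete with Little's law: the average number of requests in flight equals the arrival rate multiplied by the average time each request takes. The sketch below uses made-up latencies purely for illustration; `concurrent_searches` is a hypothetical helper, not an Elasticsearch API.

```python
def concurrent_searches(rate_per_second, mean_latency_seconds):
    """Little's law: average in-flight requests = arrival rate * mean latency."""
    return rate_per_second * mean_latency_seconds


# The same 10 searches per second, with two assumed mean latencies.
for latency in (0.05, 5.0):
    in_flight = concurrent_searches(10, latency)
    print(f"latency {latency}s -> about {in_flight:g} searches in flight")
```

Under these assumed numbers the same configured rate leaves the cluster nearly idle in one case and holds dozens of searches open at once in the other, which is why the rate alone says little about load.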
I still don't understand what "never come to backpressure" means in this context. If Elasticsearch were to implement a rate limiter as you describe then backpressure is exactly what would happen should the client exceed the configured rate.
If your target rate is as low as 10 searches per second then perhaps you could simply use a standalone rate limiter (e.g. Nginx)?
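To show what a standalone limiter in front of the cluster does in principle, here is a minimal token-bucket sketch. It is one common rate-limiting algorithm and is not claimed to be how Nginx or Elasticsearch implements anything; `TokenBucket` and its parameters are hypothetical names for this example.

```python
import time


class TokenBucket:
    """Allow up to `rate` requests per second, with bursts up to `capacity`."""

    def __init__(self, rate, capacity, clock=time.monotonic):
        self.rate = rate          # tokens added per second
        self.capacity = capacity  # maximum tokens the bucket can hold
        self.clock = clock        # injectable clock, handy for testing
        self.tokens = capacity    # start with a full bucket
        self.last = clock()

    def allow(self):
        """Return True if a request may pass, False if it should be rejected."""
        now = self.clock()
        # Refill in proportion to elapsed time, never beyond capacity.
        self.tokens = min(self.capacity,
                          self.tokens + (now - self.last) * self.rate)
        self.last = now
        if self.tokens >= 1:
            self.tokens -= 1
            return True
        return False
```

A proxy would call `allow()` for each incoming search and answer rejected ones itself, for example with a "too many requests" response, so they never reach the cluster. Note that this is still backpressure, only applied one hop earlier.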
Estimating capacity accurately is tricky and the intention is to avoid coordination as much as possible, and this is exactly the point I'm trying to make. It's hard to solve all the problems, so we may want to give the client app the choice of what the right rate is for its needs.
Basically, by doing all the internal capacity calculation you exclude me (the client) from the equation. I may want to have a very small cluster that supports 10/s and does not crash under 1000/s.
By "never come to backpressure" I meant that the incoming requests will be rejected before they even reach the capacity-calculation engine.
It is very good to see the new circuit breaker. However, what I mean is to keep all the internal capacity estimation but, in front of it, allow the client app to set a lower limit for incoming requests. That gives 2 levels of control (configuration and real-time) over the incoming rate.
A rate limiter is the wrong tool for this job in most situations. One reason for this is that it pays no attention to how expensive each query is - your cluster may be able to support 1000 "light" queries per second, but fall over trying to perform just 5 "heavy" ones.
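A small worked example of the cost argument, using the figures quoted above. The cost units are an assumption made only for illustration: if the cluster can do 1000 "light" queries per second and about 5 "heavy" ones saturate it, a heavy query costs roughly 200 light ones.

```python
CAPACITY = 1000            # assumed capacity, in "light query" units per second
LIGHT_COST = 1             # cost of one light query
HEAVY_COST = CAPACITY // 5 # 200: assumes 5 heavy queries use the whole capacity


def load(n_light, n_heavy):
    """Total work per second, in light-query units, for a given query mix."""
    return n_light * LIGHT_COST + n_heavy * HEAVY_COST


RATE_LIMIT = 10  # a count-based limit of 10 requests per second

# The limiter admits 10 heavy queries: twice the assumed capacity.
print(load(0, RATE_LIMIT), "units for 10 heavy queries")
# It also caps light traffic at 10 queries: 1% of the assumed capacity.
print(load(RATE_LIMIT, 0), "units for 10 light queries")
```

With these assumed costs the same limit of 10 requests per second both overloads the cluster (2000 units) and starves it (10 units), depending only on which queries arrive.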
I do not understand this goal. Rejecting requests due to overcapacity (e.g. with a circuit breaker) is cheap and happens very early in the process, by design.