Elasticsearch Output retry strategy

talvey · December 16, 2015, 8:02pm

I'm looking for suggestions on how best to prevent dropping Logstash messages in the Elasticsearch output plugin.

We run a fairly large ES (1.7) cluster fed exclusively by LS (1.5.4); on average we process 30k messages/s. When the ES cluster is under stress (assigning shards, relocating shards, or just a large burst of incoming data), I'll regularly see 429 (server busy) responses. In most cases the default LS retry strategy and settings function well and all messages are eventually processed. However, in more extreme cases, the LS logs indicate that too many attempts have been made to send an event (max_retries) and it is dropped.

In this "dropping" scenario, I'd rather the internal LS pipeline become saturated, so that processing of new messages effectively halts instead of events being dropped. So, my questions are:

When the retry queue reaches capacity (retry_max_items), does this in turn block the ES output plugin from receiving new messages? If so, would settings like the following be a good approach to ensure (as much as possible) that all messages are processed?

retry_max_interval -> high interval (say 60 seconds). I don't want to stress the server with repeated retries.
retry_max_items -> low max items (say 100). I want to throttle ASAP.
max_retries -> high max retries (say 100). I really want the data to go through.
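
To make that concrete, here is roughly what I have in mind for the output block. This is only a sketch: the values are the illustrative ones from the list above, the connection settings are left out, and I haven't verified that this combination actually produces the back-pressure I'm after.

```
output {
  elasticsearch {
    # Connection settings (host, protocol, index, ...) omitted; keep the existing ones.
    retry_max_interval => 60   # seconds; upper bound on the wait between retry attempts
    retry_max_items    => 100  # keep the retry queue small so it fills up quickly
    max_retries        => 100  # attempts per event before it is dropped
  }
}
```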

Is there a better approach? My intent is to effectively "back-off" LS message processing to give ES a break while it recovers.

I know there are a lot of factors involved here, so I'll happily elaborate.
