Logstash retries 404 instead of dropping - output stuck

Hi!

Today I had a situation where one of our Elasticsearch outputs in Logstash became unavailable. It replied with a 404. Based on the documentation, Logstash should log a warning and drop the event. We have no DLQ configured.

> 400 and 404 errors are sent to the dead letter queue (DLQ), if enabled. If a DLQ is not enabled, a log message will be emitted, and the event will be dropped. See DLQ Policy for more info.

However, since it got stuck retrying, it seemed like all the other outputs also stopped forwarding data.

The error output:

    [ERROR][logstash.outputs.elasticsearch][main] Encountered a retryable error (will retry with exponential backoff) {:code=>404, :url=>"https://our-url/_bulk", :content_length=>22, :body=>"{\"ok\":false,\"message\":\"Unknown resource.\"}\n"}

Logstash version: 7.17.6.1

Configuration:

    elasticsearch {
      hosts => ["https://our-url:443"]
      index => "%{[index][name]}"
      ilm_enabled => false
      manage_template => false
      action => "create"
      ecs_compatibility => "v8"
      user => "logstash_writer"
      password => "**********"
      failure_type_logging_whitelist => ["version_conflict_engine_exception"]
    }

Is this a bug? Or have I missed something?

You missed something :slight_smile: Before reading the documentation for the Retry Policy, you need to understand what a _bulk request to Elasticsearch looks like. It is a single HTTP request that contains indexing requests for multiple documents. The overall request gets an HTTP status back; if that is not 200, the documentation says it is retried indefinitely. The individual documents in the _bulk request also get their own statuses back. If the status for a document is 400 or 404, then it is routed to the DLQ (or dropped, if no DLQ is enabled).
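For illustration, this is roughly what a bulk response looks like when the request itself succeeds (HTTP 200 overall) but one of the documents inside it fails; the index names, IDs, and error details below are made up:

    {
      "took": 12,
      "errors": true,
      "items": [
        { "create": { "_index": "app-logs-2022.10.01", "_id": "doc-1", "status": 201 } },
        { "create": { "_index": "app-logs-2022.10.02", "_id": "doc-2", "status": 404,
            "error": { "type": "index_not_found_exception",
                       "reason": "no such index [app-logs-2022.10.02]" } } }
      ]
    }

Only a per-document 404 like the one in the items array goes through the DLQ/drop logic; in your case the /_bulk request itself got the 404 back, which is treated as retryable.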

The bulk request and DLQ logic is here.


Hehe, thanks. That makes a lot of sense. Do you have any suggestions for not letting such issues bring down a Logstash instance completely? I guess my main issue is that there are different outputs in the same pipeline?

You can cope with a limited outage of one output by using pipeline-to-pipeline communication with an output isolator pattern.
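A minimal sketch of that pattern in pipelines.yml, assuming a beats input and two hypothetical downstream outputs; the pipeline IDs, addresses, and port are made up:

    - pipeline.id: intake
      queue.type: persisted
      config.string: |
        # read events once and fan them out to the downstream pipelines
        input  { beats { port => 5044 } }
        output { pipeline { send_to => ["es", "other"] } }

    - pipeline.id: out-es
      queue.type: persisted
      config.string: |
        # if Elasticsearch gets stuck, only this pipeline's queue fills up
        input  { pipeline { address => "es" } }
        output { elasticsearch { hosts => ["https://our-url:443"] } }

    - pipeline.id: out-other
      queue.type: persisted
      config.string: |
        input  { pipeline { address => "other" } }
        output { stdout { codec => dots } }

Each downstream pipeline has its own (here persisted) queue, so a blocked Elasticsearch output only stalls its own pipeline once that queue fills up, instead of stopping the other outputs.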

If you want to drop data instead of using Logstash's at-least-once delivery model, then see this post.

