Logstash intermittently logs the error below and then stops processing data after a while.
Logstash version : 7.0.1
Elastic version : 7.0.1
Java : JDK8
Could it be due to malformed requests? We do see some malformed data. The DLQ is not much help, as nothing is getting logged there.
Is Logstash going into an infinite retry loop because of the 400 for bad data, and how can this be stopped from happening?
[ERROR][logstash.outputs.elasticsearch] Encountered a retryable error. Will Retry with exponential backoff {:code=>400, :url=>"http://localhost:9200/_bulk", :body=>"{"error":{"root_cause":[{"type":"action_request_validation_exception","reason":"Validation Failed: 1: no requests added;"}],"type":"action_request_validation_exception","reason":"Validation Failed: 1: no requests added;"},"status":400}"}
The DLQ will not help, because it is only used when individual documents within a _bulk request get a 400 or 404 response. You are getting a 400 response to the bulk request itself.
Personally I think it makes no sense to retry the request, since the retry will get the same error. The output will back off until it is repeating the failed request about once a minute; back-pressure will then cause the pipeline to stop processing, and the inputs will stop reading. It will retry forever and you cannot prevent that. That's the deliver-at-least-once contract that Logstash provides.
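To make the "once a minute" figure concrete, here is a rough sketch of a doubling backoff with a cap. It assumes the output's documented defaults of retry_initial_interval = 2 seconds and retry_max_interval = 64 seconds; the function name and the attempts parameter are made up for illustration, and this is not the plugin's actual code.

```python
def backoff_schedule(initial=2, maximum=64, attempts=8):
    """Return the wait (in seconds) before each retry: doubled each time, capped at maximum.

    The defaults are assumptions taken from the elasticsearch output's
    retry_initial_interval and retry_max_interval settings.
    """
    delays = []
    delay = initial
    for _ in range(attempts):
        delays.append(delay)
        delay = min(delay * 2, maximum)
    return delays


schedule = backoff_schedule()
print(schedule)  # [2, 4, 8, 16, 32, 64, 64, 64]
```

After the sixth attempt the wait stays at the cap, which is why the same failing request ends up being resent roughly once a minute.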
The elasticsearch _bulk API expects a series of lines, with a line describing an action (e.g. "index") followed by a line containing a JSON object to be acted on.
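As an illustration of that format, the sketch below builds a _bulk body by hand. The index name "logs-example", the sample documents and the helper name are all hypothetical; it only shows the action-line/source-line pairing, not how the Logstash output builds its requests.

```python
import json


def build_bulk_body(index_name, docs):
    """Build a newline-delimited _bulk body: one action line, then one source line, per doc."""
    lines = []
    for doc in docs:
        lines.append(json.dumps({"index": {"_index": index_name}}))  # action line
        lines.append(json.dumps(doc))                                # document to index
    if not lines:
        return ""  # no events means no actions at all
    # The _bulk API expects the body to end with a newline.
    return "\n".join(lines) + "\n"


body = build_bulk_body("logs-example", [{"message": "hello"}, {"message": "world"}])
print(body, end="")
```

Note that with an empty list of documents the helper returns an empty body, which is the kind of zero-action request discussed below.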
The error message is telling you that it did not find any actions in the request from Logstash. I believe this is the response you would get if you used curl to send a random piece of JSON into elasticsearch, such as:
{ "foo" : { "bar": 1 } }
elasticsearch would parse that, find that "foo" is not a request that the _bulk API supports (index, create, update etc.), and tell you that "no requests added".
Now the elasticsearch output does not send random pieces of JSON to elasticsearch, but there may be a case where it sends a list of zero actions. I checked the current code base and it catches the obvious boundary conditions, such as getting called when there are no events to process, or the event batch exactly filling the maximum request buffer, so that when it goes to process a second part of the batch that part is empty. However, that is the 7.10 codebase, and it is possible that some of those checks were added after 7.0.1.
If I were in your position I would configure the elasticsearch output to point to an HTTP debugging proxy that will show you the requests going through. Seeing the actual request may (or may not) suggest a way forward.
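If no ready-made debugging proxy is to hand, something along these lines might be enough to see what is being sent. It is a minimal sketch using only the Python standard library, with several assumptions: Elasticsearch is at http://localhost:9200 (as in the error message), port 9201 is free for the proxy, requests are not compressed, and request bodies arrive with a Content-Length header. The names count_bulk_actions, LoggingProxy and run_proxy are made up for this example, and the action count is only a heuristic.

```python
import json
import urllib.error
import urllib.request
from http.server import BaseHTTPRequestHandler, HTTPServer

UPSTREAM = "http://localhost:9200"  # the real Elasticsearch (assumed from the error message)
LISTEN_PORT = 9201                  # hypothetical port for the proxy
BULK_ACTIONS = {"index", "create", "update", "delete"}
SKIP_HEADERS = {"host", "content-length", "accept-encoding", "connection"}


def count_bulk_actions(body: bytes) -> int:
    """Heuristic: count lines that are a single-key JSON object naming a bulk action."""
    count = 0
    for line in body.decode("utf-8", errors="replace").splitlines():
        if not line.strip():
            continue
        try:
            obj = json.loads(line)
        except ValueError:
            continue
        if isinstance(obj, dict) and len(obj) == 1 and next(iter(obj)) in BULK_ACTIONS:
            count += 1
    return count


class LoggingProxy(BaseHTTPRequestHandler):
    def _forward(self):
        length = int(self.headers.get("Content-Length", 0))
        body = self.rfile.read(length) if length else b""
        # Log a summary plus the start of the body, so empty bulk requests stand out.
        print(f"{self.command} {self.path} bytes={len(body)} "
              f"actions={count_bulk_actions(body)} preview={body[:200]!r}")
        headers = {k: v for k, v in self.headers.items() if k.lower() not in SKIP_HEADERS}
        req = urllib.request.Request(UPSTREAM + self.path, data=body or None,
                                     headers=headers, method=self.command)
        try:
            with urllib.request.urlopen(req) as resp:
                status, payload = resp.status, resp.read()
                ctype = resp.headers.get("Content-Type", "application/json")
        except urllib.error.HTTPError as err:
            # 4xx/5xx responses still need to be passed back to Logstash unchanged.
            status, payload = err.code, err.read()
            ctype = err.headers.get("Content-Type", "application/json")
        self.send_response(status)
        self.send_header("Content-Type", ctype)
        self.send_header("Content-Length", str(len(payload)))
        self.end_headers()
        self.wfile.write(payload)

    do_GET = do_POST = do_PUT = do_HEAD = do_DELETE = _forward


def run_proxy():
    """Start the proxy; blocks until interrupted."""
    HTTPServer(("localhost", LISTEN_PORT), LoggingProxy).serve_forever()
```

To try it, call run_proxy() and point the hosts option of the elasticsearch output at http://localhost:9201 instead of port 9200. A logged _bulk request with bytes=0 or actions=0 just before the 400 would support the zero-actions theory.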