Log full of "encountered retryable error" and logstash stopped listening

Hmmm.

We were running Logstash 5.6.1, and we noticed it had stopped listening on its IP.
So we upgraded it to Logstash 5.6.4.

And here I am, a few days later, and Logstash has stopped listening on its IP again. Clients cannot connect to push log messages.

The logstash log gets this entry every few seconds:

[2017-11-17T13:57:44,949][ERROR][logstash.outputs.elasticsearch] Encountered a retryable error. Will Retry with exponential backoff {:code=>400, :url=>"http://<IP of ES endpoint>:9200/_bulk"}

The Logstash daemon is running, and the Logstash monitoring endpoint is listening (I can run commands against port 9600 fine), but my profile endpoint is not accepting new connections (clients fail to connect to the socket it should be listening on).
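For reference, these are the sort of calls that still work against the monitoring API while Logstash is in this state (assuming the default localhost:9600 binding):

    # Node stats, to see whether events are flowing at all
    curl -s 'http://localhost:9600/_node/stats?pretty'

    # Hot threads, to see what the worker threads are stuck on
    curl -s 'http://localhost:9600/_node/hot_threads?human=true&threads=10'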

I end up having to stop and start the Logstash daemon to get it going again (which is clearly not great!). Note that I only cycle Logstash, so it does not appear the problem is in ES.

What does this mean?

What do I do next?

The output from a number of monitoring API commands is here: https://pastebin.com/Aysn229G. These were run while Logstash was in the problem state.

ES is rejecting documents (that is what the 400 response code means), typically because they're malformed or incompatible with the existing mapping. The Logstash log should contain additional information about this. Perhaps the dead letter queue feature could help you capture the rejected documents?
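If you do try the DLQ, it is switched on in logstash.yml; a minimal sketch (the path below is just an example, adjust for your install):

    # logstash.yml
    dead_letter_queue.enable: true
    # Optional: override where the DLQ segments are written
    # (defaults to a dead_letter_queue directory under path.data)
    path.dead_letter_queue: /var/lib/logstash/dead_letter_queue

In 5.x the DLQ is only fed by the elasticsearch output, for documents rejected with a 400 or 404 response.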

Magnus,

Thank you.

I will look into the DLQ.

The reason I have not done so yet is that the symptom clearly indicates that Logstash itself has died. Consider:

  1. It is not that some messages are being forwarded to ES by Logstash. Rather, when this condition arises, zero messages are forwarded to ES. (Logstash is on the receiving end of 10-1000 messages per second coming in via TCP, from eight or ten app servers)

  2. The entire situation is resolved by cycling the logstash daemon. ES is not cycled. Cycling Logstash instantly changes the system state from Logstash forwarding zero messages to Logstash forwarding all messages. This indicates that the Logstash process is in a damaged state.

I will now set up the DLQ, but from the evidence there is no reason to believe it will give useful info. (Other posters with the same issue also report nothing in their DLQs.)
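If it does capture anything, the plan is to read it back with the dead_letter_queue input in a small side pipeline, roughly like this (the path below is an assumption based on default settings):

    # dlq-reader.conf -- sketch of a pipeline that dumps rejected events
    input {
      dead_letter_queue {
        path => "/var/lib/logstash/dead_letter_queue"   # assumed DLQ location
        commit_offsets => true
      }
    }
    output {
      stdout { codec => rubydebug { metadata => true } }
    }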

Magnus,

We can say authoritatively that the problem is NOT ES related. It is purely a Logstash failure.

This thread reflects where we have taken this so far. (The other poster there is not us; it is another user with the same problem we have.) Only today did we get another failure, with more log data, including:

[io.netty.channel.DefaultChannelPipeline] An exceptionCaught() event was fired, and it reached at the tail of the pipeline. It usually means the last handler in the pipeline did not handle the exception.
java.io.IOException: Too many open files
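Next time it is wedged we intend to confirm the descriptor exhaustion and, if needed, raise the limit, roughly along these lines (the systemd unit name and the limit value are assumptions for our setup):

    # How many file descriptors is the Logstash process actually holding?
    ls /proc/$(pgrep -f logstash | head -1)/fd | wc -l

    # Raise the limit for a systemd-managed Logstash and restart it
    sudo mkdir -p /etc/systemd/system/logstash.service.d
    printf '[Service]\nLimitNOFILE=65536\n' | sudo tee /etc/systemd/system/logstash.service.d/limits.conf
    sudo systemctl daemon-reload && sudo systemctl restart logstash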

The active investigation on this issue is being discussed here:
