Connection drops for certain log inputs on secure beats to logstash beats input

Hi there,

Currently we have an issue that occurs about once a week: connections seem to drop and certain log inputs are no longer received. The first log input (/var/log/messages) is still received. The logs are still available, and after a daemon restart the processing continues as expected.

When analysing the logs we can see that the problem seems to start after a restart, and ERROR messages flood into the Beats logfile related to

pipeline/output.go failed to publish events: write tcp ip:port->ip:port: write: connection reset by peer

We also looked into the Logstash logs and can see no directly related cause, but around the time window of the issue we see some "connection reset by peer" errors related to the beats input plugin.

We suspect a delay or even a disconnect in the connection due to latency or busy resources at that particular time, but again this is strange since /var/log/messages is still gathered. We could try increasing the Logstash beats input client_inactivity_timeout, but currently see no directly related need here. Timeout tuning on the Beats Logstash output is also a thought.
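For reference, the two timeouts mentioned above live on opposite sides of the connection. A minimal sketch of where they would be set, assuming a standard beats input listening on port 5044 with SSL (the port and certificate paths are placeholders, not taken from the actual setup):

```
# Logstash pipeline config (beats input side)
input {
  beats {
    port => 5044
    ssl => true
    ssl_certificate => "/etc/logstash/certs/logstash.crt"
    ssl_key => "/etc/logstash/certs/logstash.key"
    # Default is 60 (seconds); raising it keeps idle Beats
    # connections open longer before Logstash closes them
    client_inactivity_timeout => 300
  }
}
```

On the Beats side, the corresponding knobs on the Logstash output are `timeout` (how long Filebeat waits for a response before giving up on a connection) and `ttl` (how long a connection lives before being re-established, useful behind load balancers):

```yaml
# filebeat.yml (Logstash output side)
output.logstash:
  hosts: ["logstash-1:5044", "logstash-2:5044"]
  loadbalance: true
  ssl.enabled: true
  timeout: 60   # default is 30s
  ttl: 60s      # periodically re-establish connections
```

Whether these values actually help depends on where the resets originate; they mainly rule out (or confirm) idle-timeout behaviour as the cause.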

The customer is using Beats 6.4.2 (yes, 7 is on the way) with a rather basic config (two log inputs, the last one with multiline), some tags and fields, and a loadbalanced Logstash output using SSL.
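To make the described setup concrete, a minimal sketch of what such a 6.x config could look like (all paths, hostnames, and the multiline pattern are illustrative placeholders, not the customer's actual values):

```yaml
# filebeat.yml sketch for Beats 6.x
filebeat.inputs:
  - type: log
    paths:
      - /var/log/messages
    tags: ["syslog"]
    fields:
      env: production

  - type: log
    paths:
      - /var/log/app/app.log
    # Join continuation lines (e.g. stack traces) to the
    # preceding line that starts with a timestamp
    multiline.pattern: '^\d{4}-\d{2}-\d{2}'
    multiline.negate: true
    multiline.match: after
    tags: ["app"]

output.logstash:
  hosts: ["logstash-1:5044", "logstash-2:5044"]
  loadbalance: true
  ssl.enabled: true
  ssl.certificate_authorities: ["/etc/filebeat/certs/ca.crt"]
```

Note that with `loadbalance: true`, Filebeat keeps a connection per listed host, so a single host resetting its connection would not necessarily stop the other input from shipping.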

What could be the issue in this case?

Hey Arnold!

I think the "connection reset by peer" could be related to these beats conversations, and setting some parameters on your clients may help, though it likely won't get rid of those messages (it didn't for me):
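The "parameters on your clients" mentioned here are, in my understanding, the kernel TCP keepalive tunables, which control how aggressively idle connections are probed before an intermediate device or the peer drops them. A sketch of what that tuning could look like on a Linux client (these exact values are an assumption, not a recommendation from the linked threads):

```
# /etc/sysctl.d/90-tcp-keepalive.conf (illustrative values)
# Start sending keepalive probes after 120s of idle time
# (default is 7200s, often longer than LB/firewall idle timeouts)
net.ipv4.tcp_keepalive_time = 120
# Send a probe every 30s once idle
net.ipv4.tcp_keepalive_intvl = 30
# Declare the connection dead after 8 failed probes
net.ipv4.tcp_keepalive_probes = 8
```

Apply with `sysctl --system` (or `sysctl -p <file>`); this only matters for sockets that actually enable SO_KEEPALIVE, which the Beats Logstash output does.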

However, do note that the following is still an open bug on the Logstash beats input, which I am experiencing as well with many incoming TLS-wrapped beats connections:

I think if you can keep the load relatively low on the loadbalanced Logstash instances you may avoid running into it, but if you expect the load to increase (as I suspect you do, given that you're putting a loadbalancer in front of them) you may find yourself in a similar state.

Hi @Will_Weber, thanks for your reply! I will definitely look at these GitHub records. We're currently in the middle of a 6-to-7 migration, but I'll let you know the outcome.


This topic was automatically closed 28 days after the last reply. New replies are no longer allowed.