We currently have an issue that occurs about once a week: connections appear to drop, and events from certain log inputs are no longer received. Events from the first log input (/var/log/messages) still arrive. The log files themselves are still present on the host, and after a restart of the Beats daemon processing continues as expected.
When analysing the logs, we can see that the problem seems to have started after a restart, and ERROR messages like the following are flooding the Beats log file:
pipeline/output.go failed to publish events: write tcp ip:port->ip:port: write: connection reset by peer
We also looked into the Logstash logs and found no directly related cause, but around the time of the issue we see some "connection reset by peer" messages from the beats input plugin.
We suspect a delay or even a disconnect on the connection, caused by latency or busy resources at that time, but that is strange because /var/log/messages is still being collected. We could try increasing client_inactivity_timeout on the Logstash beats input, but we currently see no direct need for that. Tuning the timeouts on the Beats Logstash output is another thought.
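To make the output-side tuning idea concrete, this is roughly what we would try in filebeat.yml. It is only a sketch: the host names and values are placeholders, not the customer's settings, and as far as we know ttl only takes effect when pipelining is disabled.

```yaml
# filebeat.yml, output section only. Hosts and values are illustrative placeholders.
output.logstash:
  hosts: ["logstash1.example.com:5044", "logstash2.example.com:5044"]
  loadbalance: true
  timeout: 60     # seconds to wait for a response from Logstash (default is 30)
  pipelining: 0   # assumption: ttl is not supported with the async (pipelined) client
  ttl: 60s        # re-establish each connection after this time to live
```

On the Logstash side, the matching knob would be client_inactivity_timeout on the beats input, which defaults to 60 seconds.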
The customer is using Beats 6.4.2 (yes, 7 is on the way) with a rather basic config: two log inputs (the last one with multiline), some tags and fields, and a load-balanced Logstash output using SSL.
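For reference, the setup has roughly the following shape. This is a simplified sketch, not the customer's actual file: the second path, the tags, the fields, the multiline pattern, the hosts and the CA path are all placeholders.

```yaml
# Simplified sketch of the Filebeat 6.4 setup. All concrete values are placeholders.
filebeat.inputs:
  - type: log
    paths:
      - /var/log/messages
    tags: ["syslog"]                 # placeholder tag
    fields:
      env: production                # placeholder field

  - type: log
    paths:
      - /var/log/app/app.log         # hypothetical path of the second input
    tags: ["app"]                    # placeholder tag
    # Hypothetical multiline rule: lines not starting with a date are
    # appended to the previous event.
    multiline.pattern: '^\d{4}-\d{2}-\d{2}'
    multiline.negate: true
    multiline.match: after

output.logstash:
  hosts: ["logstash1.example.com:5044", "logstash2.example.com:5044"]
  loadbalance: true
  ssl.certificate_authorities: ["/etc/pki/tls/certs/ca.crt"]   # placeholder CA path
```

The second (multiline) input is one of those that stops delivering events; the first one keeps working.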
What could be the issue in this case?