Filebeat fails to publish events

Hello,

Our ELK cluster has been stable for a long time, but recently we have started seeing the following errors in the Filebeat log:

ERROR   logstash/async.go:256   Failed to publish events caused by: write tcp xx.xx.xx.xx:12344->xxx.xx.xx.xx:5044: write: connection reset by peer

ERROR   pipeline/output.go:121  Failed to publish events: write tcp xxx.xx.xx.xx:44834->xxx.xxx.xx.xx.:5044: write: connection reset by peer

This causes a lag before messages appear in Kibana.

There is no firewall between the Filebeat and Logstash servers.

Please advise.

This is also followed by i/o timeout errors at times:

2023-01-27T15:40:24.055-0500    ERROR   pipeline/output.go:121  Failed to publish events: write tcp xxx.xx.xx.xxx:32886->xxx.xx.xx.xx:5044: write: connection reset by peer
2023-01-27T15:40:24.103-0500    ERROR   pipeline/output.go:121  Failed to publish events: write tcp xxx.xx.xx.xxx:36020->xxx.xx.xx.xxx:5044: write: connection reset by peer
2023-01-27T15:45:49.249-0500    ERROR   logstash/async.go:256   Failed to publish events caused by: read tcp xxx.xx.xx.xxx:46912->xxx.xx.xx.xxx:5044: i/o timeout
2023-01-27T15:45:59.114-0500    ERROR   logstash/async.go:256   Failed to publish events caused by: client is not connected
2023-01-27T15:46:00.735-0500    ERROR   pipeline/output.go:121  Failed to publish events: client is not connected

Did you check your Logstash logs?

Your logs suggest that there is some issue on your Logstash machines, as the connections are being closed on that side. You need to look at your Logstash logs, and also at the system itself, for hints about what the issue may be.

Thanks for responding @leandrojmp

I enabled debug on the Logstash side, but due to the very large number of messages pouring in from all sources, the log rolled over very fast. Without debug enabled, the log is silent.

It looks like you might be using a plain http connection against an https (TLS) endpoint, and Logstash rejects the connection.
What are your settings for output.logstash in filebeat.yml?
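
For reference, if the beats input on the Logstash side has SSL enabled, the Filebeat output needs matching TLS settings, roughly like this (just a sketch; the host and certificate paths are placeholders):

output.logstash:
  hosts: ["your-logstash-host:5044"]
  ssl.enabled: true
  ssl.certificate_authorities: ["/path/to/ca.crt"]
  # only if Logstash also requires client certificates:
  #ssl.certificate: "/path/to/client.crt"
  #ssl.key: "/path/to/client.key"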

Here is another suggestion.

Thanks for the suggestion. There is no firewall between Filebeat and Logstash, so this might not be relevant.

This setup has been running for over 2 years now without issues, and this happened all of a sudden.

Also, Logstash is used extensively, as it is fed from different sources (so client_inactivity_timeout is perhaps not relevant either).

The Filebeat output is load balanced between 8 Logstash hosts as follows:

output.logstash:
  hosts: ["xxx:xx.xx.xx:5044","xxx:xx.xx.xx:5044","xxx:xx.xx.xx:5044","xxx:xx.xx.xx:5044","xxx:xx.xx.xx:5044","xxx:xx.xx.xx:5044","xxx:xx.xx.xx:5044","xxx:xx.xx.xx:5044"]
  loadbalance: true
  index: filebeat

logging.level: info
logging.to_files: true
logging.files:
  path: /opt/tal/rtal/elkagent/log/
  name: filebeat_tp.log
  keepfiles: 7
  permissions: 0644

Check this, it is a similar problem.

Thanks. It looks like this ttl parameter is useful when the Logstash hosts are behind a load balancer, which is not the case here.

Here the hosts defined are individual nodes on port 5044.

I can still try to introduce the ttl parameter (maybe with a 2-second value).
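
From what I can tell in the docs, ttl is ignored unless pipelining is disabled, so a trial config would look roughly like this (just a sketch, reusing the masked hosts from my config above):

output.logstash:
  hosts: ["xxx:xx.xx.xx:5044"]   # same 8 hosts as above, trimmed here for brevity
  loadbalance: true
  index: filebeat
  ttl: 2s          # drop and re-establish each connection after 2 seconds
  pipelining: 0    # ttl is reportedly ignored while pipelining is enabled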

Did you check the logs on all 8 nodes? Do your connection errors happen randomly, or are there one or more Logstash hosts that give you more errors?

Did you add any new kind of data or change any Logstash pipeline recently? Sometimes small changes to a Logstash pipeline can lead to performance issues.

Your main issue looks like a network issue, and those are sometimes not easy to troubleshoot.

Do you have any monitoring on your Logstash instances and hosts? You could check the CPU/memory usage of your instances/hosts at the times this issue is logged.
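
If there is nothing in place yet, one quick option is Logstash's legacy self-monitoring via logstash.yml, which ships basic JVM, CPU and pipeline stats to Elasticsearch so you can correlate them with the errors (a sketch only; the hosts value is a placeholder for your Elasticsearch cluster):

xpack.monitoring.enabled: true
xpack.monitoring.elasticsearch.hosts: ["http://your-es-host:9200"]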


Thanks @leandrojmp. I also suspect a network issue here.

For some reason, Kibana now seems to be showing messages with the current timestamp, though the i/o timeout and write: connection reset by peer errors still appear.

It still remains a mystery :slight_smile: and I will keep monitoring...


This topic was automatically closed 28 days after the last reply. New replies are no longer allowed.