Filebeat log fails to publish events

Hello,

Our ELK cluster has been stable for a long time, but recently we have started seeing the following error in the filebeat log:

ERROR   logstash/async.go:256   Failed to publish events caused by: write tcp xx.xx.xx.xx:12344->xxx.xx.xx.xx:5044: write: connection reset by peer

ERROR   pipeline/output.go:121  Failed to publish events: write tcp xxx.xx.xx.xx:44834->xxx.xxx.xx.xx.:5044: write: connection reset by peer

This results in a lag in messages getting displayed in Kibana.

There is no firewall between filebeat and logstash servers.

Please guide us.

This is also followed by i/o timeout errors at times:

2023-01-27T15:40:24.055-0500    ERROR   pipeline/output.go:121  Failed to publish events: write tcp xxx.xx.xx.xxx:32886->xxx.xx.xx.xx:5044: write: connection reset by peer
2023-01-27T15:40:24.103-0500    ERROR   pipeline/output.go:121  Failed to publish events: write tcp xxx.xx.xx.xxx:36020->xxx.xx.xx.xxx:5044: write: connection reset by peer
2023-01-27T15:45:49.249-0500    ERROR   logstash/async.go:256   Failed to publish events caused by: read tcp xxx.xx.xx.xxx:46912->xxx.xx.xx.xxx:5044: i/o timeout
2023-01-27T15:45:59.114-0500    ERROR   logstash/async.go:256   Failed to publish events caused by: client is not connected
2023-01-27T15:46:00.735-0500    ERROR   pipeline/output.go:121  Failed to publish events: client is not connected

Did you check your Logstash logs?

Your logs suggest that there is some issue on your Logstash machines, as the connections are being closed. You need to look at the Logstash logs and also at the system itself for hints about what the issue may be.
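By default Logstash writes to whatever path.logs points to (often /var/log/logstash). If the logs look silent, you can temporarily raise the level in logstash.yml; a minimal sketch (the values here are just an example, adjust to your setup):

log.level: debug              # default is info; debug is very verbose
path.logs: /var/log/logstash  # wherever your install keeps its logs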

Thanks for responding @leandrojmp

I enabled debug on the Logstash side, but due to the very large number of messages pouring in from all sides, the log kept rolling over very fast. Without debug enabled, the log is silent.

It could be that you are using HTTP on an HTTPS connection, and Logstash is rejecting the connection.
What are your settings for output.logstash in filebeat.yml?
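For reference, if the beats input on the Logstash side has TLS enabled, the Filebeat output has to be configured for it as well; a minimal sketch (the host and certificate path are placeholders, not your actual values):

output.logstash:
  hosts: ["logstash-01.example.com:5044"]                    # placeholder host
  ssl.certificate_authorities: ["/etc/filebeat/certs/ca.pem"] # CA used by the beats input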

Here is another suggestion.

Thanks for the suggestion. There is no firewall between filebeat and logstash so this might not be relevant.

This setup has been running for over 2 years now without issues, and this happened all of a sudden.

Also, Logstash is used extensively, as it is fed from different sources (so client_inactivity_timeout is perhaps also not relevant).

The filebeat output is load balanced across 8 logstash hosts as follows:

output.logstash:
  hosts: ["xxx.xx.xx.xx:5044","xxx.xx.xx.xx:5044","xxx.xx.xx.xx:5044","xxx.xx.xx.xx:5044","xxx.xx.xx.xx:5044","xxx.xx.xx.xx:5044","xxx.xx.xx.xx:5044","xxx.xx.xx.xx:5044"]
  loadbalance: true
  index: filebeat

logging.level: info
logging.to_files: true
logging.files:
  path: /opt/tal/rtal/elkagent/log/
  name: filebeat_tp.log
  keepfiles: 7
  permissions: 0644

Check this, it's a similar problem.

Thanks. It looks like the ttl parameter is useful when the logstash hosts are behind a load balancer, which is not the case here.

Here the hosts defined are individual nodes on port 5044.

I can still try to introduce the ttl parameter (maybe with a 2 second value).
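Something along these lines in filebeat.yml, if I understand the docs correctly (hosts are placeholders; note that ttl only takes effect when pipelining is disabled):

output.logstash:
  hosts: ["logstash-01.example.com:5044"]  # placeholder, same list of 8 nodes as before
  loadbalance: true
  ttl: 2s           # drop and re-establish the connection every 2 seconds
  pipelining: 0     # ttl is not supported with the async (pipelined) client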

Did you check the logs on all 8 nodes? Do the connection errors happen randomly, or are there one or more logstash hosts that give you more errors?

Did you add any new kind of data or change any logstash pipeline recently? Sometimes small changes to a logstash pipeline can lead to performance issues.

Your main issue is a network issue and sometimes it is not easy to troubleshoot.

Do you have any monitoring on your logstash instances and hosts? You could check what the CPU/memory usage of your instances/hosts looks like when this issue is logged.
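If you don't have anything in place yet, one quick option is Logstash's built-in collection, configured in logstash.yml; a rough sketch (the Elasticsearch host is a placeholder, and Metricbeat-based stack monitoring is the preferred route on newer versions):

xpack.monitoring.enabled: true
xpack.monitoring.elasticsearch.hosts: ["http://es-01.example.com:9200"]  # placeholder monitoring cluster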


Thanks @leandrojmp. I also suspect a network issue here.

For some reason, Kibana now seems to be updating the messages with the current timestamp, though the i/o timeout and write: connection reset by peer errors still appear.

It still remains a mystery 🙂 and I will keep monitoring...
