We are running Filebeat as a DaemonSet on our Kubernetes nodes to collect all application logs and send them to Logstash, which forwards them to Elasticsearch.
We observe that log delivery periodically hangs for specific services that generate a large volume of logs. Either Filebeat hangs while sending logs from those services to Logstash, or Logstash hangs and stops receiving logs from some Filebeat pods. It is hard to identify which layer is at fault, because I don't see any suspicious errors in Logstash even in debug mode. The Filebeat logs below are what I see.
To recover, we have to restart either Filebeat or Logstash. I tried scaling Logstash by doubling the number of pods and also doubling the memory and heap size, but the issue persists.
This happens at least once a day. Can someone please help me identify what the issue is?
Filebeat and Logstash version: 7.9.3
2021-02-03T18:09:59.087Z ERROR [logstash] logstash/async.go:280 Failed to publish events caused by: read tcp ip:50294->ip:5044: i/o timeout
2021-02-03T18:09:59.087Z ERROR [logstash] logstash/async.go:280 Failed to publish events caused by: read tcp ip:50294->ip:5044: i/o timeout
2021-02-03T22:05:02.487Z ERROR [publisher_pipeline_output] pipeline/output.go:154 Failed to connect to backoff(async(tcp://logstash-logstash:5044)): dial tcp ip:5044: i/o timeout
2021-02-03T22:05:03.713Z ERROR [publisher_pipeline_output] pipeline/output.go:154 Failed to connect to backoff(async(tcp://logstash-logstash:5044)): dial tcp ip:5044: i/o timeout
20
Here are the debug logs I see on Logstash. It seems the pipeline is blocked, but I don't see any errors about a pipeline failure. Logstash hangs at that point and does not recover until it is manually restarted.
We have 8 Logstash pods running behind a single Logstash service, and only 1 of the 8 has this issue. I am not sure why some Filebeat pods stick to that Logstash pod and keep waiting for it to ack instead of opening a connection to another Logstash pod.
Does Filebeat open sticky sessions with Logstash? Why does Filebeat wait in an infinite loop for Logstash to ack?
To resolve this I have to restart either the Logstash pod or the Filebeat pod, and this happens almost every day.
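To narrow down which of the 8 Logstash pods is the stuck one, I am considering a small probe along these lines, run from a debug pod on the same node as an affected Filebeat pod. This is only a rough sketch: the pod IPs are placeholders (assumptions, not our real addresses), and `probe_endpoints` is a helper name I made up. Note that a successful TCP connect only shows the port is accepting connections. It would help with the `dial tcp ... i/o timeout` errors, but it cannot prove that Logstash is actually acking events, which is what the `read tcp ... i/o timeout` errors point at.

```python
import socket
from typing import Dict, Iterable, Tuple


def probe_endpoints(
    endpoints: Iterable[Tuple[str, int]], timeout: float = 3.0
) -> Dict[str, str]:
    """Try a plain TCP connect to each (host, port) and record the outcome."""
    results: Dict[str, str] = {}
    for host, port in endpoints:
        key = f"{host}:{port}"
        try:
            # Open and immediately close a connection; no Beats protocol is spoken.
            with socket.create_connection((host, port), timeout=timeout):
                results[key] = "ok"
        except socket.timeout:
            # Connect did not complete within `timeout` seconds.
            results[key] = "timeout"
        except OSError as exc:
            # Connection refused, unreachable host, DNS failure, etc.
            results[key] = f"error: {exc}"
    return results


if __name__ == "__main__":
    # Placeholder pod IPs (assumption); replace with the addresses of the
    # individual Logstash pods rather than the service name.
    logstash_pods = [("10.0.0.11", 5044), ("10.0.0.12", 5044)]
    for endpoint, status in probe_endpoints(logstash_pods).items():
        print(endpoint, status)
```

Probing the pod IPs directly instead of the `logstash-logstash` service name is deliberate here, since the service hides which pod a given connection lands on.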