Hi,
We are running Filebeat as a DaemonSet on our Kubernetes nodes to collect all application logs and ship them to Logstash, which forwards them to Elasticsearch.
We observe that log shipping periodically hangs for specific services that generate a huge volume of logs: either Filebeat hangs while sending logs from those services to Logstash, or Logstash hangs and stops receiving logs from some Filebeat pods. It is hard to identify at which layer the problem occurs, as I don't see any suspicious errors in Logstash even in debug mode. Filebeat, however, logs the errors shown below.
To resolve the issue we have to restart either Filebeat or Logstash. I tried scaling Logstash by doubling the number of Logstash pods and also doubling its memory and heap size, but we still see the same issue.
This happens at least once a day. Can someone please help me identify what the issue is?
Filebeat and Logstash version: 7.9.3
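For reference, the Logstash output section of our filebeat.yml is essentially the defaults, roughly like this (the host matches the errors below; the commented settings are standard Filebeat 7.x options we have not tuned yet and are listed only as candidates):

```yaml
output.logstash:
  hosts: ["logstash-logstash:5044"]
  # Candidate settings to experiment with (shown with their 7.x defaults):
  # timeout: 30s         # socket read/write timeout; the "i/o timeout" errors below hit this
  # ttl: 0               # re-establish connections periodically; only honored with pipelining: 0
  # pipelining: 2        # async batches in flight per connection
  # bulk_max_size: 2048  # max events per Logstash bulk request
```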
2021-02-03T18:09:59.087Z ERROR [logstash] logstash/async.go:280 Failed to publish events caused by: read tcp ip:50294->ip:5044: i/o timeout
2021-02-03T18:09:59.087Z ERROR [logstash] logstash/async.go:280 Failed to publish events caused by: read tcp ip:50294->ip:5044: i/o timeout
2021-02-03T22:05:02.487Z ERROR [publisher_pipeline_output] pipeline/output.go:154 Failed to connect to backoff(async(tcp://logstash-logstash:5044)): dial tcp ip:5044: i/o timeout
2021-02-03T22:05:03.713Z ERROR [publisher_pipeline_output] pipeline/output.go:154 Failed to connect to backoff(async(tcp://logstash-logstash:5044)): dial tcp ip:5044: i/o timeout