Logstash/Filebeat is periodically hanging (unresponsive)

Hi,

We are running Filebeat as a DaemonSet on our Kubernetes nodes to collect all application logs and send them to Logstash, and from there to Elasticsearch.
We observe that log shipping periodically stops or hangs for specific services that generate a large volume of logs. Either Filebeat hangs while sending logs from those services to Logstash, or Logstash hangs and stops receiving logs from some Filebeat pods. It is hard to identify which layer has the problem, because I don't see any suspicious errors in Logstash, even in debug mode. The Filebeat logs I see are below.
To recover, we have to restart either Filebeat or Logstash. I tried scaling Logstash by doubling the number of Logstash pods and also doubling the memory and heap size, but I am still seeing the same issue.
This happens at least once a day. Can someone please help me identify what the issue is?

Filebeat and Logstash version: 7.9.3

2021-02-03T18:09:59.087Z	ERROR	[logstash]	logstash/async.go:280	Failed to publish events caused by: read tcp ip:50294->ip:5044: i/o timeout
2021-02-03T18:09:59.087Z	ERROR	[logstash]	logstash/async.go:280	Failed to publish events caused by: read tcp ip:50294->ip:5044: i/o timeout
2021-02-03T22:05:02.487Z	ERROR	[publisher_pipeline_output]	pipeline/output.go:154	Failed to connect to backoff(async(tcp://logstash-logstash:5044)): dial tcp ip:5044: i/o timeout
2021-02-03T22:05:03.713Z	ERROR	[publisher_pipeline_output]	pipeline/output.go:154	Failed to connect to backoff(async(tcp://logstash-logstash:5044)): dial tcp ip:5044: i/o timeout
20

The debug logs I see on Logstash are included further below. It seems the pipeline is blocked, but I don't see any errors about a pipeline failure. Logstash hangs at that point and does not work again until it is manually restarted.

We have 8 Logstash pods running behind a Logstash service, and only 1 of the 8 pods has this issue. I am not sure why some Filebeat pods stick to that Logstash pod and keep waiting for it to ACK instead of opening a connection to another Logstash pod.

Does Filebeat open sticky sessions with Logstash? Why does Filebeat wait in an infinite loop for Logstash to ACK?

To resolve this, I have to restart either the Logstash pod or the Filebeat pod, and this happens almost every day.

[2021-02-05T07:05:50,780][DEBUG][org.logstash.beats.ConnectionHandler][main] 884460aa: batches pending: true
[2021-02-05T07:05:53,405][DEBUG][org.logstash.beats.ConnectionHandler][main] 4f6b0df8: reader and writer are idle, closing remote connection
[2021-02-05T07:05:53,475][DEBUG][org.logstash.beats.ConnectionHandler][main] 17aaab9e: batches pending: true
[2021-02-05T07:05:54,110][DEBUG][org.logstash.beats.ConnectionHandler][main] 8cff884d: batches pending: true
[2021-02-05T07:06:15,369][DEBUG][org.logstash.beats.ConnectionHandler][main] babac4ef: batches pending: true
[2021-02-05T07:06:21,704][DEBUG][org.logstash.beats.ConnectionHandler][main] b8f4c8f7: batches pending: true
[2021-02-05T07:06:28,794][DEBUG][org.logstash.beats.ConnectionHandler][main] a37e2da3: batches pending: true
[2021-02-05T07:06:29,834][DEBUG][org.logstash.beats.ConnectionHandler][main] 455c7c87: batches pending: true
[2021-02-05T07:06:29,834][DEBUG][org.logstash.beats.ConnectionHandler][main] 455c7c87: batches pending: true
[2021-02-05T07:06:30,794][DEBUG][org.logstash.beats.ConnectionHandler][main] a895e7d3: batches pending: true
[2021-02-05T07:06:36,704][DEBUG][org.logstash.beats.ConnectionHandler][main] b8f4c8f7: batches pending: true
[2021-02-05T07:06:38,476][DEBUG][org.logstash.beats.ConnectionHandler][main] 17aaab9e: batches pending: true
[2021-02-05T07:06:42,113][DEBUG][org.logstash.beats.ConnectionHandler][main] 8cff884d: batches pending: true
[2021-02-05T07:06:50,779][DEBUG][org.logstash.beats.ConnectionHandler][main] b5569bb1: reader and writer are idle, closing remote connection
[2021-02-05T07:06:50,779][DEBUG][org.logstash.beats.ConnectionHandler][main] 6997090c: reader and writer are idle, closing remote connection

Here is my Filebeat Logstash output config. I set loadbalance to true.

    output.logstash:
        hosts:
        - logstash-logstash:5044
        loadbalance: true
        compression_level: 3
        worker: 8
        bulk_max_size: 4096
        pipelining: 4
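
Given the sticky-connection question above, one variant of this output that might be worth testing is sketched below. This is an untested assumption, not a confirmed fix. It relies on the `ttl` option of the Logstash output, which re-establishes a connection after the given time so that a Filebeat worker is not pinned indefinitely to a single Logstash pod behind the service. As far as I can tell from the Filebeat documentation, `ttl` is not supported on the async client (when `pipelining` is enabled), which is why `pipelining` is set to 0 here. The `ttl` value is an illustrative guess.

    output.logstash:
        hosts:
        - logstash-logstash:5044
        loadbalance: true
        compression_level: 3
        worker: 8
        bulk_max_size: 4096
        # Assumption: disable the async client so that ttl takes effect.
        pipelining: 0
        # Illustrative value: reconnect every 60s so connections get
        # redistributed across the Logstash pods behind the service.
        ttl: 60s

The trade-off is that disabling pipelining may reduce throughput per worker, so this would need to be checked against the high-volume services.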

Can anyone please help?

@Badger, any thoughts on this issue? Logstash hangs without any error; the debug logs only show the following:

[2021-02-05T07:06:36,704][DEBUG][org.logstash.beats.ConnectionHandler][main] b8f4c8f7: batches pending: true
[2021-02-05T07:06:38,476][DEBUG][org.logstash.beats.ConnectionHandler][main] 17aaab9e: batches pending: true
[2021-02-05T07:06:42,113][DEBUG][org.logstash.beats.ConnectionHandler][main] 8cff884d: batches pending: true
