Hi,
We have a K8s cluster with more than 30 nodes.
On each node we run a Filebeat agent to collect container logs and node logs.
The Filebeat agents send data to three Logstash instances running in the same cluster.
On some nodes, after a few minutes or hours, the Filebeat agent loses its connection to the Logstash server and cannot reconnect.
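For context, the Logstash output section of our /etc/filebeat.yml looks roughly like the following; the host list, loadbalance setting, and certificate path shown here are illustrative placeholders rather than the literal values (the logstash-2 host name is taken from the Filebeat logs below):

output.logstash:
  hosts:
    - "logstash-0.logstash-headless.logstash.svc.cluster.local:5044"
    - "logstash-1.logstash-headless.logstash.svc.cluster.local:5044"
    - "logstash-2.logstash-headless.logstash.svc.cluster.local:5044"
  loadbalance: true
  ssl.certificate_authorities: ["/etc/filebeat/certs/ca.crt"]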
When this happens, I have tested the connectivity with the Filebeat test command and it succeeds.
Here is the command and its output:
$ filebeat -c /etc/filebeat.yml test output
connection...
parse host... OK
dns lookup... OK
addresses: 10.233.125.225
dial up... OK
TLS...
security: server's certificate chain verification is enabled
handshake... OK
TLS version: TLSv1.3
dial up... OK
talk to server... OK
To work around the issue, we have to restart the affected Filebeat agent.
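In practice that means deleting the affected pod so it gets recreated; assuming Filebeat runs as a DaemonSet, the restart is something like this (pod name and namespace are placeholders):

$ kubectl delete pod filebeat-xxxxx -n logging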
Here are some logs from Filebeat and Logstash:
filebeat {"log.level":"error","@timestamp":"2023-03-07T16:11:11.628Z","log.logger":"publisher_pipeline_output","log.origin":{"file.name":"pipeline/client_worker.go","file.line":176},"message":"failed to publish events: client is not connected","service.na
me":"filebeat","ecs.version":"1.6.0"}
filebeat {"log.level":"info","@timestamp":"2023-03-07T16:11:11.628Z","log.logger":"publisher_pipeline_output","log.origin":{"file.name":"pipeline/client_worker.go","file.line":139},"message":"Connecting to backoff(async(tcp://logstash-2.logstash-headless.
logstash.svc.cluster.local:5044))","service.name":"filebeat","ecs.version":"1.6.0"}
filebeat {"log.level":"info","@timestamp":"2023-03-07T16:11:11.675Z","log.logger":"publisher_pipeline_output","log.origin":{"file.name":"pipeline/client_worker.go","file.line":147},"message":"Connection to backoff(async(tcp://logstash-2.logstash-headless.
logstash.svc.cluster.local:5044)) established","service.name":"filebeat","ecs.version":"1.6.0"}
filebeat {"log.level":"error","@timestamp":"2023-03-07T16:11:11.686Z","log.logger":"logstash","log.origin":{"file.name":"logstash/async.go","file.line":280},"message":"Failed to publish events caused by: EOF","service.name":"filebeat","ecs.version":"1.6.0
"}
filebeat {"log.level":"error","@timestamp":"2023-03-07T16:11:11.686Z","log.logger":"logstash","log.origin":{"file.name":"logstash/async.go","file.line":280},"message":"Failed to publish events caused by: EOF","service.name":"filebeat","ecs.version":"1.6.0
"}
filebeat {"log.level":"error","@timestamp":"2023-03-07T16:11:11.686Z","log.logger":"logstash","log.origin":{"file.name":"logstash/async.go","file.line":280},"message":"Failed to publish events caused by: EOF","service.name":"filebeat","ecs.version":"1.6.0
"}
filebeat {"log.level":"error","@timestamp":"2023-03-07T16:11:11.688Z","log.logger":"logstash","log.origin":{"file.name":"logstash/async.go","file.line":280},"message":"Failed to publish events caused by: client is not connected","service.name":"filebeat",
"ecs.version":"1.6.0"}
logstash {"level":"INFO","loggerName":"org.logstash.beats.BeatsHandler","timeMillis":1678205518829,"thread":"defaultEventExecutorGroup-4-2","logEvent":{"message":"[local: 10.233.125.225:5044, remote: 10.233.64.0:38402] Handling exception: org.jruby.except
ions.TypeError: (TypeError) no implicit conversion of String into Hash (caused by: org.jruby.exceptions.NoMethodError: (NoMethodError) undefined method `accept' for nil:NilClass)"}}
logstash {"level":"WARN","loggerName":"io.netty.channel.DefaultChannelPipeline","timeMillis":1678205518829,"thread":"nioEventLoopGroup-2-1","logEvent":{"message":"An exceptionCaught() event was fired, and it reached at the tail of the pipeline. It usually
means the last handler in the pipeline did not handle the exception."}}
Filebeat docker image: docker.elastic.co/beats/filebeat:8.4.2
Logstash docker image: docker.elastic.co/logstash/logstash:8.4.2