Elastic Agent not shipping all logs from Kubernetes Cluster. Errors in logs

Hello

I have Elastic Agent installed on 5 EKS clusters for logging and monitoring.
Recently the agents have stopped shipping all of the logs to the Elasticsearch cluster (hosted with elastic.cloud).
I'm seeing some logs, but not all of them (usually one particular pod's logs will stop arriving).
Here is a snippet of the logs I am seeing from the Elastic Agent:

[elastic_agent.filebeat][error] Trace[1078279371]: [161.501202ms] [161.501202ms] END
15:28:06.636  [elastic_agent.filebeat][error] I0215 13:28:06.635947      29 trace.go:205] Trace[495160128]: "DeltaFIFO Pop Process" ID:status-service,Depth:19,Reason:slow event handlers blocking the queue (15-Feb-2023 13:28:06.381) (total time: 254ms):
15:28:06.636  [elastic_agent.filebeat][error] Trace[495160128]: [254.080068ms] [254.080068ms] END
15:28:06.838  [elastic_agent.filebeat][error] I0215 13:28:06.838193      29 trace.go:205] Trace[1332894256]: "DeltaFIFO Pop Process" ID:audit-service,Depth:19,Reason:slow event handlers blocking the queue (15-Feb-2023 13:28:06.554) (total time: 283ms):
15:28:06.838  [elastic_agent.filebeat][error] Trace[1332894256]: [283.793359ms] [283.793359ms] END
Harvester crashed with: harvester panic with: close of closed channel
goroutine 3650 [running]:
runtime/debug.Stack()
	runtime/debug/stack.go:24 +0x65
github.com/elastic/beats/v7/filebeat/input/filestream/internal/input-logfile.startHarvester.func1.1()
	github.com/elastic/beats/v7/filebeat/input/filestream/internal/input-logfile/harvester.go:167 +0x78
panic({0x5555f95381c0, 0x5555f9affae0})
	runtime/panic.go:844 +0x258
github.com/elastic/beats/v7/libbeat/processors/add_kubernetes_metadata.(*cache).stop(...)
	github.com/elastic/beats/v7/libbeat/processors/add_kubernetes_metadata/cache.go:97
github.com/elastic/beats/v7/libbeat/processors/add_kubernetes_metadata.(*kubernetesAnnotator).Close(0xc0058a5b80?)
	github.com/elastic/beats/v7/libbeat/processors/add_kubernetes_metadata/kubernetes.go:311 +0x4f
github.com/elastic/beats/v7/libbeat/processors.Close(...)
	github.com/elastic/beats/v7/libbeat/processors/processor.go:58
github.com/elastic/beats/v7/libbeat/publisher/processing.(*group).Close(0x5?)
	github.com/elastic/beats/v7/libbeat/publisher/processing/processors.go:95 +0x159
github.com/elastic/beats/v7/libbeat/processors.Close(...)
	github.com/elastic/beats/v7/libbeat/processors/processor.go:58
github.com/elastic/beats/v7/libbeat/publisher/processing.(*group).Close(0x0?)
	github.com/elastic/beats/v7/libbeat/publisher/processing/processors.go:95 +0x159
github.com/elastic/beats/v7/libbeat/processors.Close(...)
	github.com/elastic/beats/v7/libbeat/processors/processor.go:58
github.com/elastic/beats/v7/libbeat/publisher/pipeline.(*client).Close.func1()
	github.com/elastic/beats/v7/libbeat/publisher/pipeline/client.go:167 +0x2df
sync.(*Once).doSlow(0x0?, 0x0?)
	sync/once.go:68 +0xc2
sync.(*Once).Do(...)
	sync/once.go:59
github.com/elastic/beats/v7/libbeat/publisher/pipeline.(*client).Close(0x5555f753dc46?)
	github.com/elastic/beats/v7/libbeat/publisher/pipeline/client.go:148 +0x59
github.com/elastic/beats/v7/filebeat/beater.(*countingClient).Close(0x5555f753dbbf?)
	github.com/elastic/beats/v7/filebeat/beater/channels.go:145 +0x22
github.com/elastic/beats/v7/filebeat/input/filestream/internal/input-logfile.startHarvester.func1({0x5555f9b44f98?, 0xc0055e03c0})
	github.com/elastic/beats/v7/filebeat/input/filestream/internal/input-logfile/harvester.go:219 +0x929
github.com/elastic/go-concert/unison.(*TaskGroup).Go.func1()
	github.com/elastic/go-concert@v0.2.0/unison/taskgroup.go:163 +0xc3
created by github.com/elastic/go-concert/unison.(*TaskGroup).Go
	github.com/elastic/go-concert@v0.2.0/unison/taskgroup.go:159 +0xca
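For reference, "close of closed channel" is the usual Go failure when a shutdown channel is closed twice, and the stack trace runs through add_kubernetes_metadata's cache stop and the publisher client's Close. I'm not claiming this is what the Beats code does internally, but here is a minimal, hypothetical sketch of that double-close pattern and the sync.Once guard that normally prevents it (the watcherCache type and its methods are made up for illustration):

```go
package main

import "sync"

// watcherCache is a hypothetical stand-in for a processor-owned cache whose
// background goroutine is told to stop by closing a channel.
type watcherCache struct {
	done     chan struct{}
	stopOnce sync.Once
}

// stopUnguarded closes done directly; if two owners both call it during
// shutdown (e.g. two clients sharing one processor group), the second call
// panics with "close of closed channel".
func (c *watcherCache) stopUnguarded() {
	close(c.done)
}

// stopGuarded wraps the close in sync.Once, so repeated shutdown calls are no-ops.
func (c *watcherCache) stopGuarded() {
	c.stopOnce.Do(func() { close(c.done) })
}

func main() {
	guarded := &watcherCache{done: make(chan struct{})}
	guarded.stopGuarded()
	guarded.stopGuarded() // second call is harmless

	unguarded := &watcherCache{done: make(chan struct{})}
	unguarded.stopUnguarded()
	unguarded.stopUnguarded() // panics: close of closed channel
}
```

If something like that happens when a harvester shuts down, it would fit what I'm seeing: a single pod's log stream stops while the agent itself keeps reporting healthy.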

Restarting the Elastic Agent seems to solve the problem temporarily.
These are dev environments, so they see relatively low traffic.
The Elastic Agents are not crashing/restarting and show as healthy in the Fleet console.
There are also no resource issues on the Agent pods.
Any ideas?

