We have an Elastic cluster installed on Kubernetes. I have Elastic Agents, managed by Fleet, installed on the worker nodes, and data is being pushed to Elasticsearch. However, I see the Fleet Server pod restarting frequently.
kubectl.exe get po
NAME READY STATUS RESTARTS AGE
apmserver-dev-apm-server-65bd8dfd-cpw5x 1/1 Running 0 5d5h
elastic-agent-dev-agent-552p7 1/1 Running 3 (4d12h ago) 4d12h
elastic-agent-dev-agent-7qntc 1/1 Running 0 5d5h
elastic-agent-dev-agent-874kz 1/1 Running 0 5d4h
elastic-agent-dev-agent-b5pg6 1/1 Running 0 5d5h
elastic-agent-dev-agent-c4km9 1/1 Running 0 5d4h
elastic-agent-dev-agent-cxpqv 1/1 Running 0 5d5h
elastic-agent-dev-agent-dbmwg 1/1 Running 0 5d5h
elastic-agent-dev-agent-fpr2w 1/1 Running 0 5d5h
elastic-agent-dev-agent-hnbx2 1/1 Running 66 (2d18h ago) 4d23h
elastic-agent-dev-agent-jbqk8 1/1 Running 0 5d5h
elastic-agent-dev-agent-k6jlq 1/1 Running 41 (4d9h ago) 5d4h
elastic-agent-dev-agent-kgttg 1/1 Running 0 5d5h
elastic-agent-dev-agent-l4jj4 1/1 Running 0 5d5h
elastic-agent-dev-agent-lwtpt 1/1 Running 0 5d5h
elastic-agent-dev-agent-lx9ss 1/1 Running 0 5d5h
elastic-agent-dev-agent-mzhh7 1/1 Running 0 5d5h
elastic-agent-dev-agent-ngcrx 1/1 Running 0 4d12h
elastic-agent-dev-agent-nr5nc 1/1 Running 0 5d5h
elastic-agent-dev-agent-nw6s2 1/1 Running 0 5d5h
elastic-agent-dev-agent-s6b9p 1/1 Running 0 4d12h
elastic-agent-dev-agent-sxs9n 1/1 Running 0 5d6h
elastic-agent-dev-agent-z8lwx 1/1 Running 0 5d5h
elastic-agent-dev-agent-zrl7v 1/1 Running 0 5d5h
elastic-agent-dev-agent-zs9dl 1/1 Running 0 5d5h
elasticsearch-dev-es-data-0 1/1 Running 2 (34h ago) 5d4h
elasticsearch-dev-es-data-1 1/1 Running 2 (2d17h ago) 5d5h
elasticsearch-dev-es-master-0 1/1 Running 2 (46m ago) 5d4h
elasticsearch-dev-es-master-1 1/1 Running 0 5d5h
fleet-server-dev-agent-766f5b469-cvl7v 1/1 Running 357 (17m ago) 3d20h
kibana-dev-kb-77b5b4d44f-g8mhh 1/1 Running 0 5d4h
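To make the restart counts in the listing above easier to compare, here is a minimal Python sketch that parses pasted `kubectl get po` output and lists the pods at or above a restart threshold. The function name `restart_counts` and the default threshold of 10 are my own choices for illustration, and the regular expression assumes the single-space column layout shown above.

```python
import re

# Matches one row of `kubectl get po` output as pasted above:
# NAME READY STATUS RESTARTS [(last restart ago)] AGE
ROW = re.compile(
    r"^(?P<name>\S+)\s+(?P<ready>\d+/\d+)\s+(?P<status>\S+)\s+"
    r"(?P<restarts>\d+)(?:\s+\((?P<last>[^)]*)\))?\s+(?P<age>\S+)$"
)


def restart_counts(listing: str, threshold: int = 10):
    """Return (pod, restarts) pairs with restarts >= threshold, highest first."""
    rows = []
    for line in listing.splitlines():
        m = ROW.match(line.strip())
        if m is None:  # header line, blank line, or unexpected layout
            continue
        count = int(m.group("restarts"))
        if count >= threshold:
            rows.append((m.group("name"), count))
    return sorted(rows, key=lambda r: r[1], reverse=True)


if __name__ == "__main__":
    # A few rows copied from the listing above.
    sample = """\
NAME READY STATUS RESTARTS AGE
elastic-agent-dev-agent-hnbx2 1/1 Running 66 (2d18h ago) 4d23h
elastic-agent-dev-agent-k6jlq 1/1 Running 41 (4d9h ago) 5d4h
elasticsearch-dev-es-master-0 1/1 Running 2 (46m ago) 5d4h
fleet-server-dev-agent-766f5b469-cvl7v 1/1 Running 357 (17m ago) 3d20h
kibana-dev-kb-77b5b4d44f-g8mhh 1/1 Running 0 5d4h
"""
    for name, count in restart_counts(sample):
        print(f"{name}\t{count}")
```

On the rows above this singles out the Fleet Server pod and the two agent pods `hnbx2` and `k6jlq`; the rest are below the threshold.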
The Fleet Server pod is restarting continuously (357 restarts in 3d20h), and I see the following logs in the Fleet Server pod:
{"log.level":"info","@timestamp":"2024-10-21T08:58:11.047Z","log.origin":{"function":"github.com/elastic/elastic-agent/internal/pkg/agent/application/coordinator.(*Coordinator).watchRuntimeComponents","file.name":"coordinator/coordinator.go","file.line":647},"message":"Component state changed filestream-monitoring (FAILED->STOPPED): Suppressing FAILED state due to restart for '3656' exited with code '-1'","log":{"source":"elastic-agent"},"component":{"id":"filestream-monitoring","state":"STOPPED","old_state":"FAILED"},"ecs.version":"1.6.0"}
{"log.level":"info","@timestamp":"2024-10-21T08:58:11.047Z","log.origin":{"function":"github.com/elastic/elastic-agent/internal/pkg/agent/application/coordinator.(*Coordinator).watchRuntimeComponents","file.name":"coordinator/coordinator.go","file.line":665},"message":"Unit state changed filestream-monitoring (FAILED->STOPPED): Suppressing FAILED state due to restart for '3656' exited with code '-1'","log":{"source":"elastic-agent"},"component":{"id":"filestream-monitoring","state":"STOPPED"},"unit":{"id":"filestream-monitoring","type":"output","state":"STOPPED","old_state":"FAILED"},"ecs.version":"1.6.0"}
I also see some TLS handshake errors, as below:
{"log.level":"error","@timestamp":"2024-10-21T08:58:14.147Z","message":"http: TLS handshake error from 10.3.0.120:1431: read tcp 10.3.0.66:8220->10.3.0.120:1431: i/o timeout\n","component":{"binary":"fleet-server","dataset":"elastic_agent.fleet_server","id":"fleet-server-default","type":"fleet-server"},"log":{"source":"fleet-server-default"},"ecs.version":"1.6.0","service.name":"fleet-server","service.type":"fleet-server","ecs.version":"1.6.0"}
{"log.level":"error","@timestamp":"2024-10-21T08:58:14.148Z","message":"http: TLS handshake error from 10.3.1.22:28062: read tcp 10.3.0.66:8220->10.3.1.22:28062: i/o timeout\n","component":{"binary":"fleet-server","dataset":"elastic_agent.fleet_server","id":"fleet-server-default","type":"fleet-server"},"log":{"source":"fleet-server-default"},"ecs.version":"1.6.0","service.name":"fleet-server","service.type":"fleet-server","ecs.version":"1.6.0"}
{"log.level":"error","@timestamp":"2024-10-21T08:58:14.447Z","message":"http: TLS handshake error from 10.3.0.62:13985: read tcp 10.3.0.66:8220->10.3.0.62:13985: i/o timeout\n","component":{"binary":"fleet-server","dataset":"elastic_agent.fleet_server","id":"fleet-server-default","type":"fleet-server"},"log":{"source":"fleet-server-default"},"ecs.version":"1.6.0","service.name":"fleet-server","service.type":"fleet-server","ecs.version":"1.6.0"}
{"log.level":"info","@timestamp":"2024-10-21T08:58:14.448Z","message":"Running on policy with Fleet Server integration: eck-fleet-server","component":{"binary":"fleet-server","dataset":"elastic_agent.fleet_server","id":"fleet-server-default","type":"fleet-server"},"log":{"source":"fleet-server-default"},"service.name":"fleet-server","service.type":"fleet-server","state":"HEALTHY","ecs.version":"1.6.0","ecs.version":"1.6.0"}
Can anyone suggest how to fix these Fleet Server restarts?