Fleet server restarting frequently

sruthi.sattiraju · October 21, 2024, 9:05am

We have an Elastic cluster installed on Kubernetes. Elastic Agents managed by Fleet are installed on the worker nodes, and data is getting pushed to Elasticsearch. However, the Fleet Server pod restarts frequently: it shows 357 restarts in 3d20h in the kubectl.exe get po output below.

NAME                                      READY   STATUS    RESTARTS         AGE
apmserver-dev-apm-server-65bd8dfd-cpw5x   1/1     Running   0                5d5h
elastic-agent-dev-agent-552p7             1/1     Running   3 (4d12h ago)    4d12h
elastic-agent-dev-agent-7qntc             1/1     Running   0                5d5h
elastic-agent-dev-agent-874kz             1/1     Running   0                5d4h
elastic-agent-dev-agent-b5pg6             1/1     Running   0                5d5h
elastic-agent-dev-agent-c4km9             1/1     Running   0                5d4h
elastic-agent-dev-agent-cxpqv             1/1     Running   0                5d5h
elastic-agent-dev-agent-dbmwg             1/1     Running   0                5d5h
elastic-agent-dev-agent-fpr2w             1/1     Running   0                5d5h
elastic-agent-dev-agent-hnbx2             1/1     Running   66 (2d18h ago)   4d23h
elastic-agent-dev-agent-jbqk8             1/1     Running   0                5d5h
elastic-agent-dev-agent-k6jlq             1/1     Running   41 (4d9h ago)    5d4h
elastic-agent-dev-agent-kgttg             1/1     Running   0                5d5h
elastic-agent-dev-agent-l4jj4             1/1     Running   0                5d5h
elastic-agent-dev-agent-lwtpt             1/1     Running   0                5d5h
elastic-agent-dev-agent-lx9ss             1/1     Running   0                5d5h
elastic-agent-dev-agent-mzhh7             1/1     Running   0                5d5h
elastic-agent-dev-agent-ngcrx             1/1     Running   0                4d12h
elastic-agent-dev-agent-nr5nc             1/1     Running   0                5d5h
elastic-agent-dev-agent-nw6s2             1/1     Running   0                5d5h
elastic-agent-dev-agent-s6b9p             1/1     Running   0                4d12h
elastic-agent-dev-agent-sxs9n             1/1     Running   0                5d6h
elastic-agent-dev-agent-z8lwx             1/1     Running   0                5d5h
elastic-agent-dev-agent-zrl7v             1/1     Running   0                5d5h
elastic-agent-dev-agent-zs9dl             1/1     Running   0                5d5h
elasticsearch-dev-es-data-0               1/1     Running   2 (34h ago)      5d4h
elasticsearch-dev-es-data-1               1/1     Running   2 (2d17h ago)    5d5h
elasticsearch-dev-es-master-0             1/1     Running   2 (46m ago)      5d4h
elasticsearch-dev-es-master-1             1/1     Running   0                5d5h
fleet-server-dev-agent-766f5b469-cvl7v    1/1     Running   357 (17m ago)    3d20h
kibana-dev-kb-77b5b4d44f-g8mhh            1/1     Running   0                5d4h
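A minimal sketch for spotting restart loops in a listing like the one above. It assumes the default `kubectl get po` columns (NAME, READY, STATUS, RESTARTS, AGE) with the `(<time> ago)` suffix that newer kubectl versions print after the restart count. The helper names `parse_pods` and `flag_restarting` and the threshold of 10 are illustrative choices, not part of any Elastic or Kubernetes tooling; reading `kubectl get po -o json` would be sturdier, and text parsing is used here only because it matches the output in this post.

```python
"""Rank pods by restart count from plain-text `kubectl get po` output."""
import re

# One data row: NAME READY STATUS RESTARTS [(<time> ago)] AGE.
# The "(<time> ago)" suffix is what newer kubectl versions print after RESTARTS.
ROW = re.compile(
    r"^(?P<name>\S+)\s+(?P<ready>\d+/\d+)\s+(?P<status>\S+)\s+"
    r"(?P<restarts>\d+)(?:\s+\((?P<last>[^)]*) ago\))?\s+(?P<age>\S+)$"
)


def parse_pods(text: str) -> list[dict]:
    """Return one dict per pod row; the header and other lines are skipped."""
    pods = []
    for line in text.splitlines():
        m = ROW.match(line.strip())
        if m:
            row = m.groupdict()
            row["restarts"] = int(row["restarts"])
            pods.append(row)
    return pods


def flag_restarting(pods: list[dict], threshold: int = 10) -> list[tuple[str, int]]:
    """Names and restart counts of pods at or above `threshold`, highest first."""
    hot = [(p["name"], p["restarts"]) for p in pods if p["restarts"] >= threshold]
    return sorted(hot, key=lambda item: item[1], reverse=True)


if __name__ == "__main__":
    # A few rows copied from the listing in this post.
    sample = """\
NAME                                      READY   STATUS    RESTARTS         AGE
elastic-agent-dev-agent-hnbx2             1/1     Running   66 (2d18h ago)   4d23h
elastic-agent-dev-agent-k6jlq             1/1     Running   41 (4d9h ago)    5d4h
elasticsearch-dev-es-master-0             1/1     Running   2 (46m ago)      5d4h
fleet-server-dev-agent-766f5b469-cvl7v    1/1     Running   357 (17m ago)    3d20h
kibana-dev-kb-77b5b4d44f-g8mhh            1/1     Running   0                5d4h
"""
    for name, count in flag_restarting(parse_pods(sample)):
        print(f"{count:>5}  {name}")
```

With these sample rows, the Fleet Server pod (357 restarts) is listed first, followed by the two agent pods with 66 and 41 restarts; everything below the threshold is left out.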

The Fleet Server keeps restarting, and I see the following logs in the Fleet Server pod:

{"log.level":"info","@timestamp":"2024-10-21T08:58:11.047Z","log.origin":{"function":"github.com/elastic/elastic-agent/internal/pkg/agent/application/coordinator.(*Coordinator).watchRuntimeComponents","file.name":"coordinator/coordinator.go","file.line":647},"message":"Component state changed filestream-monitoring (FAILED->STOPPED): Suppressing FAILED state due to restart for '3656' exited with code '-1'","log":{"source":"elastic-agent"},"component":{"id":"filestream-monitoring","state":"STOPPED","old_state":"FAILED"},"ecs.version":"1.6.0"}
{"log.level":"info","@timestamp":"2024-10-21T08:58:11.047Z","log.origin":{"function":"github.com/elastic/elastic-agent/internal/pkg/agent/application/coordinator.(*Coordinator).watchRuntimeComponents","file.name":"coordinator/coordinator.go","file.line":665},"message":"Unit state changed filestream-monitoring (FAILED->STOPPED): Suppressing FAILED state due to restart for '3656' exited with code '-1'","log":{"source":"elastic-agent"},"component":{"id":"filestream-monitoring","state":"STOPPED"},"unit":{"id":"filestream-monitoring","type":"output","state":"STOPPED","old_state":"FAILED"},"ecs.version":"1.6.0"}
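These two lines show the `filestream-monitoring` component and its output unit going from FAILED to STOPPED after a process exit with code -1. To see how often such transitions happen across a whole pod log (for example one saved with `kubectl logs <pod> > fleet.log`), a small tally like the sketch below may help. It assumes one JSON object per line, with the `component`/`unit` objects and the `old_state` field exactly as they appear in these lines; `state_transitions` and the file names are hypothetical.

```python
"""Tally component/unit state transitions in Elastic Agent NDJSON logs."""
import json
import sys
from collections import Counter


def state_transitions(lines):
    """Count (kind, id, old_state, state) tuples for every log event whose
    "component" or "unit" object carries an "old_state" field."""
    counts = Counter()
    for line in lines:
        try:
            event = json.loads(line)
        except json.JSONDecodeError:
            continue  # skip blank or non-JSON lines
        if not isinstance(event, dict):
            continue
        for kind in ("component", "unit"):
            info = event.get(kind)
            if isinstance(info, dict) and "old_state" in info:
                key = (kind, info.get("id"), info["old_state"], info.get("state"))
                counts[key] += 1
    return counts


if __name__ == "__main__":
    # Usage (hypothetical file name): python transitions.py fleet.log
    if len(sys.argv) > 1:
        with open(sys.argv[1], encoding="utf-8") as fh:
            for (kind, cid, old, new), n in state_transitions(fh).most_common():
                print(f"{n:>6}  {kind:<9} {cid}: {old} -> {new}")
```

Fleet Server's own lines also carry a `component` object, but without `old_state`, so they are ignored here.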

I also see some TLS handshake errors:
{"log.level":"error","@timestamp":"2024-10-21T08:58:14.147Z","message":"http: TLS handshake error from 10.3.0.120:1431: read tcp 10.3.0.66:8220->10.3.0.120:1431: i/o timeout\n","component":{"binary":"fleet-server","dataset":"elastic_agent.fleet_server","id":"fleet-server-default","type":"fleet-server"},"log":{"source":"fleet-server-default"},"ecs.version":"1.6.0","service.name":"fleet-server","service.type":"fleet-server","ecs.version":"1.6.0"}
{"log.level":"error","@timestamp":"2024-10-21T08:58:14.148Z","message":"http: TLS handshake error from 10.3.1.22:28062: read tcp 10.3.0.66:8220->10.3.1.22:28062: i/o timeout\n","component":{"binary":"fleet-server","dataset":"elastic_agent.fleet_server","id":"fleet-server-default","type":"fleet-server"},"log":{"source":"fleet-server-default"},"ecs.version":"1.6.0","service.name":"fleet-server","service.type":"fleet-server","ecs.version":"1.6.0"}
{"log.level":"error","@timestamp":"2024-10-21T08:58:14.447Z","message":"http: TLS handshake error from 10.3.0.62:13985: read tcp 10.3.0.66:8220->10.3.0.62:13985: i/o timeout\n","component":{"binary":"fleet-server","dataset":"elastic_agent.fleet_server","id":"fleet-server-default","type":"fleet-server"},"log":{"source":"fleet-server-default"},"ecs.version":"1.6.0","service.name":"fleet-server","service.type":"fleet-server","ecs.version":"1.6.0"}
{"log.level":"info","@timestamp":"2024-10-21T08:58:14.448Z","message":"Running on policy with Fleet Server integration: eck-fleet-server","component":{"binary":"fleet-server","dataset":"elastic_agent.fleet_server","id":"fleet-server-default","type":"fleet-server"},"log":{"source":"fleet-server-default"},"service.name":"fleet-server","service.type":"fleet-server","state":"HEALTHY","ecs.version":"1.6.0","ecs.version":"1.6.0"}
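Each handshake error names the client address whose connection to Fleet Server (10.3.0.66:8220 here) timed out. Grouping the errors by client can show whether a few agents or the whole fleet are affected. The sketch assumes the `message` format shown above (the wording comes from Go's `net/http` server) and one JSON object per log line; `handshake_errors` and the file name are hypothetical.

```python
"""Group Fleet Server TLS handshake errors by client host and final cause."""
import json
import re
import sys
from collections import Counter

# Format of Go's net/http server message, as seen in the log lines above:
# "http: TLS handshake error from <host>:<port>: <error text>"
HANDSHAKE = re.compile(
    r"TLS handshake error from (?P<host>\S+):(?P<port>\d+): (?P<detail>.*)"
)


def handshake_errors(lines):
    """Return a Counter keyed by (client host, last segment of the error text)."""
    counts = Counter()
    for line in lines:
        try:
            event = json.loads(line)
        except json.JSONDecodeError:
            continue  # skip blank or non-JSON lines
        if not isinstance(event, dict):
            continue
        m = HANDSHAKE.search(str(event.get("message", "")))
        if m:
            # "read tcp a->b: i/o timeout" -> "i/o timeout"
            cause = m.group("detail").strip().rsplit(": ", 1)[-1]
            counts[(m.group("host"), cause)] += 1
    return counts


if __name__ == "__main__":
    # Usage (hypothetical file name): python handshakes.py fleet.log
    if len(sys.argv) > 1:
        with open(sys.argv[1], encoding="utf-8") as fh:
            for (host, cause), n in handshake_errors(fh).most_common():
                print(f"{n:>6}  {host:<15} {cause}")
```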

Can anyone help me fix this Fleet Server issue?

strawgate · December 2, 2024, 3:36pm

Can you share the manifest you used to deploy Fleet Server, as well as the full log from the pod running it?

You can't upload full logs to the forum, but you could upload them to a GitHub Gist or something similar.
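Along the same lines, two pieces of information usually show why a container keeps restarting: the log of the previous container instance (`kubectl logs <pod> --previous`) and the termination details Kubernetes records under `status.containerStatuses[].lastState.terminated`. The sketch below reads the latter from `kubectl get pod <name> -o json`. It assumes `kubectl` is on PATH and pointed at the right cluster and namespace; `last_terminations` and `fetch_pod` are hypothetical helper names. A reason such as `OOMKilled` (exit code 137, i.e. SIGKILL) would point at the container's memory limit rather than at Fleet Server itself.

```python
"""Show restart count and last termination reason for each container in a pod."""
import json
import subprocess
import sys
from typing import Optional


def last_terminations(pod: dict) -> list[dict]:
    """Summarize each container status in a Pod object, i.e. the JSON printed
    by `kubectl get pod <name> -o json`."""
    out = []
    for cs in pod.get("status", {}).get("containerStatuses", []):
        term = cs.get("lastState", {}).get("terminated") or {}
        out.append({
            "container": cs.get("name"),
            "restarts": cs.get("restartCount", 0),
            "reason": term.get("reason"),  # e.g. "OOMKilled" or "Error"
            "exit_code": term.get("exitCode"),
            "finished_at": term.get("finishedAt"),
        })
    return out


def fetch_pod(name: str, namespace: Optional[str] = None) -> dict:
    """Run kubectl (must be installed and configured) and return the Pod JSON."""
    cmd = ["kubectl", "get", "pod", name, "-o", "json"]
    if namespace:
        cmd += ["-n", namespace]
    result = subprocess.run(cmd, capture_output=True, text=True, check=True)
    return json.loads(result.stdout)


if __name__ == "__main__":
    # Usage: python lastexit.py fleet-server-dev-agent-766f5b469-cvl7v [namespace]
    if len(sys.argv) > 1:
        ns = sys.argv[2] if len(sys.argv) > 2 else None
        for row in last_terminations(fetch_pod(sys.argv[1], ns)):
            print(row)
```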
