Operator will not start

Jeff_Rankin · October 7, 2022, 6:27pm

My operator, which has worked just fine for a couple of years (with upgrades, of course), is failing today. Logs are below. We had some Kubernetes issues earlier, but they are all fixed now. This is the only issue we are having. Any help will be appreciated. This is in AKS, for context.

Kubernetes events show no other issues:

3m24s       Warning   BackOff            pod/elastic-operator-0              Back-off restarting failed container
11m         Normal    LeaderElection     configmap/elastic-operator-leader   elastic-operator-0_47744e8a-01be-4e9e-a8b0-c9235373063e became leader

Operator Logs:

E1007 18:19:29.608995       1 leaderelection.go:330] error retrieving resource lock elastic-system/elastic-operator-leader: Get "https://172.18.0.1:443/apis/coordination.k8s.io/v1/namespaces/elastic-system/leases/elastic-operator-leader?timeout=1m0s": context deadline exceeded
I1007 18:19:29.609101       1 leaderelection.go:283] failed to renew lease elastic-system/elastic-operator-leader: timed out waiting for the condition
{"log.level":"error","@timestamp":"2022-10-07T18:19:29.609Z","log.logger":"manager","message":"Failed to start the controller manager","service.version":"2.4.0+96282ca9","service.type":"eck","ecs.version":"1.4.0","error":"leader election lost","error.stack_trace":"runtime.goexit\n\t/usr/local/go/src/runtime/asm_amd64.s:1571"}
{"log.level":"error","@timestamp":"2022-10-07T18:19:29.609Z","log.logger":"manager","message":"Operator stopped with error","service.version":"2.4.0+96282ca9","service.type":"eck","ecs.version":"1.4.0","error":"leader election lost","error.stack_trace":"runtime.goexit\n\t/usr/local/go/src/runtime/asm_amd64.s:1571"}
{"log.level":"error","@timestamp":"2022-10-07T18:19:29.609Z","log.logger":"manager","message":"Shutting down due to error","service.version":"2.4.0+96282ca9","service.type":"eck","ecs.version":"1.4.0","error":"leader election lost","error.stack_trace":"github.com/spf13/cobra.(*Command).execute\n\t/go/pkg/mod/github.com/spf13/cobra@v1.5.0/command.go:872\ngithub.com/spf13/cobra.(*Command).ExecuteC\n\t/go/pkg/mod/github.com/spf13/cobra@v1.5.0/command.go:990\ngithub.com/spf13/cobra.(*Command).Execute\n\t/go/pkg/mod/github.com/spf13/cobra@v1.5.0/command.go:918\nmain.main\n\t/go/src/github.com/elastic/cloud-on-k8s/cmd/main.go:31\nruntime.main\n\t/usr/local/go/src/runtime/proc.go:250"}
Error: leader election lost

Jeff_Rankin · October 12, 2022, 12:33pm

This was due to bad connectivity in Kubernetes causing weird issues. It showed up in the operator and in the cluster leader election. Once the connectivity issues were cleared up, everything returned to normal.
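
For anyone hitting the same "context deadline exceeded" error, a rough way to spot this kind of connectivity problem is to probe the API server address from inside the cluster. The snippet below is only a sketch under a few assumptions: `probe_tcp` and `probe_many` are names made up for illustration, the address 172.18.0.1:443 is the one from the log above and will differ in other clusters, and a plain TCP connect only shows whether the endpoint is reachable, not whether the lease API itself answers in time.

```python
import socket
import time
from typing import Tuple


def probe_tcp(host: str, port: int, timeout: float = 3.0) -> Tuple[bool, float]:
    """Attempt one TCP connection and return (reachable, elapsed_seconds)."""
    start = time.monotonic()
    try:
        # create_connection raises OSError on refusal, timeout, or no route
        with socket.create_connection((host, port), timeout=timeout):
            return True, time.monotonic() - start
    except OSError:
        return False, time.monotonic() - start


def probe_many(host: str, port: int, attempts: int = 5,
               timeout: float = 3.0) -> Tuple[int, float]:
    """Probe several times and return (failure_count, slowest_elapsed_seconds)."""
    results = [probe_tcp(host, port, timeout) for _ in range(attempts)]
    failures = sum(1 for ok, _ in results if not ok)
    slowest = max(elapsed for _, elapsed in results)
    return failures, slowest


# Example call (address taken from the log above; adjust for your cluster):
# failures, slowest = probe_many("172.18.0.1", 443)
# print(f"failures={failures} slowest={slowest:.2f}s")
```

Repeated failures, or connect times close to the timeout, would point at the network path rather than at the operator itself.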

