Operator will not start

My operator, which has worked fine for a couple of years (with upgrades, of course), is failing today. Logs are below. We had some Kubernetes issues earlier, but they are all fixed now, and this is the only problem we still have. Any help would be appreciated. For context, this is running in AKS.

The Kubernetes events show no other issues:

3m24s       Warning   BackOff            pod/elastic-operator-0              Back-off restarting failed container
11m         Normal    LeaderElection     configmap/elastic-operator-leader   elastic-operator-0_47744e8a-01be-4e9e-a8b0-c9235373063e became leader

Operator Logs:

E1007 18:19:29.608995       1 leaderelection.go:330] error retrieving resource lock elastic-system/elastic-operator-leader: Get "https://172.18.0.1:443/apis/coordination.k8s.io/v1/namespaces/elastic-system/leases/elastic-operator-leader?timeout=1m0s": context deadline exceeded
I1007 18:19:29.609101       1 leaderelection.go:283] failed to renew lease elastic-system/elastic-operator-leader: timed out waiting for the condition
{"log.level":"error","@timestamp":"2022-10-07T18:19:29.609Z","log.logger":"manager","message":"Failed to start the controller manager","service.version":"2.4.0+96282ca9","service.type":"eck","ecs.version":"1.4.0","error":"leader election lost","error.stack_trace":"runtime.goexit\n\t/usr/local/go/src/runtime/asm_amd64.s:1571"}
{"log.level":"error","@timestamp":"2022-10-07T18:19:29.609Z","log.logger":"manager","message":"Operator stopped with error","service.version":"2.4.0+96282ca9","service.type":"eck","ecs.version":"1.4.0","error":"leader election lost","error.stack_trace":"runtime.goexit\n\t/usr/local/go/src/runtime/asm_amd64.s:1571"}
{"log.level":"error","@timestamp":"2022-10-07T18:19:29.609Z","log.logger":"manager","message":"Shutting down due to error","service.version":"2.4.0+96282ca9","service.type":"eck","ecs.version":"1.4.0","error":"leader election lost","error.stack_trace":"github.com/spf13/cobra.(*Command).execute\n\t/go/pkg/mod/github.com/spf13/cobra@v1.5.0/command.go:872\ngithub.com/spf13/cobra.(*Command).ExecuteC\n\t/go/pkg/mod/github.com/spf13/cobra@v1.5.0/command.go:990\ngithub.com/spf13/cobra.(*Command).Execute\n\t/go/pkg/mod/github.com/spf13/cobra@v1.5.0/command.go:918\nmain.main\n\t/go/src/github.com/elastic/cloud-on-k8s/cmd/main.go:31\nruntime.main\n\t/usr/local/go/src/runtime/proc.go:250"}
Error: leader election lost

This was caused by bad network connectivity in Kubernetes, which produced some odd failures. It showed up both in the operator and in the cluster leader election. Once the connectivity issues were cleared up, everything returned to normal.
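For anyone who hits the same "context deadline exceeded" error on the lease lookup, a quick way to check for flaky connectivity is to time repeated TCP connections to the API server address that appears in the operator log (172.18.0.1:443 here), run from a pod in the affected cluster. The following is only a rough sketch: the function names `probe_tcp` and `probe_repeatedly` are made up for illustration, and a successful TCP connect does not prove the API server itself is healthy. It only helps spot timeouts or large latency swings at the network level.

```python
import socket
import time
from typing import List, Tuple


def probe_tcp(host: str, port: int, timeout: float = 3.0) -> Tuple[bool, float]:
    """Attempt one TCP connection and return (reachable, seconds_elapsed)."""
    start = time.monotonic()
    try:
        # create_connection raises OSError on refusal, timeout, or no route.
        with socket.create_connection((host, port), timeout=timeout):
            return True, time.monotonic() - start
    except OSError:
        return False, time.monotonic() - start


def probe_repeatedly(host: str, port: int, attempts: int = 5,
                     timeout: float = 3.0, pause: float = 1.0) -> List[Tuple[bool, float]]:
    """Run several probes in a row so intermittent failures become visible."""
    results = []
    for i in range(attempts):
        results.append(probe_tcp(host, port, timeout))
        # Sleep between attempts, but not after the last one.
        if i < attempts - 1:
            time.sleep(pause)
    return results
```

Calling `probe_repeatedly("172.18.0.1", 443)` and printing the results gives one `(reachable, seconds)` pair per attempt. A mix of `True` and `False`, or connect times that vary widely, would point to the kind of intermittent connectivity problem described above.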
