ECK fails to update the status field of an Elasticsearch cluster on EKS 1.19

Hi, I am trying to get a cluster running on EKS 1.19 to successfully create an Elasticsearch cluster using the ECK operator 1.6.

The cluster gets created just fine when I create the Elasticsearch resource, and I can see its health is green when I curl the service on port 9200.

The operator fails to update the status of the cluster, though, so the resource just looks like this:

    NAME     HEALTH   NODES   VERSION   PHASE   AGE
    jaeger                                      11m

Logs from the operator show a "not found" error:

elastic-operator-0 manager {"log.level":"error","@timestamp":"2021-07-21T09:00:05.214Z","log.logger":"manager.eck-operator.controller.elasticsearch-controller","message":"Reconciler error","service.version":"1.6.0+8326ca8a","service.type":"eck","ecs.version":"1.4.0","name":"jaeger","namespace":"jaeger-server","error":"elasticsearches.elasticsearch.k8s.elastic.co \"jaeger\" not found","errorCauses":[{"error":"elasticsearches.elasticsearch.k8s.elastic.co \"jaeger\" not found"}],"error.stack_trace":"sigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).processNextWorkItem\n\t/go/pkg/mod/sigs.k8s.io/controller-runtime@v0.8.3/pkg/internal/controller/controller.go:253\nsigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).Start.func1.2\n\t/go/pkg/mod/sigs.k8s.io/controller-runtime@v0.8.3/pkg/internal/controller/controller.go:216\nk8s.io/apimachinery/pkg/util/wait.JitterUntilWithContext.func1\n\t/go/pkg/mod/k8s.io/apimachinery@v0.20.2/pkg/util/wait/wait.go:185\nk8s.io/apimachinery/pkg/util/wait.BackoffUntil.func1\n\t/go/pkg/mod/k8s.io/apimachinery@v0.20.2/pkg/util/wait/wait.go:155\nk8s.io/apimachinery/pkg/util/wait.BackoffUntil\n\t/go/pkg/mod/k8s.io/apimachinery@v0.20.2/pkg/util/wait/wait.go:156\nk8s.io/apimachinery/pkg/util/wait.JitterUntil\n\t/go/pkg/mod/k8s.io/apimachinery@v0.20.2/pkg/util/wait/wait.go:133\nk8s.io/apimachinery/pkg/util/wait.JitterUntilWithContext\n\t/go/pkg/mod/k8s.io/apimachinery@v0.20.2/pkg/util/wait/wait.go:185\nk8s.io/apimachinery/pkg/util/wait.UntilWithContext\n\t/go/pkg/mod/k8s.io/apimachinery@v0.20.2/pkg/util/wait/wait.go:99"}

Digging into the kube-apiserver audit logs in CloudWatch, I can indeed find the request being authorized but returning a 404 Not Found:

Field | Value
-- | --
@ingestionTime | 1626858005606
@log | 542491017901:/aws/eks/stag101-us-east-1/cluster
@logStream | kube-apiserver-audit-2b59cc97274ecf922e440f703cd8faaa
@message | {"kind":"Event","apiVersion":"audit.k8s.io/v1","level":"Metadata","auditID":"d1e5d737-faa6-479c-88e2-0aa51b2a3afb","stage":"ResponseComplete","requestURI":"/apis/elasticsearch.k8s.elastic.co/v1/namespaces/jaeger-server/elasticsearches/jaeger/status?timeout=1m0s","verb":"update","user":{"username":"system:serviceaccount:elastic-system:elastic-operator","uid":"c00d05ae-68eb-4194-af50-d9445315b45f","groups":["system:serviceaccounts","system:serviceaccounts:elastic-system","system:authenticated"]},"sourceIPs":["10.146.79.0"],"userAgent":"elastic-operator/v0.0.0 (linux/amd64) kubernetes/$Format","objectRef":{"resource":"elasticsearches","namespace":"jaeger-server","name":"jaeger","apiGroup":"elasticsearch.k8s.elastic.co","apiVersion":"v1","subresource":"status"},"responseStatus":{"metadata":{},"status":"Failure","reason":"NotFound","code":404},"requestReceivedTimestamp":"2021-07-21T09:00:05.213978Z","stageTimestamp":"2021-07-21T09:00:05.214150Z","annotations":{"authentication.k8s.io/legacy-token":"system:serviceaccount:elastic-system:elastic-operator","authorization.k8s.io/decision":"allow","authorization.k8s.io/reason":"RBAC: allowed by ClusterRoleBinding \"elastic-operator\" of ClusterRole \"elastic-operator\" to ServiceAccount \"elastic-operator/elastic-system\""}}
@timestamp | 1626858005339
annotations.authentication.k8s.io/legacy-token | system:serviceaccount:elastic-system:elastic-operator
annotations.authorization.k8s.io/decision | allow
annotations.authorization.k8s.io/reason | RBAC: allowed by ClusterRoleBinding "elastic-operator" of ClusterRole "elastic-operator" to ServiceAccount "elastic-operator/elastic-system"
apiVersion | audit.k8s.io/v1
auditID | d1e5d737-faa6-479c-88e2-0aa51b2a3afb
kind | Event
level | Metadata
objectRef.apiGroup | elasticsearch.k8s.elastic.co
objectRef.apiVersion | v1
objectRef.name | jaeger
objectRef.namespace | jaeger-server
objectRef.resource | elasticsearches
objectRef.subresource | status
requestReceivedTimestamp | 2021-07-21T09:00:05.213978Z
requestURI | /apis/elasticsearch.k8s.elastic.co/v1/namespaces/jaeger-server/elasticsearches/jaeger/status?timeout=1m0s
responseStatus.code | 404
responseStatus.reason | NotFound
responseStatus.status | Failure
sourceIPs.0 | 10.146.79.0
stage | ResponseComplete
stageTimestamp | 2021-07-21T09:00:05.214150Z
user.groups.0 | system:serviceaccounts
user.groups.1 | system:serviceaccounts:elastic-system
user.groups.2 | system:authenticated
user.uid | c00d05ae-68eb-4194-af50-d9445315b45f
user.username | system:serviceaccount:elastic-system:elastic-operator
userAgent | elastic-operator/v0.0.0 (linux/amd64) kubernetes/$Format
verb | update

whereas update requests to the non-status URI /apis/elasticsearch.k8s.elastic.co/v1/namespaces/jaeger-server/elasticsearches/jaeger?timeout=1m0s succeed.

A bit more testing:

On kind 1.19:

With the operator deployed and the quickstart Elasticsearch, it works just fine. Using kubectl proxy, I get:

  • curl -k -s -X GET -H "Accept: application/json, */*" -H "Content-Type: application/json" "127.0.0.1:8001/apis/elasticsearch.k8s.elastic.co/v1/namespaces/default/elasticsearches/quickstart/status?timeout=1m0s" -> 200 Works

On EKS 1.19:

  • curl -k -s -X GET -H "Accept: application/json, */*" -H "Content-Type: application/json" "127.0.0.1:8001/apis/elasticsearch.k8s.elastic.co/v1beta1/namespaces/jaeger-server/elasticsearches/jaeger?timeout=1m0s" -> 200 Works
  • curl -k -s -X GET -H "Accept: application/json, */*" -H "Content-Type: application/json" "127.0.0.1:8001/apis/elasticsearch.k8s.elastic.co/v1beta1/namespaces/jaeger-server/elasticsearches/jaeger/status?timeout=1m0s" -> 404 Not Found
  • curl -k -s -X GET -H "Accept: application/json, */*" -H "Content-Type: application/json" "127.0.0.1:8001/api/v1/namespaces/jaeger-server/pods/jaeger-es-master-1/status" -> 200 Works
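The probes above differ only in the API version and whether /status is appended. A minimal sketch for repeating them without kubectl proxy, using `kubectl get --raw`, which sends the GET through the API server with your current kubeconfig credentials (not the operator's service account) and exits non-zero on errors such as 404. The helper names `es_api_path` and `probe` are hypothetical, not part of ECK or kubectl.

```shell
# Hypothetical helper: build the API path for an Elasticsearch resource.
# Usage: es_api_path VERSION NAMESPACE NAME [SUBRESOURCE]
es_api_path() {
  local path="/apis/elasticsearch.k8s.elastic.co/$1/namespaces/$2/elasticsearches/$3"
  # Append a subresource such as "status" when one is given.
  if [ -n "${4:-}" ]; then
    path="$path/$4"
  fi
  printf '%s\n' "$path"
}

# Hypothetical helper: GET a path through the API server and report the outcome.
# `kubectl get --raw` exits non-zero on errors such as 404 Not Found.
probe() {
  if kubectl get --raw "$1" >/dev/null 2>&1; then
    echo "OK: $1"
  else
    echo "FAILED: $1"
  fi
}

# Example: compare the main resource and its status subresource for both versions.
# for v in v1 v1beta1; do
#   probe "$(es_api_path "$v" jaeger-server jaeger)"
#   probe "$(es_api_path "$v" jaeger-server jaeger status)"
# done
```

If the main path reports OK while /status reports FAILED for every version, the API server is not serving a status subresource for this resource at all, which matches the 404 in the audit log.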

Hi,

How has the operator been installed? Is it an upgrade from a previous release?

Could you provide the output of the following command?

    kubectl get crds elasticsearches.elasticsearch.k8s.elastic.co -o yaml | grep -n -A 2 -B 2 'subresources:'

Thanks
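On a 1.19 cluster, kubectl reads CRDs through apiextensions.k8s.io/v1, where subresources are declared per version, so the grep shows whether `subresources:` appears at all but not which version it belongs to. A minimal sketch that reports each served version separately; `check_crd_status_subresource` is a hypothetical helper, and it assumes kubectl prints an empty `status: {}` map as a non-empty token (such as `{}` or `map[]`) while a missing key prints nothing.

```shell
# Hypothetical helper: report, per version, whether a live CRD declares
# the status subresource.
# Returns 0 if every version has it, 1 if any is missing, 2 if kubectl failed.
check_crd_status_subresource() {
  local crd="$1" out version status missing=0
  # One "version=<status subresource>" line per version; the value is
  # empty when that version has no status subresource.
  out=$(kubectl get crd "$crd" \
    -o jsonpath='{range .spec.versions[*]}{.name}{"="}{.subresources.status}{"\n"}{end}') || return 2
  while IFS='=' read -r version status; do
    [ -z "$version" ] && continue
    if [ -n "$status" ]; then
      echo "$version: status subresource present"
    else
      echo "$version: status subresource MISSING"
      missing=1
    fi
  done <<EOF
$out
EOF
  return "$missing"
}

# Example:
# check_crd_status_subresource elasticsearches.elasticsearch.k8s.elastic.co
```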

Hi, the operator was installed by rendering the Helm chart to manifests and then applying them via ArgoCD. It was a first install, but I plan to upgrade it the same way, by re-rendering and letting Argo apply the result (is that OK?). The manifests were rendered with:

    helm template elastic elastic/eck-operator \
        --version 1.6.0 --kube-version v1.19.0 --dry-run --include-crds \
        --namespace elastic-system \
        --set=installCRDs=true \
        --set=webhook.enabled=true \
        --set=config.logVerbosity="0" \
        --set=config.metricsPort="0" \
        --set=config.caValidity="87600h" \
        --set=config.caRotateBefore="240h" \
        --set=config.certificatesRotateBefore="240h" \
        --set=config.kubeClientTimeout="60s" \
        --set=config.elasticsearchClientTimeout="180s" \
        --set=podMonitor.enabled=false \
        --set=global.createOperatorNamespace=false \
        --set=global.kubeVersion=1.19.0

Running kubectl get crds elasticsearches.elasticsearch.k8s.elastic.co -o yaml | grep -n -A 2 -B 2 'subresources:' gives me no output.

Checking the upstream all-in-one manifest, I can see I should be getting:

    $ cat CustomResourceDefinition-elasticsearches.elasticsearch.k8s.elastic.co.yaml | grep -n -A 2 -B 2 'subresources:'
    42-    singular: elasticsearch
    43-  scope: Namespaced
    44:  subresources:
    45-    status: {}
    46-  validation:

But when I check the manifests I rendered from Helm, I can see it is not there (so it is missing not just in the live cluster but also in my checked-in manifests).
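Since the subresource is already missing from the rendered manifests, this can be caught before ArgoCD applies anything. A minimal sketch, assuming plain block-style YAML as emitted by `helm template` (documents separated by `---`, `kind:` at column 0, `metadata.name` indented two spaces); `crds_missing_status` is a hypothetical helper name. It prints every CustomResourceDefinition that has no `status` entry under `subresources:`. It is a text scan, not a YAML parser, so treat it as a quick sanity check rather than validation.

```shell
# Hypothetical helper: list CRDs in rendered manifests that lack a status subresource.
# Usage: crds_missing_status FILE...  (reads stdin when no file is given)
crds_missing_status() {
  awk '
    function flush() {
      if (is_crd && !has_status) print name
      is_crd = 0; has_status = 0; name = ""; subres_indent = -1
    }
    BEGIN { subres_indent = -1 }
    # A new YAML document starts: report on the previous one.
    /^---/ { flush(); next }
    # Indentation (leading spaces) of the current line.
    { match($0, /^ */); indent = RLENGTH }
    /^kind:[ \t]*CustomResourceDefinition[ \t]*$/ { is_crd = 1 }
    /^  name:/ && name == "" { name = $2 }
    # Leave the subresources block once a non-blank line is not indented deeper.
    subres_indent >= 0 && $0 !~ /^[ \t]*$/ && indent <= subres_indent { subres_indent = -1 }
    subres_indent >= 0 && /^[ \t]*status:/ { has_status = 1 }
    /^[ \t]*subresources:/ { subres_indent = indent }
    END { flush() }
  ' "$@"
}

# Example (append the same --set flags used for the real render):
# helm template elastic elastic/eck-operator --version 1.6.0 --include-crds > rendered.yaml
# crds_missing_status rendered.yaml
```

Because it only tracks indentation, a `status:` property inside the OpenAPI schema is not mistaken for the subresource; only keys nested under `subresources:` count.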

I am gonna look into this to figure out what happened.

thanks!