Reconciler error: auth secret key default-quickstart-kibana-user doesn't exist

I am currently upgrading my sandbox ECK installation from 7.14.0 to 8.2.2, via 7.17, as required.

I managed to finish this for Elasticsearch, but during the 7.14 -> 7.17 step for Kibana, I stumbled across an issue.

The operator seems to attempt the update but bails out with the following error:

{
   "log.level":"error",
   "@timestamp":"2022-06-14T10:28:35.377Z",
   "log.logger":"manager.eck-operator.controller.kibana-controller",
   "message":"Reconciler error",
   "service.version":"2.2.0+02f250eb",
   "service.type":"eck",
   "ecs.version":"1.4.0",
   "name":"quickstart",
   "namespace":"default",
   "error":"auth secret key default-quickstart-kibana-user doesn't exist",
   "errorCauses":[
      {
         "error":"auth secret key default-quickstart-kibana-user doesn't exist",
         "errorVerbose":"auth secret key default-quickstart-kibana-user doesn't exist\ngithub.com/elastic/cloud-on-k8s/pkg/controller/association.ElasticsearchAuthSettings\n\t/go/src/github.com/elastic/cloud-on-k8s/pkg/controller/association/conf.go:96\ngithub.com/elastic/cloud-on-k8s/pkg/controller/kibana.NewConfigSettings\n\t/go/src/github.com/elastic/cloud-on-k8s/pkg/controller/kibana/config_settings.go:139\ngithub.com/elastic/cloud-on-k8s/pkg/controller/kibana.(*driver).Reconcile\n\t/go/src/github.com/elastic/cloud-on-k8s/pkg/controller/kibana/driver.go:149\ngithub.com/elastic/cloud-on-k8s/pkg/controller/kibana.(*ReconcileKibana).doReconcile\n\t/go/src/github.com/elastic/cloud-on-k8s/pkg/controller/kibana/controller.go:196\ngithub.com/elastic/cloud-on-k8s/pkg/controller/kibana.(*ReconcileKibana).Reconcile\n\t/go/src/github.com/elastic/cloud-on-k8s/pkg/controller/kibana/controller.go:161\nsigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).Reconcile\n\t/go/pkg/mod/sigs.k8s.io/controller-runtime@v0.11.2/pkg/internal/controller/controller.go:114\nsigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).reconcileHandler\n\t/go/pkg/mod/sigs.k8s.io/controller-runtime@v0.11.2/pkg/internal/controller/controller.go:311\nsigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).processNextWorkItem\n\t/go/pkg/mod/sigs.k8s.io/controller-runtime@v0.11.2/pkg/internal/controller/controller.go:266\nsigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).Start.func2.2\n\t/go/pkg/mod/sigs.k8s.io/controller-runtime@v0.11.2/pkg/internal/controller/controller.go:227\nruntime.goexit\n\t/usr/local/go/src/runtime/asm_amd64.s:1581"
      }
   ],
   "error.stack_trace":"sigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).processNextWorkItem\n\t/go/pkg/mod/sigs.k8s.io/controller-runtime@v0.11.2/pkg/internal/controller/controller.go:266\nsigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).Start.func2.2\n\t/go/pkg/mod/sigs.k8s.io/controller-runtime@v0.11.2/pkg/internal/controller/controller.go:227"
}

I verified that the secret does exist in the cluster:

# kubectl get secret default-quickstart-kibana-user
NAME                             TYPE     DATA   AGE
default-quickstart-kibana-user   Opaque   2      257d

This error message leaves room for improvement, sorry about that. What it actually means is not that a secret is missing, but that a key inside the secret quickstart-kibana-user is missing.

Maybe you can check what key the controller is looking for by running this:

kubectl get kb quickstart  -o go-template='{{index .metadata.annotations "association.k8s.elastic.co/es-conf"}}'

Comparing its output with the contents of default-quickstart-kibana-user may help shed some light on what exactly is missing.

I'm not completely sure what to look for, but this is what I found:

# kubectl get kb quickstart  -o go-template='{{index .metadata.annotations "association.k8s.elastic.co/es-conf"}}' | jq
{
  "authSecretName": "quickstart-kibana-user",
  "authSecretKey": "default-quickstart-kibana-user",
  "caCertProvided": true,
  "caSecretName": "quickstart-kb-es-ca",
  "url": "https://quickstart-es-http.default.svc:9200",
  "version": "8.2.2"
}
# kubectl get secret default-quickstart-kibana-user -o jsonpath={.data.hash} | base64 -d
{PBKDF2_STRETCH}10000$Fq6....

Based on the annotation output, there should be a secret called quickstart-kibana-user that contains a key called default-quickstart-kibana-user. If that key is not there, the association between Elasticsearch and Kibana has not been established correctly. What does the status subresource of the Kibana resource say? Are there any Kubernetes events of interest? Anything else in the logs?
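A quick way to check this is to list the keys under `.data` of the secret and look for the expected one. This is a minimal sketch assuming kubectl is configured for the namespace of the Kibana resource; `secret_has_key` is a hypothetical helper name, not part of ECK or kubectl.

```shell
#!/usr/bin/env bash
# Hypothetical helper: succeeds if Secret "$1" has key "$2" under .data.
# Assumes kubectl is configured for the cluster/namespace of the Kibana.
secret_has_key() {
  local secret="${1:?usage: secret_has_key <secret> <key>}"
  local key="${2:?usage: secret_has_key <secret> <key>}"
  # Print every key of the .data map, one per line, then match the
  # expected key exactly (-x: whole line, -F: no regex interpretation).
  kubectl get secret "$secret" \
    -o go-template='{{range $k, $v := .data}}{{$k}}{{"\n"}}{{end}}' \
    | grep -qxF -- "$key"
}
```

With the authSecretName and authSecretKey values from the annotation, the check is `secret_has_key quickstart-kibana-user default-quickstart-kibana-user && echo present || echo missing`. Only grep's exit status is checked, so a failing kubectl call (for example, a mistyped secret name) is also reported as a missing key.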

Phew, ECK is my Kubernetes learning project :wink: I'll try to work through your questions one by one:
There is a secret called quickstart-kibana-user, but it does not seem to contain the requested key:

{
  "hash": "e1BCS0RGMl...",
  "name": "ZGVmYX...",
  "serviceAccount": "ZWxhc3RpYy9raWJhbmE=",
  "token": "QUFFQ..."
}

Status subresource? Do you mean this?

# kubectl get kibanas.kibana.k8s.elastic.co
NAME         HEALTH   NODES   VERSION   AGE
quickstart   red              7.15.0    257d
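The table only shows a few columns. Assuming the HEALTH, NODES and VERSION columns are read from the status subresource (which I believe is how the ECK CRDs define them), the whole status object can be printed with a JSONPath query; `kubectl describe kb quickstart` shows it as well, together with recent events. `kb_status` is a hypothetical helper name.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print the full status subresource of a Kibana
# resource as JSON. Assumes the ECK CRDs are installed and kubectl is
# configured for the right namespace.
kb_status() {
  local kb="${1:?usage: kb_status <kibana-name>}"
  kubectl get kb "$kb" -o jsonpath='{.status}'
  echo  # jsonpath output has no trailing newline
}
```

Piping the output through `jq` (as done earlier in the thread) makes the JSON easier to read.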

Status is red because of the incompatibility between Kibana 7.15.0 and Elasticsearch 8.2.2:

# kubectl logs --tail=2 quickstart-kb-56ffc5f89f-vvxb8
{"type":"log","@timestamp":"2022-06-14T08:05:58+00:00","tags":["error","savedobjects-service"],"pid":1217,"message":"This version of Kibana (v7.15.0) is incompatible with the following Elasticsearch nodes in your cluster: v8.2.2 @ quickstart-es-default-2.quickstart-es-default.default.svc/10.244.4.8:9200 (10.244.4.8), v8.2.2 @ quickstart-es-default-1.quickstart-es-default.default.svc/10.244.8.3:9200 (10.244.8.3), v8.2.2 @ quickstart-es-default-0.quickstart-es-default.default.svc/10.244.7.5:9200 (10.244.7.5)"}
{"type":"log","@timestamp":"2022-06-14T08:10:44+00:00","tags":["error","savedobjects-service"],"pid":1217,"message":"Unable to retrieve version information from Elasticsearch nodes. security_exception: [security_exception] Reason: unable to authenticate user [default-quickstart-kibana-user] for REST request [/_nodes?filter_path=nodes.*.version%2Cnodes.*.http.publish_address%2Cnodes.*.ip]"}

Looking at this, I'm wondering why it first tells me that the versions aren't compatible and then tells me it can't retrieve the versions.

K8s events:

# kubectl get events
LAST SEEN   TYPE      REASON                OBJECT                               MESSAGE
110s        Warning   Unhealthy             pod/quickstart-kb-56ffc5f89f-vvxb8   Readiness probe failed: HTTP probe failed with statuscode: 503
60m         Warning   ReconciliationError   kibana/quickstart                    Reconciliation error: auth secret key default-quickstart-kibana-user doesn't exist
7m36s       Warning   ReconciliationError   kibana/quickstart                    Reconciliation error: auth secret key default-quickstart-kibana-user doesn't exist
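In a busier namespace, events can be narrowed down to the Kibana resource with a field selector on the involved object. This sketch assumes the namespace used in this thread is the current one; `kb_events` is a hypothetical helper name.

```shell
#!/usr/bin/env bash
# Hypothetical helper: list only the events whose involved object is the
# given Kibana resource, oldest first (sorted by lastTimestamp).
kb_events() {
  local kb="${1:?usage: kb_events <kibana-name>}"
  kubectl get events \
    --field-selector "involvedObject.kind=Kibana,involvedObject.name=${kb}" \
    --sort-by=.lastTimestamp
}
```

Pod events, such as the readiness probe failure above, are attached to the Pod rather than to the Kibana resource, so this filter leaves them out.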

Nothing new, I presume...
Other than that, I am a bit out of ideas about where to look.

Before the upgrade, everything ran smoothly. I did have some issues at the beginning of the process, though, with volume attachments failing with a "Multi Attach Error". From what I have learned of K8s so far, I don't think that could cause information in a secret to get (partially!) lost.

EDIT:
I tried copying the value of secret default-quickstart-kibana-user into that key of quickstart-kibana-user, but the change doesn't seem to take effect, even though kubectl reports that the secret was patched:

# echo e1BCS... | read output;kubectl patch secret quickstart-kibana-user -p="{\"data\":{\"default-quickstart-kibana-user\": \"$output\"}}" -v=1
secret/quickstart-kibana-user patched
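One possible reason the patch had no visible effect, assuming the command was run in bash (zsh behaves differently here): bash runs each part of a pipeline in a subshell, so the variable set by `read` is gone by the time `kubectl patch` runs, and the key is most likely patched with an empty value. The sketch below demonstrates that and builds the patch from a plain variable instead. `demo_pipe_read` and `secret_data_patch` are hypothetical names, and the value must already be base64-encoded because `.data` values are stored that way.

```shell
#!/usr/bin/env bash
# In bash (without `shopt -s lastpipe`), every part of a pipeline runs in a
# subshell, so a variable set by `read` is gone once the pipeline ends.
demo_pipe_read() {
  local output=""
  echo "some-value" | read -r output
  printf '%s' "$output"   # prints nothing: read ran in a subshell
}

# Hypothetical helper: build a patch that sets one key of a Secret's .data.
# $2 must already be base64-encoded, since .data values are stored that way.
secret_data_patch() {
  local key="$1" b64value="$2"
  printf '{"data":{"%s":"%s"}}' "$key" "$b64value"
}
```

With a plain assignment such as `value='e1BCS...'`, the call becomes `kubectl patch secret quickstart-kibana-user -p "$(secret_data_patch default-quickstart-kibana-user "$value")"`.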

I decided to remove the Kibana instance and set it up again... Thanks for your help.

Yes, removing and re-creating the Kibana instance should work in any case. I was still interested in the error you saw in case it points to a bug in the operator, but I was not able to reproduce your issue when upgrading an Elasticsearch/Kibana deployment the same way you did.

Some additional background: in 7.17.0 we transitioned to using service account tokens instead of username/password for the internal communication between Kibana and Elasticsearch, and you somehow got caught in the middle of that transition during your upgrade. As I understand it, the error you saw came from a Kibana still expecting a username/password combination in the secret while the operator had already replaced it with a service account token.
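This fits the secret contents posted earlier: the `serviceAccount` value `ZWxhc3RpYy9raWJhbmE=` decodes to `elastic/kibana`, and the secret holds a `token` key rather than the expected password key. A minimal sketch for decoding such values, assuming GNU `base64` and the `base64decode` function of kubectl's go-template output; `b64_decode` and `secret_value` are hypothetical helper names.

```shell
#!/usr/bin/env bash
# Hypothetical helper: decode a single base64 value (GNU coreutils base64).
b64_decode() {
  printf '%s' "$1" | base64 -d
}

# Hypothetical helper: print the decoded value of one key of a Secret,
# using the base64decode function of kubectl's go-template output.
secret_value() {
  local secret="$1" key="$2"
  kubectl get secret "$secret" \
    -o go-template="{{index .data \"${key}\" | base64decode}}"
}
```

For the secret shown earlier, `secret_value quickstart-kibana-user serviceAccount` should print `elastic/kibana`.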

Thanks for elaborating.
IIRC, I updated the operator (I don't remember from which version, but definitely < 2) after updating Elasticsearch but before updating Kibana, simply because I forgot to update Kibana.
Maybe that had something to do with it?