Secure Setting gcs.client.default.credentials_file does not get updated


I am experiencing an issue with a deployment of ECK.

I have updated our k8s Secret, which contains gcs.client.default.credentials_file, but the change has no effect. When I try to create a new snapshot repository, the request fails; I believe the API attempts to connect to the GCS API and that connection fails. I see this in the logs: "{\"error\":\"invalid_grant\",\"error_description\":\"Invalid grant: account not found\"}". (For the first deployment I deliberately used a key file for an account that does not exist, so that I could test updating it to a valid key file.)
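For readers following along, the setup described above can be sketched roughly as follows. This is only an illustration, not the actual manifests from this cluster: the names gcs-credentials and my-cluster, the Elasticsearch version, the nodeSet, and the key file contents are all placeholders. It assumes the Secret is wired in through the ECK spec.secureSettings field, with the Secret key used as the keystore setting name.

```yaml
# Sketch only: every name and value below is a placeholder, not taken from the real cluster.
apiVersion: v1
kind: Secret
metadata:
  name: gcs-credentials            # hypothetical Secret name
stringData:
  # The Secret key is the name of the secure setting in the Elasticsearch keystore.
  gcs.client.default.credentials_file: |
    {"type": "service_account"}
  # Placeholder JSON above; a real service account key file has many more fields.
---
apiVersion: elasticsearch.k8s.elastic.co/v1
kind: Elasticsearch
metadata:
  name: my-cluster                 # hypothetical cluster name
spec:
  version: 7.8.0                   # placeholder version
  secureSettings:
  - secretName: gcs-credentials    # must match the Secret above
  nodeSets:
  - name: default                  # hypothetical nodeSet
    count: 1
```

Updating the key then means replacing the value under gcs.client.default.credentials_file in the Secret, which is the step that appears to have no effect here.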

I dug around in the init container which does this here. At first I thought the problem might be that the elasticsearch-keystore add-file command does not run again because elastic-internal-init-keystore.ok is already present. To test this, I added a new init container to the pod template which explicitly calls /usr/share/elasticsearch/bin/elasticsearch-keystore add-file --force gcs.client.default.credentials_file /mnt/elastic-internal/secure-settings/gcs.client.default.credentials_file. This still didn't work, even after cycling all the pods (master, hot, warm) in the cluster.
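An extra init container of the kind described above might look roughly like the sketch below. The container name force-gcs-keystore and the nodeSet name are made up for illustration, and the fragment assumes it sits under spec.nodeSets of the Elasticsearch resource; only the command and the mount path come from this post.

```yaml
# Sketch only: fragment of an Elasticsearch resource, names are placeholders.
nodeSets:
- name: master                       # hypothetical nodeSet name
  count: 1
  podTemplate:
    spec:
      initContainers:
      - name: force-gcs-keystore     # hypothetical init container name
        command:
        - sh
        - -c
        - |
          # Overwrite the existing keystore entry with the file from the mounted Secret.
          /usr/share/elasticsearch/bin/elasticsearch-keystore add-file --force \
            gcs.client.default.credentials_file \
            /mnt/elastic-internal/secure-settings/gcs.client.default.credentials_file
        volumeMounts:
        # Same mount the elastic-internal-init-keystore init container uses.
        - mountPath: /mnt/elastic-internal/secure-settings
          name: elastic-internal-secure-settings
          readOnly: true
```

The same block would need to be repeated for each nodeSet (master, hot, warm) whose pods should pick up the change.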

I can confirm that the new secret value does make it to the volume, because I added a volume mount (just like the elastic-internal-init-keystore init container does):

    - mountPath: /mnt/elastic-internal/secure-settings
      name: elastic-internal-secure-settings
      readOnly: true

I could see the new value there when I exec'd into the new init container.

At this point, the only way I've managed to successfully update the ServiceAccount key is to delete the Cluster in a dev environment and redeploy, which is not something I can do in production of course.

Any idea what is happening?

Some additional context on the above issue: the deployment uses ECK operator version 1.2.0 on GKE.

Have you tried reloading the secure settings as described here?

Yes, I've tried that and it had no effect. I believe the cluster has unresolved problems from previous updates, which block the operator from restarting the nodes with the new secrets. Shall we close this thread and continue the discussion on the GitHub issue?