New build docker-compose multi-node cluster fails to retrieve password hash for reserved user [elastic] / at least one primary shard for the index [.security-7] is unavailable

cookersjs · February 24, 2023, 1:37am

Hi there,

I've been following the instructions from Install Elasticsearch with Docker | Elasticsearch Guide [8.6] | Elastic (#docker-compose-file for multi-node cluster) and I keep running into an error that seems common enough but none of the solutions I see online have worked or even made headway on the problem.

Following the instructions from that page verbatim, I created a new directory and the .env file (setting ELASTIC_PASSWORD and KIBANA_PASSWORD to values that contain no symbols), copied the docker-compose.yml file found there exactly, and ran 'docker-compose up'. When I navigate to localhost:5601, I see 'Kibana server is not ready yet'.

Looking at the docker-compose output, I see many cases of:

"log.level":"ERROR", "message":"failed to retrieve password hash for reserved user [elastic]"..."error.message":"at least one primary shard for the index [.security-7] is unavailable","error.stack_trace":"org.elasticsearch.action.UnavailableShardsException: at least one primary shard for the index [.security-7] is unavailable...

This seems to be an error that can affect existing clusters, but it is unusual here because this is my first time building the cluster following those instructions.

I found the following thread: Accidentally deleted .security index for x-pack - #2 by Johntdyer, where Tim V gives good detail on how to fix it, but I run into issues trying to set the password for the 'elastic' built-in user. Following his instructions, I came to the 'Built-in Users' section (Setting Up User Authentication | X-Pack for the Elastic Stack [6.2] | Elastic). There are two unexpected things from that document that differ for me:

(Minor issue) When I exec into the 'es01' container, there is no 'bin/x-pack' folder, but there is a 'bin/elasticsearch-setup-passwords' executable.
(Actual issue) When I run bin/elasticsearch-setup-passwords interactive, it fails due to a CertificateException:

elasticsearch@b81f8fcc6d59:~$ bin/elasticsearch-setup-passwords interactive
01:27:10.610 [main] WARN  org.elasticsearch.common.ssl.DiagnosticTrustManager - failed to establish trust with server at [192.168.160.3]; the server provided a certificate with subject name [CN=es01], fingerprint [77d14caab00883aa3c55a05267ddc6bdcfe27211], no keyUsage and no extendedKeyUsage; the certificate is valid between [2023-02-24T00:26:14Z] and [2026-02-23T00:26:14Z] (current time is [2023-02-24T01:27:10.607014876Z], certificate dates are valid); the session uses cipher suite [TLS_AES_256_GCM_SHA384] and protocol [TLSv1.3]; the certificate has subject alternative names [DNS:localhost,IP:127.0.0.1,DNS:es01]; the certificate is issued by [CN=Elastic Certificate Tool Autogenerated CA] but the server did not provide a copy of the issuing certificate in the certificate chain; the issuing certificate with fingerprint [c75e81dc750322de438d2ebf8b0d3d227b77abe9] is trusted in this ssl context ([xpack.security.http.ssl (with trust configuration: PEM-trust{/usr/share/elasticsearch/config/certs/ca/ca.crt})])
java.security.cert.CertificateException: No subject alternative names matching IP address 192.168.160.3 found
	at sun.security.util.HostnameChecker.matchIP(HostnameChecker.java:164) ~[?:?]
	at sun.security.util.HostnameChecker.match(HostnameChecker.java:101) ~[?:?]
	at sun.security.ssl.X509TrustManagerImpl.checkIdentity(X509TrustManagerImpl.java:458) ~[?:?]
	at sun.security.ssl.X509TrustManagerImpl.checkIdentity(X509TrustManagerImpl.java:432) ~[?:?]
	at sun.security.ssl.X509TrustManagerImpl.checkTrusted(X509TrustManagerImpl.java:238) ~[?:?]
	at sun.security.ssl.X509TrustManagerImpl.checkServerTrusted(X509TrustManagerImpl.java:132) ~[?:?]
	at org.elasticsearch.common.ssl.DiagnosticTrustManager.checkServerTrusted(DiagnosticTrustManager.java:80) ~[?:?]
	at sun.security.ssl.CertificateMessage$T13CertificateConsumer.checkServerCerts(CertificateMessage.java:1335) ~[?:?]
	at sun.security.ssl.CertificateMessage$T13CertificateConsumer.onConsumeCertificate(CertificateMessage.java:1226) ~[?:?]
	at sun.security.ssl.CertificateMessage$T13CertificateConsumer.consume(CertificateMessage.java:1169) ~[?:?]
	at sun.security.ssl.SSLHandshake.consume(SSLHandshake.java:396) ~[?:?]
	at sun.security.ssl.HandshakeContext.dispatch(HandshakeContext.java:480) ~[?:?]
	at sun.security.ssl.HandshakeContext.dispatch(HandshakeContext.java:458) ~[?:?]
	at sun.security.ssl.TransportContext.dispatch(TransportContext.java:201) ~[?:?]
	at sun.security.ssl.SSLTransport.decode(SSLTransport.java:172) ~[?:?]
	at sun.security.ssl.SSLSocketImpl.decode(SSLSocketImpl.java:1510) ~[?:?]
	at sun.security.ssl.SSLSocketImpl.readHandshakeRecord(SSLSocketImpl.java:1425) ~[?:?]
	at sun.security.ssl.SSLSocketImpl.startHandshake(SSLSocketImpl.java:455) ~[?:?]
	at sun.security.ssl.SSLSocketImpl.startHandshake(SSLSocketImpl.java:426) ~[?:?]
	at sun.net.www.protocol.https.HttpsClient.afterConnect(HttpsClient.java:578) ~[?:?]
	at sun.net.www.protocol.https.AbstractDelegateHttpsURLConnection.connect(AbstractDelegateHttpsURLConnection.java:187) ~[?:?]
	at sun.net.www.protocol.https.HttpsURLConnectionImpl.connect(HttpsURLConnectionImpl.java:142) ~[?:?]
	at org.elasticsearch.xpack.core.common.socket.SocketAccess.lambda$doPrivileged$0(SocketAccess.java:42) ~[?:?]
	at java.security.AccessController.doPrivileged(AccessController.java:569) ~[?:?]
	at org.elasticsearch.xpack.core.common.socket.SocketAccess.doPrivileged(SocketAccess.java:41) ~[?:?]
	at org.elasticsearch.xpack.core.security.CommandLineHttpClient.execute(CommandLineHttpClient.java:178) ~[?:?]
	at org.elasticsearch.xpack.core.security.CommandLineHttpClient.execute(CommandLineHttpClient.java:112) ~[?:?]
	at org.elasticsearch.xpack.security.authc.esnative.tool.SetupPasswordTool$SetupCommand.checkElasticKeystorePasswordValid(SetupPasswordTool.java:340) ~[?:?]
	at org.elasticsearch.xpack.security.authc.esnative.tool.SetupPasswordTool$InteractiveSetup.execute(SetupPasswordTool.java:203) ~[?:?]
	at org.elasticsearch.common.cli.EnvironmentAwareCommand.execute(EnvironmentAwareCommand.java:54) ~[elasticsearch-8.6.2.jar:8.6.2]
	at org.elasticsearch.cli.Command.mainWithoutErrorHandling(Command.java:85) ~[elasticsearch-cli-8.6.2.jar:8.6.2]
	at org.elasticsearch.cli.MultiCommand.execute(MultiCommand.java:94) ~[elasticsearch-cli-8.6.2.jar:8.6.2]
	at org.elasticsearch.cli.Command.mainWithoutErrorHandling(Command.java:85) ~[elasticsearch-cli-8.6.2.jar:8.6.2]
	at org.elasticsearch.cli.Command.main(Command.java:50) ~[elasticsearch-cli-8.6.2.jar:8.6.2]
	at org.elasticsearch.launcher.CliToolLauncher.main(CliToolLauncher.java:64) ~[cli-launcher-8.6.2.jar:8.6.2]

SSL connection to https://192.168.160.3:9200/_security/_authenticate?pretty failed: No subject alternative names matching IP address 192.168.160.3 found
Please check the elasticsearch SSL settings under xpack.security.http.ssl.


ERROR: Failed to establish SSL connection to elasticsearch at https://192.168.160.3:9200/_security/_authenticate?pretty.
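A possible workaround for the hostname mismatch alone, sketched under assumptions: both password tools accept a `--url` option, so they can be pointed at a name the certificate does list (`es01` or `localhost`, per the subject alternative names in the warning above) instead of the container IP. The function names here are made up for illustration, and nothing in this sketch addresses a RED cluster, against which the commands are still likely to fail.

```shell
# Intended to be run inside the es01 container, from /usr/share/elasticsearch.
# The certificate lists DNS:localhost, IP:127.0.0.1 and DNS:es01, so the node
# is addressed by one of those names rather than by its container IP.
ES_URL="https://es01:9200"

# Hypothetical wrapper: interactive setup of the built-in user passwords.
set_builtin_passwords() {
  bin/elasticsearch-setup-passwords interactive --url "$ES_URL"
}

# Hypothetical wrapper: reset only the 'elastic' user's password.
# Note that -u means "username" for this tool, so the URL needs the long flag.
reset_elastic_password() {
  bin/elasticsearch-reset-password -u elastic --url "$ES_URL"
}
```

Nothing runs until one of the functions is called, for example `reset_elastic_password`.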

One last thing I tried was the 'elasticsearch-reset-password' command. Sometimes it fails with the CertificateException above, and other times it refuses to run because its health check finds the cluster RED:

elasticsearch@927d88e11f41:~$ bin/elasticsearch-reset-password -u elastic
WARNING: Owner of file [/usr/share/elasticsearch/config/users] used to be [root], but now is [elasticsearch]
WARNING: Owner of file [/usr/share/elasticsearch/config/users_roles] used to be [root], but now is [elasticsearch]
Failed to determine the health of the cluster. Cluster health is currently RED.
This means that some cluster data is unavailable and your cluster is not fully functional.
The cluster logs (https://www.elastic.co/guide/en/elasticsearch/reference/8.6/logging.html) might contain information/indications for the underlying cause
It is recommended that you resolve the issues with your cluster before continuing
It is very likely that the command will fail when run against an unhealthy cluster.

If you still want to attempt to execute this command against an unhealthy cluster, you can pass the `-f` parameter.

ERROR: Failed to determine the health of the cluster. Cluster health is currently RED.
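To find out when the cluster turned RED, one option is to filter the Compose logs for health transitions. The sketch below assumes Elasticsearch reports them with a message of the form "Cluster health status changed from [X] to [Y]"; the function name `health_changes` is made up for illustration.

```shell
# Print every cluster health transition found in log lines read from stdin.
# grep -o prints only the matching part of each line, not the whole JSON record.
health_changes() {
  grep -o 'Cluster health status changed from \[[A-Z]*] to \[[A-Z]*]'
}

# Usage (run in the directory that holds docker-compose.yml):
#   docker-compose logs --no-color | health_changes
```

The first transition away from GREEN or YELLOW is usually the place to start reading the surrounding log lines.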

Looking back at the docker logs, I see this message pertaining to the cluster health changing from yellow to red:

"log.level": "INFO", "current.health":"RED","message":"Cluster health status changed from [YELLOW] to [RED] (reason: [reconcile-desired-balance]).","previous.health":"YELLOW","reason":"reconcile-desired-balance" , "ecs.version": "1.2.0","service.name":"ES_ECS","event.dataset":"elasticsearch.server","process.thread.name":"elasticsearch[es02][masterService#updateTask][T#1]","log.logger":"org.elasticsearch.cluster.routing.allocation.AllocationService","elasticsearch.cluster.uuid":"9-sckL0BTKaVob9Ovfix3A","elasticsearch.node.id":"xahMdoYyQpSy4FYsFmxbsg","elasticsearch.node.name":"es02","elasticsearch.cluster.name":"docker-cluster"}

Other details that might help troubleshoot:

Docker version 20.10.10, build b485636
I am on macOS Big Sur, v11.5.1 (Intel chip)
I tried this with STACK_VERSION set to 8.6.2 and 8.5.0 and the error was the exact same
This is not the first time I have had Elasticsearch/Kibana containers running on my computer. My main project has had ES and Kibana running for a while now, though they weren't really used (I think the person who managed it before me set these containers up to run, but they weren't actually doing much). Is it possible there is some caching involved? That might help explain the 'missing' .security-7 index
I saw one thread online that also seemed relevant that related to caching and seemed similar to my experience, but the solution they gave was not sufficient for me to solve the problem: In docker(docker-compose), I got some errors

Thank you in advance!

Yang_Wang · February 24, 2023, 2:24am

The cluster has not formed yet, which is why its health is RED. The security index error is a symptom of that, so you need to fix the cluster formation issue first. If caching is the issue, as suggested by this post, I would guess it is related to the persistent volumes created by docker-compose. If what you want is a fresh cluster, you should delete them. You can check the volumes with something like

docker volume ls

and delete the relevant ones declared in the docker compose file. NOTE: be absolutely sure that you do not need any data from the volumes before deleting them, because you will not be able to recover it later.
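The steps above could look roughly like the following. This is a sketch that assumes Compose's default naming, where each volume is prefixed with the project name followed by an underscore; `project_volumes` and the project name `myproject` are hypothetical.

```shell
# Keep only the volume names that belong to one Compose project.
# Reads volume names on stdin (one per line) and prints the matching ones.
# "|| true" keeps the exit status zero when nothing matches.
project_volumes() {
  project="$1"
  grep "^${project}_" || true
}

# Inspect first:
#   docker volume ls -q | project_volumes myproject
#
# Then, only if none of the data is needed (this cannot be undone), stop the
# stack and remove the named volumes declared in its compose file:
#   docker-compose down -v
```

Listing before deleting makes it easier to spot volumes left over from an earlier Elasticsearch setup that happen to share a name with the new one.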

cookersjs · February 27, 2023, 5:42pm

Thank you Yang_Wang! I ended up doing something similar, I think: I ran docker system prune because I had run out of disk space for my Docker containers. It is a command I was used to running and trusted (note that I ran it WHILE my app was running). The next time I tried to build the ES demo, I didn't see the error anywhere and things appear to be running correctly: I can access localhost:5601.

For anyone that might see this in the future - I would probably not recommend the docker system prune unless you know it is safe to do, but it does look like some sort of caching thing was at play that needed to get cleared out. Inspecting your volumes and deleting any that might be causing caching issues with elasticsearch seems like the right approach if you see the errors I describe above!

tchartron · March 16, 2023, 11:07am

Hi,
I wanted to say thank you: we were experiencing the same issue for some reason and were able to fix it pretty quickly using your solution. Thanks for taking the time to share this!

system · April 13, 2023, 11:08am

This topic was automatically closed 28 days after the last reply. New replies are no longer allowed.
