"Start a multi-node cluster with Docker Compose" example does not work on 8.18.0

I followed the instructions in "Start a multi-node cluster with Docker Compose" to create an Elasticsearch/Kibana cluster on my machine. The Elasticsearch nodes start but cannot be connected to.

My docker-compose.yml is identical to the one linked to by the documentation. This is my .env file.

# Password for the 'elastic' user (at least 6 characters)
ELASTIC_PASSWORD=redbluegreen

# Password for the 'kibana_system' user (at least 6 characters)
KIBANA_PASSWORD=redbluegreen

# Version of Elastic products
STACK_VERSION=8.18.0

# Set the cluster name
CLUSTER_NAME=local-mcp-cluster

# Set to 'basic' or 'trial' to automatically start the 30-day trial
LICENSE=basic
#LICENSE=trial

# Port to expose Elasticsearch HTTP API to the host
ES_PORT=9200
#ES_PORT=127.0.0.1:9200

# Port to expose Kibana to the host
KIBANA_PORT=5601
#KIBANA_PORT=80

# Increase or decrease based on the available host memory (in bytes)
MEM_LIMIT=1073741824

# Project namespace (defaults to the current folder name if not set)
#COMPOSE_PROJECT_NAME=myproject

When I do a docker compose up -d, the "setup" service completes successfully. All of the "es*" Elasticsearch services start and are healthy. The "kibana" service starts but never passes its health check. http://0.0.0.0:5601/ never returns a response. https://localhost:9200/ returns "Your connection is not private".

I don't see any obvious errors in the logs for the Elasticsearch or Kibana servers.

The certificates appear to be in the expected directory.

elasticsearch@4a262df39ab2:~$ ls -ll /usr/share/elasticsearch/config/certs
total 32
drwxr-x--- 2 root root 4096 Apr 30 19:41 ca
-rw-r----- 1 root root 2535 Apr 30 19:41 ca.zip
-rw-r----- 1 root root 7614 Apr 30 19:41 certs.zip
drwxr-x--- 2 root root 4096 Apr 30 19:41 es01
drwxr-x--- 2 root root 4096 Apr 30 19:41 es02
drwxr-x--- 2 root root 4096 Apr 30 19:41 es03
-rw-r----- 1 root root  272 Apr 30 19:41 instances.yml

I see the same error with stack versions 8.18.0 and 8.17.5.

This is a MacBook Pro running macOS 15.4.1. I have made this setup work on this machine before, but I can't make it work now. Again, all my files are identical to what is linked from the documentation.

You can double check this by checking out the elasticsearch-not-starting branch of the Local MCP project on GitLab. That is exactly what I am running.

What could be going wrong? How do I debug this?

Hi @wpm
Try deleting all the containers and all the volumes completely. Make sure they're gone and try again from scratch.

What sometimes happens is that the first run fails, and because the certs directory already exists the setup doesn't recreate everything from scratch, so you get stuck in a loop: the cert setup is skipped, but the certs are no longer valid.

That would be my first step: all the volumes, the data volumes and the cert volumes, delete them all.

And try again.
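In command form, the clean-slate reset sketched above looks something like this (an illustration only: it assumes you run it from the directory holding the compose file and that Docker is on the PATH):

```shell
# Skip gracefully on machines without Docker installed.
if command -v docker >/dev/null 2>&1; then
  # Stop the stack and delete its named volumes (data AND certs).
  docker compose down -v --remove-orphans
  # Confirm the project's volumes are really gone before retrying.
  docker volume ls
  # Then rebuild from scratch; leave off -d so errors are visible:
  # docker compose up
else
  echo "docker not found; run this on the Docker host"
fi
```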

Yep. I've been doing that.

Clean everything up, start without the -d, and watch the output... What are the exact errors?

There are errors... somewhere...

Have you tried to exec into the es01 Elasticsearch container to see if you can curl the endpoint, exactly as shown in the healthcheck?

Have you exec'd into the Kibana container and curled es01 via https?
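Concretely, those two checks look something like this (a sketch: without `container_name:` in the compose file, Compose prefixes the names, e.g. `myproject-es01-1`, so check `docker ps` and substitute your own names and password):

```shell
if command -v docker >/dev/null 2>&1; then
  # The request the es01 healthcheck performs, run from inside es01:
  docker exec es01 curl -s --cacert config/certs/ca/ca.crt \
    -u elastic:redbluegreen https://es01:9200
  # The same request from inside the Kibana container, which also
  # exercises cross-container DNS and TLS against the shared CA:
  docker exec kibana curl -s --cacert config/certs/ca/ca.crt \
    -u elastic:redbluegreen https://es01:9200
fi
```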

Another thing that happens is that Docker Desktop does not have enough resources.

And one more thing: there was a bug where deleting the volumes via Docker Desktop did not actually delete them, but I'm pretty sure that was fixed quite some time ago.

I do see a suspicious Java stack trace in one of the Elasticsearch container logs.

I started from scratch and made sure to delete everything: all old containers and volumes. The behavior persists.

All the es* servers pass their health checks when I shell into them and run the checks in the container.

The Kibana health check hangs when I shell into the container and run the health check in the terminal.

I get this when I try to curl one of the Elasticsearch servers from inside the Kibana container.

kibana@2a447a0e8809:~$ curl -I --cacert config/certs/ca/ca.crt -u elastic:redbluegreen https://es01:9200
HTTP/1.1 401 Unauthorized
WWW-Authenticate: Basic realm="security", charset="UTF-8"
WWW-Authenticate: Bearer realm="security"
WWW-Authenticate: ApiKey
content-type: application/json
content-length: 467

I see the same thing with es02 and es03.

The config/certs/ca/ca.crt in the Kibana container exists and looks valid.

Here is a grep for "error" in the Kibana container logs.

2025-05-01T18:54:48.612919546Z [2025-05-01T18:54:48.612+00:00][INFO ][plugins.notifications] Email Service Error: Email connector not specified.

2025-05-01T19:04:57.274771883Z [2025-05-01T19:04:57.271+00:00][ERROR][savedobjects-service] [.kibana_security_solution] Action failed with '[index_not_green_timeout] Timeout waiting for the status of the [.kibana_security_solution_8.18.0_001] index to become 'green' Refer to https://www.elastic.co/guide/en/kibana/8.18/resolve-migrations-failures.html#_repeated_time_out_requests_that_eventually_fail for information on how to resolve the issue.'. Retrying attempt 1 in 2 seconds.
...

There are many other timeout errors as the process keeps retrying.

Here is a grep for "error" in the es01 container log.

2025-05-01T18:54:27.840002884Z {"@timestamp":"2025-05-01T18:54:27.839Z", "log.level": "WARN", "message":"Failed to revoke access to default inference endpoint IDs: [rainbow-sprinkles], error: org.elasticsearch.cluster.block.ClusterBlockException: blocked by: [SERVICE_UNAVAILABLE/1/state not recovered / initialized];", "ecs.version": "1.2.0","service.name":"ES_ECS","event.dataset":"elasticsearch.server","process.thread.name":"elasticsearch[es01][inference_utility][T#2]","log.logger":"org.elasticsearch.xpack.inference.services.elastic.authorization.ElasticInferenceServiceAuthorizationHandler","elasticsearch.cluster.uuid":"mTfA69p5Reu8EwXUVDBGJg","elasticsearch.node.id":"bRMK3MSpTuylFiFVJskonQ","elasticsearch.node.name":"es01","elasticsearch.cluster.name":"local-mcp-cluster"}

2025-05-01T19:02:32.114175302Z {"@timestamp":"2025-05-01T19:02:32.108Z", "log.level": "WARN", "message":"caught exception while handling client http traffic, closing connection Netty4HttpChannel{localAddress=/172.21.0.3:9200, remoteAddress=/172.21.0.6:56536}", "ecs.version": "1.2.0","service.name":"ES_ECS","event.dataset":"elasticsearch.server","process.thread.name":"elasticsearch[es01][transport_worker][T#1]","log.logger":"org.elasticsearch.http.AbstractHttpServerTransport","elasticsearch.cluster.uuid":"mTfA69p5Reu8EwXUVDBGJg","elasticsearch.node.id":"bRMK3MSpTuylFiFVJskonQ","elasticsearch.node.name":"es01","elasticsearch.cluster.name":"local-mcp-cluster","error.type":"io.netty.handler.codec.DecoderException","error.message":
"javax.net.ssl.SSLProtocolException: Unexpected exception","error.stack_trace":"io.netty.handler.codec.DecoderException: javax.net.ssl.SSLProtocolException: Unexpected exception
at io.netty.codec@4.1.118.Final/io.netty.handler.codec.ByteToMessageDecoder.callDecode(ByteToMessageDecoder.java:500)
at io.netty.codec@4.1.118.Final/io.netty.handler.codec.ByteToMessageDecoder.channelRead(ByteToMessageDecoder.java:290)
at io.netty.transport@4.1.118.Final/io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:444)
at io.netty.transport@4.1.118.Final/io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:420)
at io.netty.transport@4.1.118.Final/io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:412)
at io.netty.transport@4.1.118.Final/io.netty.channel.DefaultChannelPipeline$HeadContext.channelRead(DefaultChannelPipeline.java:1357)
at io.netty.transport@4.1.118.Final/io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:440)
at io.netty.transport@4.1.118.Final/io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:420)
at io.netty.transport@4.1.118.Final/io.netty.channel.DefaultChannelPipeline.fireChannelRead(DefaultChannelPipeline.java:868)
at io.netty.transport@4.1.118.Final/io.netty.channel.nio.AbstractNioByteChannel$NioByteUnsafe.read(AbstractNioByteChannel.java:166)
at io.netty.transport@4.1.118.Final/io.netty.channel.nio.NioEventLoop.processSelectedKey(NioEventLoop.java:796)
at io.netty.transport@4.1.118.Final/io.netty.channel.nio.NioEventLoop.processSelectedKeysPlain(NioEventLoop.java:697)
at io.netty.transport@4.1.118.Final/io.netty.channel.nio.NioEventLoop.processSelectedKeys(NioEventLoop.java:660)
at io.netty.transport@4.1.118.Final/io.netty.channel.nio.NioEventLoop.run(NioEventLoop.java:562)
at io.netty.common@4.1.118.Final/io.netty.util.concurrent.SingleThreadEventExecutor$4.run(SingleThreadEventExecutor.java:998)
at io.netty.common@4.1.118.Final/io.netty.util.internal.ThreadExecutorMap$2.run(ThreadExecutorMap.java:74)
at java.base/java.lang.Thread.run(Thread.java:1447)
Caused by: javax.net.ssl.SSLProtocolException: Unexpected exception
at java.base/sun.security.ssl.SSLEngineInputRecord.decodeInputRecord(SSLEngineInputRecord.java:245)
at java.base/sun.security.ssl.SSLEngineInputRecord.decode(SSLEngineInputRecord.java:196)
at java.base/sun.security.ssl.SSLEngineInputRecord.decode(SSLEngineInputRecord.java:159)
at java.base/sun.security.ssl.SSLTransport.decode(SSLTransport.java:111)
at java.base/sun.security.ssl.SSLEngineImpl.decode(SSLEngineImpl.java:734)
at java.base/sun.security.ssl.SSLEngineImpl.readRecord(SSLEngineImpl.java:689)
at java.base/sun.security.ssl.SSLEngineImpl.unwrap(SSLEngineImpl.java:504)
at java.base/sun.security.ssl.SSLEngineImpl.unwrap(SSLEngineImpl.java:480)
at java.base/javax.net.ssl.SSLEngine.unwrap(SSLEngine.java:673)
at io.netty.handler@4.1.118.Final/io.netty.handler.ssl.SslHandler$SslEngineType$3.unwrap(SslHandler.java:309)
at io.netty.handler@4.1.118.Final/io.netty.handler.ssl.SslHandler.unwrap(SslHandler.java:1485)
at io.netty.handler@4.1.118.Final/io.netty.handler.ssl.SslHandler.decodeJdkCompatible(SslHandler.java:1378)
at io.netty.handler@4.1.118.Final/io.netty.handler.ssl.SslHandler.decode(SslHandler.java:1427)
at io.netty.codec@4.1.118.Final/io.netty.handler.codec.ByteToMessageDecoder.decodeRemovalReentryProtection(ByteToMessageDecoder.java:530)
at io.netty.codec@4.1.118.Final/io.netty.handler.codec.ByteToMessageDecoder.callDecode(ByteToMessageDecoder.java:469)
... 16 more
Caused by: java.security.GeneralSecurityException: Unexpected plaintext alert received: Level: fatal; Alert: unknown_ca
at java.base/sun.security.ssl.SSLCipher$T13GcmReadCipherGenerator$GcmReadCipher.decrypt(SSLCipher.java:1856)
at java.base/sun.security.ssl.SSLEngineInputRecord.decodeInputRecord(SSLEngineInputRecord.java:239)
... 30 more
"}

That Java stack trace looks like a problem but I don't know what it means. I only see it in es01. I don't see it in the other Elasticsearch containers.

In the Elasticsearch desktop I have a memory limit of 7.9 GB.
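For reference, `Alert: unknown_ca` is a fatal alert sent by the TLS client when it cannot validate the server's certificate against its own CA, so es01 is reporting that some peer (the remote address 172.21.0.6, plausibly Kibana) rejected es01's certificate. One way to inspect the certificate es01 actually serves is sketched below (it assumes the ES_PORT=9200 mapping from the .env, an openssl CLI on the host, and an unprefixed es01 container name; adjust as needed):

```shell
if command -v docker >/dev/null 2>&1 && command -v openssl >/dev/null 2>&1; then
  # Copy the CA out of the container (adjust the container name if
  # Compose has prefixed it, e.g. myproject-es01-1).
  docker cp es01:/usr/share/elasticsearch/config/certs/ca/ca.crt ./ca.crt
  # Print the served certificate chain and the verification result.
  openssl s_client -connect localhost:9200 -CAfile ./ca.crt </dev/null 2>/dev/null \
    | grep -E 'subject=|issuer=|Verify return code'
fi
```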

@wpm

Sooo I ran your exact code from here... on my Intel Mac running Sequoia 15.4.1.

And it came up and ran the first time...

I do notice that your .env is named oddly in this repo... so I renamed it to .env

So when you exec into the es01 container and run (try the -v, please)

curl -v --cacert config/certs/ca/ca.crt -u elastic:redbluegreen https://es01:9200

and you get

401 Unauthorized

it means the elastic password is not set correctly... which means the kibana_system password will not get set up correctly either. It will be buried somewhere in the setup logs.

So did you clean up the data directories as well? Not sure what your issue is...
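Digging those details out of the setup logs might look like this (the grep patterns are only guesses at what the setup script prints; `elasticsearch-reset-password` is the stock 8.x tool for fixing a wrong elastic password in place, and the es01 container name may be prefixed by Compose):

```shell
if command -v docker >/dev/null 2>&1; then
  # Scan the setup container's output for password-related lines.
  docker compose logs setup | grep -iE 'password|kibana_system|error'
  # If the elastic password really is wrong, reset it interactively:
  docker exec -it es01 bin/elasticsearch-reset-password -u elastic -i
fi
```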

I assume you mean Docker Desktop.

You can also comment out es02 and es03 to see if that makes a difference... after you clean everything up, including all the mounts.

You can even delete kibana; if it is only es01 and you exec in and still cannot curl the endpoint, then something else is wrong... something simple / fundamental / but non-obvious, like your .env not being what you think it is, not enough resources, etc. It's in there...

Because your code works...
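One quick way to rule out the ".env is not what you think it is" case: `docker compose config` renders the compose file with variables substituted, so a missing or misnamed .env shows up immediately as empty values (e.g. an image tag ending in a bare `:`). A sketch, assuming you run it next to the compose file:

```shell
if command -v docker >/dev/null 2>&1; then
  # Render the interpolated compose file; blank substitutions mean
  # the .env was not found or not loaded.
  docker compose config | grep -E 'image:|mem_limit'
fi
```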


Likely Issues

  1. Kibana can't connect to Elasticsearch: usually due to SSL/cert issues or mismatched passwords.
  2. Self-signed certs: Kibana might be rejecting the Elasticsearch cert.
  3. Incorrect password for kibana_system: ensure it's consistent across .env, setup, and internal config.
  4. Kibana not binding or crashing silently: the healthcheck fails but the logs look clean.

What You Should Try

  1. Re-run setup cleanly:

     docker compose down -v
     rm -rf certs/ setup/
     docker compose up setup
     docker compose up -d

  2. Check Kibana logs carefully:

     docker compose logs kibana

     Look for lines mentioning:
     Unable to connect to Elasticsearch
     certificate verify failed
     kibana_system authentication failed

  3. Test connectivity manually from inside the Kibana container:

     docker exec -it <kibana_container_id> curl -k https://es01:9200

  4. Verify the Elasticsearch password works:

     curl -k -u elastic:redbluegreen https://localhost:9200

  5. Check if the Kibana port is actually open:

     curl -I http://localhost:5601

  6. Check Docker Desktop resources: ensure Docker has enough memory/CPU.

I started over with "Start a single-node cluster in Docker" and everything works. I find those setup instructions easier to follow.

I'm not sure what I was doing wrong, but if I get into a similar situation I'll refer back to these troubleshooting tips.

Thanks for your help.