Agent bad_certificate

There are quite a few search results on this topic, but they all concern Elastic 7.x. I'm seeing it with 8.3.2. I'm certain that I'm doing something wrong, but for the life of me I cannot figure out what.

I'm using docker and started with the docker-compose.yml from the install docs at elastic.co.

I made a few tweaks ...

  • removed es02 & es03 and all references to them. (So single elastic node)
  • added an nginx reverse proxy with a Let's Encrypt cert for Kibana
  • added my hostname to the shell script portion of the setup container so the certs have the fqdn in the cert.

The rest is the same. Since then I have tweaked some things trying to resolve this:

  • switched transport and http certificate verification to none (which had no effect)
  • tried various ssl options for the agent install and the elasticsearch container.

If I run curl with either "-k" or the ca.crt generated by the setup container's shell script, it works just fine.

# curl --cacert /etc/elastic/ca/ca.crt -u elastic https://elk.immauss.com:9200                                        
Enter host password for user 'elastic':
{
  "name" : "es01",
  "cluster_name" : "docker-cluster",
  "cluster_uuid" : "LiBopHX9QPaVvqXkpq3RdQ",
  "version" : {
    "number" : "8.3.2",
    "build_type" : "docker",
    "build_hash" : "8b0b1f23fbebecc3c88e4464319dea8989f374fd",
    "build_date" : "2022-07-06T15:15:15.901688194Z",
    "build_snapshot" : false,
    "lucene_version" : "9.2.0",
    "minimum_wire_compatibility_version" : "7.17.0",
    "minimum_index_compatibility_version" : "7.0.0"
  },
  "tagline" : "You Know, for Search"
}

But as soon as I try to install the Fleet server, I start getting these errors from the elasticsearch container:

{"@timestamp":"2022-07-28T07:45:40.336Z", "log.level": "WARN", "message":"caught exception while handling client http traffic, closing connection Netty4HttpChannel{localAddress=/172.18.0.3:9200, remoteAddress=/45.79.188.135:40544}", "ecs.version": "1.2.0","service.name":"ES_ECS","event.dataset":"elasticsearch.server","process.thread.name":"elasticsearch[es01][transport_worker][T#1]","log.logger":"org.elasticsearch.http.AbstractHttpServerTransport","elasticsearch.cluster.uuid":"LiBopHX9QPaVvqXkpq3RdQ","elasticsearch.node.id":"e764Y97MReGlHv8V4QK6aA","elasticsearch.node.name":"es01","elasticsearch.cluster.name":"docker-cluster","error.type":"io.netty.handler.codec.DecoderException","error.message":"javax.net.ssl.SSLHandshakeException: Received fatal alert: bad_certificate","error.stack_trace":"io.netty.handler.codec.DecoderException: javax.net.ssl.SSLHandshakeException: Received fatal alert: bad_certificate

(There are about 30 more lines of Java errors after that.)

I've tried using the --fleet-server-es-insecure option, as well as specifying the ca.crt from the Elasticsearch container with --fleet-server-es-ca.

So the question is: what am I doing wrong?

It seems the problem is on the Elasticsearch side: it looks like Fleet is sending Elasticsearch a certificate that triggers this "bad_certificate" error.

Could you post here the whole CLI command you're using when starting Fleet server? Just redact any sensitive information.

I worked this out earlier today ...

For some reason, some portion of the agent is:
1. ignoring the "insecure" switches,
2. ignoring the switch pointing to the ca.crt, and
3. falling back to the local CA store.

To resolve this, you need to add the ca.crt from your elastic install to the local CA store. On a RHEL-based system:

As root (or with sudo), copy your ca.crt to the local store for import:

cp ca.crt /etc/pki/ca-trust/source/anchors/elastic-ca.crt

Then run:

update-ca-trust 

Then restart the agent:

elastic-agent restart

Voilà!
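
For anyone not on a RHEL-based host, here is a rough sketch of the same steps with a Debian/Ubuntu variant added. The function names and the "rhel"/"debian" labels are made up for this sketch and are not anything elastic-agent defines. The Debian path and command are the standard ca-certificates ones, and I have not tested this against the agent on Debian.

```shell
#!/bin/sh
# Print "<anchor dir> <refresh command>" for a distro family.
# The family labels are hypothetical names used only by this sketch.
trust_store_for() {
    case "$1" in
        rhel)   echo "/etc/pki/ca-trust/source/anchors update-ca-trust" ;;
        debian) echo "/usr/local/share/ca-certificates update-ca-certificates" ;;
        *)      echo "unknown family: $1" >&2; return 1 ;;
    esac
}

# Copy a CA certificate into the system trust store and refresh it.
# Must be run as root (or with sudo).
install_ca() {
    store=$(trust_store_for "$1") || return 1
    dir=${store% *}     # everything before the space: the anchor directory
    cmd=${store#* }     # everything after the space: the refresh command
    cp "$2" "$dir/elastic-ca.crt" && "$cmd"
}
```

Usage would be something like `install_ca debian ca.crt`, followed by `elastic-agent restart` as above.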

While this resolves the issue, I feel like there is a bug here.

-Scott

Adding the certs to the CA store is definitely an option, but it is possible to make everything work without touching the host CA store.

Managing those certificates is not the easiest of tasks, and it's quite error prone.

One thing that helps is to pass --fleet-server-es-ca-trusted-fingerprint when starting the Elastic Agent. Instead of having to deal with the certificates, you only need to copy/paste the fingerprint printed out by Elasticsearch when you first start the cluster.
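
If the startup output with the fingerprint is no longer available, it should be possible to recompute it from the ca.crt with openssl. This is a minimal sketch, assuming the fingerprint is the hex-encoded SHA-256 of the CA certificate; the two function names are made up for illustration.

```shell
#!/bin/sh
# Reduce the line printed by `openssl x509 -fingerprint -sha256 -noout`
# (e.g. "sha256 Fingerprint=AB:CD:...") to bare hex without colons.
normalize_fingerprint() {
    sed -e 's/^.*=//' | tr -d ':'
}

# Print the SHA-256 fingerprint of a PEM certificate as bare hex.
ca_fingerprint() {
    openssl x509 -fingerprint -sha256 -noout -in "$1" | normalize_fingerprint
}
```

For example, `ca_fingerprint /etc/elastic/ca/ca.crt` prints a 64-character hex string that can then be given to --fleet-server-es-ca-trusted-fingerprint.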