Deploying ELK cluster on production server - generated and existing server certificates not working

Hi there,

I've followed the instructions found here (Install Elasticsearch with Docker | Elasticsearch Guide [8.6] | Elastic) and managed to get a docker-compose ES cluster + Kibana setup working locally. I've also successfully ingested my data through logstash and confirmed everything is searchable as expected. So I now have a setup container that brings up the es01, es02, and es03 containers and configures kibana as well. Once those are all running, the logstash container starts, ingests the data, and shuts down - great!

But now I am trying to deploy this set-up in a production environment and think I'm not quite getting my certificates set up correctly. This production server does have its own certificate and key that I've tried to use as the certificate authority to create the signed certificates for the ES nodes, but that doesn't work either. Here is what I've tried to do, and what I've seen as outcomes:

  1. Running the default instructions (from the tutorial I linked above), which generates its own certificate authority and instance certificates. This uses the following code:
## Creates its own CA
bin/elasticsearch-certutil ca --silent --pem -out config/certs/ca.zip
unzip config/certs/ca.zip -d config/certs

## Creates es01, es02, es03 certificates signed by newly-created CA
echo -ne \
        "instances:\n" \
        "  - name: es01\n" \
        "    dns:\n" \
        "      - es01\n" \
        "      - localhost\n" \
        "    ip:\n" \
        "      - 127.0.0.1\n" \
        "  - name: es02\n" \
        "    dns:\n" \
        "      - es02\n" \
        "      - localhost\n" \
        "    ip:\n" \
        "      - 127.0.0.1\n" \
        "  - name: es03\n" \
        "    dns:\n" \
        "      - es03\n" \
        "      - localhost\n" \
        "    ip:\n" \
        "      - 127.0.0.1\n" \
        >config/certs/instances.yml
bin/elasticsearch-certutil cert --silent --pem -out config/certs/certs.zip --in config/certs/instances.yml --ca-cert config/certs/ca/ca.crt --ca-key config/certs/ca/ca.key
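
For reference, a quick way to sanity-check the output of this step - assuming the es01/es01.crt layout that certutil produces when unzipped - is something like the following, which confirms each node certificate chains to the generated CA and carries the expected SANs:

## Sanity check (my own sketch, not part of the guide): verify each node
## cert against the generated CA and print its SANs
## (the -ext flag needs OpenSSL 1.1.1+)
for node in es01 es02 es03; do
  openssl verify -CAfile config/certs/ca/ca.crt "config/certs/${node}/${node}.crt"
  openssl x509 -in "config/certs/${node}/${node}.crt" -noout -ext subjectAltName
done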

This whole process runs close to completion: es01, es02, es03, and kibana all come up, but then the containers have trouble communicating with each other. I see a never-ending stream of the following line, which locally appears only a few times before Kibana connects:

"log.level": "INFO", "message":"Authentication of [kibana_system] was terminated by realm [reserved] - failed to authenticate user [kibana_system]"

If I exec into the kibana container and run the following command, I get an error message:

curl -s -X GET --cacert "/usr/share/kibana/config/certs/ca/ca.crt" \
  -H "Authorization: Basic $(echo -n kibana_system:tmppassword | base64 -)" \
  "https://es01:9200/_security/_authenticate?pretty=true"
{
  "error" : {
    "root_cause" : [
      {
        "type" : "security_exception",
        "reason" : "unable to authenticate user [kibana_system] for REST request [/_security/_authenticate?pretty=true]",
        "header" : {
          "WWW-Authenticate" : [
            "Basic realm=\"security\" charset=\"UTF-8\"",
            "Bearer realm=\"security\"",
            "ApiKey"
          ]
        }
      }
    ],
    "type" : "security_exception",
    "reason" : "unable to authenticate user [kibana_system] for REST request [/_security/_authenticate?pretty=true]",
    "header" : {
      "WWW-Authenticate" : [
        "Basic realm=\"security\" charset=\"UTF-8\"",
        "Bearer realm=\"security\"",
        "ApiKey"
      ]
    }
  },
  "status" : 401
}
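
For context, the setup container from the linked guide sets this password with a step roughly like the one below (quoting from memory, so treat it as a sketch; ELASTIC_PASSWORD and KIBANA_PASSWORD come from the .env file). Until that step succeeds, 401s like the above are exactly what kibana_system would see:

## From the guide's setup container (approximate): set the kibana_system
## password once es01 is reachable over TLS; retry until ES accepts it
until curl -s -X POST --cacert config/certs/ca/ca.crt \
  -u "elastic:${ELASTIC_PASSWORD}" -H "Content-Type: application/json" \
  https://es01:9200/_security/user/kibana_system/_password \
  -d "{\"password\":\"${KIBANA_PASSWORD}\"}" | grep -q "^{}"; do sleep 10; done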

If I try this command locally, I get the expected result:

{
  "username" : "kibana_system",
  "roles" : [
    "kibana_system"
  ],
  "full_name" : null,
  "email" : null,
  "metadata" : {
    "_reserved" : true
  },
  "enabled" : true,
  "authentication_realm" : {
    "name" : "reserved",
    "type" : "reserved"
  },
  "lookup_realm" : {
    "name" : "reserved",
    "type" : "reserved"
  },
  "authentication_type" : "realm"
}

In the production environment, none of the certificates I try work. I'm also fairly sure the es containers can't talk to each other, and I suspect that the inter-container communication is actually the underlying problem.
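
For anyone reproducing this, a check along these lines (paths assume the guide's compose layout) should show whether one node can reach another over TLS:

## Hypothetical inter-node check, assuming ELASTIC_PASSWORD is exported
## in the host shell and the guide's cert paths are in use:
## ask es01 to reach es02's HTTP endpoint
docker compose exec es01 curl -s \
  --cacert /usr/share/elasticsearch/config/certs/ca/ca.crt \
  -u "elastic:${ELASTIC_PASSWORD}" "https://es02:9200/_cluster/health?pretty"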

So I decided to try running the above shell code using the server's existing certificate/key instead. I know that certificate/key pair is valid, because the existing webapp I am testing this ES/Kibana deployment against needs them to run:

## Same code as above, but instead of creating a CA, use the existing server certificate as the CA
echo -ne \
        "instances:\n" \
        "  - name: es01\n" \
        "    dns:\n" \
        "      - es01\n" \
        "      - localhost\n" \
        "    ip:\n" \
        "      - 127.0.0.1\n" \
        "  - name: es02\n" \
        "    dns:\n" \
        "      - es02\n" \
        "      - localhost\n" \
        "    ip:\n" \
        "      - 127.0.0.1\n" \
        "  - name: es03\n" \
        "    dns:\n" \
        "      - es03\n" \
        "      - localhost\n" \
        "    ip:\n" \
        "      - 127.0.0.1\n" \
        >config/certs/instances.yml

## Use the existing server certificate/key as the CA to sign the es certs
bin/elasticsearch-certutil cert --silent --pem -out config/certs/certs.zip --in config/certs/instances.yml --ca-cert path/to/public.crt --ca-key path/to/private.key

Now when I try to run things with docker-compose, the setup container hangs at "starting". If I exec into the hung container and look in config/certs/, all I see is the instances.yml file - no certs.zip or es01/es02/es03 folders like before. After a while the container exits without ever bringing the stack up.
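
One thing I haven't been able to rule out, just a guess on my part: whether the server's certificate is even allowed to act as a signing CA. Something like this should show whether it carries CA:TRUE in its Basic Constraints, which a signing CA needs:

## Inspect the existing server cert's CA-related extensions
## (the -ext flag needs OpenSSL 1.1.1+)
openssl x509 -in path/to/public.crt -noout -ext basicConstraints,keyUsage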

So after all this, I have a couple of questions:

  1. Why don't self-generated certificates work for an ES cluster/Kibana stack on a production server? Is it, as I assume, because the CA needs to come from a trusted issuer (like the one I tried in the latter half above), which doesn't matter in a local environment?

  2. Why doesn't elasticsearch-certutil work when I provide the server's existing public.crt and private.key files as the --ca-cert and --ca-key arguments? Is that not how CAs are supposed to work here, or is there a specific way I'm supposed to use this util in this case?

Am I going in the right direction for getting this test case to work on a production server (docker-compose with setup/es01/es02/es03/kibana)? I would just like to get a simple deployment to work in a production environment so that I at least understand better what needs to be done to get things communicating properly, and so that I can run simple elasticsearch queries like I've been able to do locally.

Thank you!

If this is the error message, it does not seem to be a certificate problem: the error comes from the application layer, which means the request already made it past the networking (SSL) layer.

Why don't self-generated certificates work for a ES cluster/Kibana stack on a production server? Is it like I assume, that it is because the CA needs to come from a valid distributor, like the one I tried in the latter half of my case, which doesn't matter in a local environment?

There is no technical reason why self-generated certificates cannot work in a production environment. As noted above, the issue appears to be elsewhere. You might want to raise the authentication logger to trace level to find out more about why kibana_system cannot authenticate:

PUT _cluster/settings
{
  "transient": {
    "logger.org.elasticsearch.xpack.security.authc": "trace"
  }
}
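
If the logs point at a bad password rather than TLS, one way to rule that out (assuming the compose layout from the guide) is to reset the kibana_system password from inside one of the nodes and update the value Kibana uses to match:

## Hypothetical reset, assuming the guide's container layout
docker compose exec es01 bin/elasticsearch-reset-password -u kibana_system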
