Problem securing entire stack

I have been trying to secure communications within (and connections to) my Elastic Stack and am running into issues. I'll walk through my configuration and what I've done so far.

I have three servers (let's call them elk1, elk2, and elk3), each with an Elasticsearch data node that is eligible to be master. Each of the servers also has logstash and kibana running on it.

First, I created a CA for the cluster (called elk-stack-ca.p12):

/usr/share/elasticsearch/bin/elasticsearch-certutil ca

Then, I created a certificate for the Elasticsearch nodes (called elk-certificates.p12):

/usr/share/elasticsearch/bin/elasticsearch-certutil cert --ca elk-stack-ca.p12 --dns elk1,elk2,elk3
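Before wiring the bundle into the config, it can be worth confirming that the DNS names actually made it into the certificate. A quick check with openssl (this assumes a blank keystore password; adjust -passin accordingly if one was set):

```shell
# Dump the node certificate from the PKCS#12 bundle and print its
# Subject Alternative Names; expect a line listing DNS:elk1, DNS:elk2, DNS:elk3
openssl pkcs12 -in elk-certificates.p12 -clcerts -nokeys -passin pass: \
  | openssl x509 -noout -text \
  | grep -A1 'Subject Alternative Name'
```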

I set up the xpack.security options on each node (elasticsearch.yml looks like this; node.name is different on each server, of course):

cluster.name: elkcluster
node.name: elk1
network.host: 0.0.0.0
http.port: 9200
node.roles: [ master, data ]
thread_pool.write.queue_size: 3000
discovery.seed_hosts: ["elk1", "elk2", "elk3"]
cluster.initial_master_nodes: ["elk1", "elk2", "elk3"]
path.data: /data/elasticsearch
path.logs: /var/log/elasticsearch
xpack.security.enabled: true
xpack.security.transport.ssl.enabled: true
xpack.security.transport.ssl.verification_mode: certificate
xpack.security.transport.ssl.client_authentication: required
xpack.security.transport.ssl.keystore.path: elk-certificates.p12
xpack.security.transport.ssl.truststore.path: elk-certificates.p12
xpack.security.http.ssl.enabled: true
xpack.security.http.ssl.verification_mode: certificate
xpack.security.http.ssl.client_authentication: required
xpack.security.http.ssl.keystore.path: elk-certificates.p12
xpack.security.http.ssl.truststore.path: elk-certificates.p12

The nodes fail to start up with these errors:

[2021-10-14T16:07:05,283][ERROR][o.e.b.Bootstrap          ] [elk1] Exception
org.elasticsearch.ElasticsearchSecurityException: failed to load SSL configuration [xpack.security.http.ssl]
        at org.elasticsearch.xpack.core.ssl.SSLService.lambda$loadSSLConfigurations$5(SSLService.java:528) ~[?:?]
        at java.util.HashMap.forEach(HashMap.java:1425) ~[?:?]
        at java.util.Collections$UnmodifiableMap.forEach(Collections.java:1521) ~[?:?]
        at org.elasticsearch.xpack.core.ssl.SSLService.loadSSLConfigurations(SSLService.java:524) ~[?:?]
        at org.elasticsearch.xpack.core.ssl.SSLService.<init>(SSLService.java:142) ~[?:?]
        at org.elasticsearch.xpack.core.XPackPlugin.createSSLService(XPackPlugin.java:411) ~[?:?]
        at org.elasticsearch.xpack.core.XPackPlugin.createComponents(XPackPlugin.java:274) ~[?:?]
        at org.elasticsearch.node.Node.lambda$new$14(Node.java:522) ~[elasticsearch-7.9.0.jar:7.9.0]
...
Caused by: java.io.IOException: keystore password was incorrect
        at sun.security.pkcs12.PKCS12KeyStore.engineLoad(PKCS12KeyStore.java:2118) ~[?:?]
        at sun.security.util.KeyStoreDelegator.engineLoad(KeyStoreDelegator.java:220) ~[?:?]
        at java.security.KeyStore.load(KeyStore.java:1472) ~[?:?]
        at org.elasticsearch.xpack.core.ssl.TrustConfig.getStore(TrustConfig.java:97) ~[?:?]
        at org.elasticsearch.xpack.core.ssl.StoreTrustConfig.createTrustManager(StoreTrustConfig.java:65) ~[?:?]
...
Caused by: java.security.UnrecoverableKeyException: failed to decrypt safe contents entry: javax.crypto.BadPaddingException: Given final block not properly padded. Such issues can arise if a bad key is used during decryption.
        at sun.security.pkcs12.PKCS12KeyStore.engineLoad(PKCS12KeyStore.java:2118) ~[?:?]
        at sun.security.util.KeyStoreDelegator.engineLoad(KeyStoreDelegator.java:220) ~[?:?]
        at java.security.KeyStore.load(KeyStore.java:1472) ~[?:?]
        at org.elasticsearch.xpack.core.ssl.TrustConfig.getStore(TrustConfig.java:97) ~[?:?]
        at org.elasticsearch.xpack.core.ssl.StoreTrustConfig.createTrustManager(StoreTrustConfig.java:65) ~[?:?]
...
[2021-10-14T16:07:05,290][ERROR][o.e.b.ElasticsearchUncaughtExceptionHandler] [elk1] uncaught exception in thread [main]
org.elasticsearch.bootstrap.StartupException: ElasticsearchSecurityException[failed to load SSL configuration [xpack.security.http.ssl]]; nested: ElasticsearchException[failed to initialize SSL TrustManager]; nested: IOException[keystore password was incorrect]; nested: UnrecoverableKeyException[failed to decrypt safe contents entry: javax.crypto.BadPaddingException: Given final block not properly padded. Such issues can arise if a bad key is used during decryption.];
        at org.elasticsearch.bootstrap.Elasticsearch.init(Elasticsearch.java:174) ~[elasticsearch-7.9.0.jar:7.9.0]
        at org.elasticsearch.bootstrap.Elasticsearch.execute(Elasticsearch.java:161) ~[elasticsearch-7.9.0.jar:7.9.0]
        at org.elasticsearch.cli.EnvironmentAwareCommand.execute(EnvironmentAwareCommand.java:86) ~[elasticsearch-7.9.0.jar:7.9.0]
        at org.elasticsearch.cli.Command.mainWithoutErrorHandling(Command.java:127) ~[elasticsearch-cli-7.9.0.jar:7.9.0]
        at org.elasticsearch.cli.Command.main(Command.java:90) ~[elasticsearch-cli-7.9.0.jar:7.9.0]
        at org.elasticsearch.bootstrap.Elasticsearch.main(Elasticsearch.java:126) ~[elasticsearch-7.9.0.jar:7.9.0]
        at org.elasticsearch.bootstrap.Elasticsearch.main(Elasticsearch.java:92) ~[elasticsearch-7.9.0.jar:7.9.0]
Caused by: org.elasticsearch.ElasticsearchSecurityException: failed to load SSL configuration [xpack.security.http.ssl]
        at org.elasticsearch.xpack.core.ssl.SSLService.lambda$loadSSLConfigurations$5(SSLService.java:528) ~[?:?]
...
Caused by: org.elasticsearch.ElasticsearchException: failed to initialize SSL TrustManager
        at org.elasticsearch.xpack.core.ssl.StoreTrustConfig.createTrustManager(StoreTrustConfig.java:74) ~[?:?]
        at org.elasticsearch.xpack.core.ssl.SSLService.createSslContext(SSLService.java:437) ~[?:?]
        at java.util.HashMap.computeIfAbsent(HashMap.java:1225) ~[?:?]
        at org.elasticsearch.xpack.core.ssl.SSLService.lambda$loadSSLConfigurations$5(SSLService.java:526) ~[?:?]
        at java.util.HashMap.forEach(HashMap.java:1425) ~[?:?]
        at java.util.Collections$UnmodifiableMap.forEach(Collections.java:1521) ~[?:?]
        at org.elasticsearch.xpack.core.ssl.SSLService.loadSSLConfigurations(SSLService.java:524) ~[?:?]
        at org.elasticsearch.xpack.core.ssl.SSLService.<init>(SSLService.java:142) ~[?:?]
        at org.elasticsearch.xpack.core.XPackPlugin.createSSLService(XPackPlugin.java:411) ~[?:?]
        at org.elasticsearch.xpack.core.XPackPlugin.createComponents(XPackPlugin.java:274) ~[?:?]
        at org.elasticsearch.node.Node.lambda$new$14(Node.java:522) ~[elasticsearch-7.9.0.jar:7.9.0]
...
Caused by: java.io.IOException: keystore password was incorrect
        at sun.security.pkcs12.PKCS12KeyStore.engineLoad(PKCS12KeyStore.java:2118) ~[?:?]
        at sun.security.util.KeyStoreDelegator.engineLoad(KeyStoreDelegator.java:220) ~[?:?]
        at java.security.KeyStore.load(KeyStore.java:1472) ~[?:?]
        at org.elasticsearch.xpack.core.ssl.TrustConfig.getStore(TrustConfig.java:97) ~[?:?]
        at org.elasticsearch.xpack.core.ssl.StoreTrustConfig.createTrustManager(StoreTrustConfig.java:65) ~[?:?]
        at org.elasticsearch.xpack.core.ssl.SSLService.createSslContext(SSLService.java:437) ~[?:?]
        at java.util.HashMap.computeIfAbsent(HashMap.java:1225) ~[?:?]
        at org.elasticsearch.xpack.core.ssl.SSLService.lambda$loadSSLConfigurations$5(SSLService.java:526) ~[?:?]
...
Caused by: java.security.UnrecoverableKeyException: failed to decrypt safe contents entry: javax.crypto.BadPaddingException: Given final block not properly padded. Such issues can arise if a bad key is used during decryption.
        at sun.security.pkcs12.PKCS12KeyStore.engineLoad(PKCS12KeyStore.java:2118) ~[?:?]
        at sun.security.util.KeyStoreDelegator.engineLoad(KeyStoreDelegator.java:220) ~[?:?]
        at java.security.KeyStore.load(KeyStore.java:1472) ~[?:?]
        at org.elasticsearch.xpack.core.ssl.TrustConfig.getStore(TrustConfig.java:97) ~[?:?]
        at org.elasticsearch.xpack.core.ssl.StoreTrustConfig.createTrustManager(StoreTrustConfig.java:65) ~[?:?]
...

I am hoping that someone can help point me in the direction of what to look into next to debug the issue.

Hi @jceddy

Have you added the passwords to the relevant keystore?

I think I did when I originally set up transport ssl, but I didn't do anything additional when adding http ssl.

I see I need to run:

bin/elasticsearch-keystore add xpack.security.http.ssl.keystore.secure_password
bin/elasticsearch-keystore add xpack.security.http.ssl.truststore.secure_password

I'll do that and restart.

Adding the keystore passwords fixed it, thanks.
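For anyone hitting the same error: the secure settings can be listed afterward to confirm the entries were actually stored (assuming the default package layout):

```shell
# Lists the entries in the Elasticsearch keystore; the
# *.ssl.keystore.secure_password and *.ssl.truststore.secure_password
# entries for both transport and http should appear here
/usr/share/elasticsearch/bin/elasticsearch-keystore list
```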

Now I am having an issue with Logstash connecting to Elasticsearch:

I converted the general PKCS#12 certificate that I had created when first securing communication between the Elasticsearch nodes:

openssl pkcs12 -in elk-certificates.p12 -out /etc/logstash/logstash.pem -clcerts -nokeys

And also extracted the CA into a .crt file:

openssl pkcs12 -in elk-certificates.p12 -cacerts -nokeys -chain | sed -ne '/-BEGIN CERTIFICATE-/,/-END CERTIFICATE-/p' > logstash-ca.crt
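Everything downstream trusts this file, so it's worth checking that it really is the CA certificate. A self-signed root must have identical subject and issuer; if the subject comes back as CN=instance instead of the CA name, the wrong certificate was extracted:

```shell
# For a self-signed root CA, subject and issuer must be identical
openssl x509 -in logstash-ca.crt -noout -subject -issuer
```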

I added the settings to the Logstash pipeline configuration:

input {
  beats {
    port => 5044
  }
}
output {
  if [fields][log_for] {
    elasticsearch {
      ssl => true
      ssl_certificate_verification => true
      cacert => '/etc/logstash/logstash.pem'
      hosts => [ "elk1:9200", "elk2:9200", "elk3:9200" ]
      user => "logstash_local"
      password => "REDACTED"
      index => "logstash-%{[fields][log_for]}-%{+YYYY.MM.dd}"
    }
  }
}

And added the monitoring settings to logstash.yml:

xpack.monitoring.enabled: true
xpack.monitoring.elasticsearch.username: logstash_local
xpack.monitoring.elasticsearch.password: REDACTED
xpack.monitoring.elasticsearch.hosts: [ "https://elk1:9200", "https://elk2:9200", "https://elk3:9200" ]
xpack.monitoring.elasticsearch.ssl.certificate_authority: /etc/logstash/logstash-ca.crt
xpack.monitoring.elasticsearch.sniffing: true
xpack.monitoring.collection.interval: 10s
xpack.monitoring.collection.pipeline.details.enabled: true

I am getting these errors from Logstash (10.xxx.xxx.xx1 is the IP address of the elk1 server, etc.):

[2021-10-14T17:25:28,057][ERROR][logstash.outputs.elasticsearchmonitoring][.monitoring-logstash][7cd6dd12d7cf61636b45ef3b3af9abd08fb1c71c5e7c263521eb573db99a98c4] Encountered a retryable error. Will Retry with exponential backoff  {:code=>403, :url=>"https://10.xxx.xxx.xx1:9200/_monitoring/bulk?system_id=logstash&system_api_version=7&interval=1s"}
[2021-10-14T17:25:35,375][ERROR][logstash.licensechecker.licensereader] Unable to retrieve license information from license server {:message=>"Host name '10.xxx.xxx.xx1' does not match the certificate subject provided by the peer (CN=instance)"}
[2021-10-14T17:26:05,376][ERROR][logstash.licensechecker.licensereader] Unable to retrieve license information from license server {:message=>"Host name '10.xxx.xxx.xx2' does not match the certificate subject provided by the peer (CN=instance)"}
[2021-10-14T17:26:32,069][ERROR][logstash.outputs.elasticsearchmonitoring][.monitoring-logstash][7cd6dd12d7cf61636b45ef3b3af9abd08fb1c71c5e7c263521eb573db99a98c4] Encountered a retryable error. Will Retry with exponential backoff  {:code=>403, :url=>"https://xxx.xxx.xx3:9200/_monitoring/bulk?system_id=logstash&system_api_version=7&interval=1s"}
[2021-10-14T17:26:35,376][ERROR][logstash.licensechecker.licensereader] Unable to retrieve license information from license server {:message=>"Host name '10.xxx.xxx.xx3' does not match the certificate subject provided by the peer (CN=instance)"}
[2021-10-14T17:27:05,376][ERROR][logstash.licensechecker.licensereader] Unable to retrieve license information from license server {:message=>"Host name '10.xxx.xxx.xx3' does not match the certificate subject provided by the peer (CN=instance)"}

I am hoping someone can help me figure out what to look at next for debugging.

It looks like the certificate it is using has "instance" as its hostname, but I don't know why that would be, since I specified --dns when creating it?

For example:

$ sudo /usr/share/elasticsearch/bin/elasticsearch-certutil cert --pem --ca /usr/share/elasticsearch/elk-stack-ca.p12 --dns elk1.company.local
...
Enter password for CA (/usr/share/elasticsearch/elk-stack-ca.p12) :
Please enter the desired output file [certificate-bundle.zip]: elk1-test.zip

Certificates written to /usr/share/elasticsearch/elk1-test.zip
...
$ sudo unzip /usr/share/elasticsearch/elk1-test.zip
Archive:  /usr/share/elasticsearch/elk1-test.zip
   creating: instance/
  inflating: instance/instance.crt
  inflating: instance/instance.key
$ sudo openssl x509 -noout -subject -in instance/instance.crt
subject= /CN=instance

I feel like I must be missing something fundamental here.

I set

xpack.monitoring.elasticsearch.sniffing: false

And it started working.

The subject of a certificate is not its hostname. The hostnames are in the "Subject Alternative Names" extension.

Try

openssl x509 -in instance/instance.crt -text -noout  \
    | awk '/X509v3 Subject Alternative Name/,/DNS:/'

Okay, I am now running into an issue with communication between filebeat and logstash.

Here is my logstash main.conf:

input {
  beats {
    port => 5045
    ssl => true
    ssl_certificate_authorities => ["/etc/logstash/logstash-ca.crt"]
    ssl_certificate => "/etc/logstash/logstash-instance.crt"
    ssl_key => "/etc/logstash/logstash.pkcs8.key"
  }
}
output {
  if [fields][log_for] {
    elasticsearch {
      ssl => true
      ssl_certificate_verification => true
      cacert => '/etc/logstash/logstash.pem'
      hosts => [ "elk1:9200", "elk2:9200", "elk3:9200" ]
      user => "logstash_local"
      password => "REDACTED"
      index => "logstash-%{[fields][log_for]}-%{+YYYY.MM.dd}"
    }
  }
}

And my filebeat.yml:

filebeat.config.inputs: {enabled: true, path: /etc/filebeat/inputs.d/*.yml}
output.logstash:
  hosts: ['elk1:5045', 'elk2:5045', 'elk3:5045']
  ssl.certificate: /etc/filebeat/logstash.crt
  ssl.certificate_authorities: [/etc/filebeat/logstash-ca.crt]
  ssl.key: /etc/filebeat/logstash.pkcs8.key

The error I am seeing on filebeat side is:

Failed to connect to failover(backoff(async(tcp://elk1:5045)),backoff(async(tcp://elk2:5045)),backoff(async(tcp://elk3:5045))) ...

And on the logstash side:

[2021-10-18T15:28:48,012][WARN ][io.netty.channel.DefaultChannelPipeline][main][ba44fe2850f9f454c6669cd64ffc834c0554d0d94ae9276ead9a5d3f1a16c399] An exceptionCaught() event was fired, and it reached at the tail of the pipeline. It usually means the last handler in the pipeline did not handle the exception.
io.netty.handler.codec.DecoderException: javax.net.ssl.SSLHandshakeException: Received fatal alert: bad_certificate
        at io.netty.handler.codec.ByteToMessageDecoder.callDecode(ByteToMessageDecoder.java:471) ~[netty-all-4.1.49.Final.jar:4.1.49.Final]
        at io.netty.handler.codec.ByteToMessageDecoder.channelRead(ByteToMessageDecoder.java:276) ~[netty-all-4.1.49.Final.jar:4.1.49.Final]
        at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:379) ~[netty-all-4.1.49.Final.jar:4.1.49.Final]
        at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:365) ~[netty-all-4.1.49.Final.jar:4.1.49.Final]
        at io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:357) ~[netty-all-4.1.49.Final.jar:4.1.49.Final]
        at io.netty.channel.DefaultChannelPipeline$HeadContext.channelRead(DefaultChannelPipeline.java:1410) ~[netty-all-4.1.49.Final.jar:4.1.49.Final]
        at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:379) ~[netty-all-4.1.49.Final.jar:4.1.49.Final]
        at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:365) ~[netty-all-4.1.49.Final.jar:4.1.49.Final]
        at io.netty.channel.DefaultChannelPipeline.fireChannelRead(DefaultChannelPipeline.java:919) ~[netty-all-4.1.49.Final.jar:4.1.49.Final]
        at io.netty.channel.nio.AbstractNioByteChannel$NioByteUnsafe.read(AbstractNioByteChannel.java:163) ~[netty-all-4.1.49.Final.jar:4.1.49.Final]
        at io.netty.channel.nio.NioEventLoop.processSelectedKey(NioEventLoop.java:714) ~[netty-all-4.1.49.Final.jar:4.1.49.Final]
        at io.netty.channel.nio.NioEventLoop.processSelectedKeysOptimized(NioEventLoop.java:650) ~[netty-all-4.1.49.Final.jar:4.1.49.Final]
        at io.netty.channel.nio.NioEventLoop.processSelectedKeys(NioEventLoop.java:576) ~[netty-all-4.1.49.Final.jar:4.1.49.Final]
        at io.netty.channel.nio.NioEventLoop.run(NioEventLoop.java:493) [netty-all-4.1.49.Final.jar:4.1.49.Final]
        at io.netty.util.concurrent.SingleThreadEventExecutor$4.run(SingleThreadEventExecutor.java:989) [netty-all-4.1.49.Final.jar:4.1.49.Final]
        at io.netty.util.internal.ThreadExecutorMap$2.run(ThreadExecutorMap.java:74) [netty-all-4.1.49.Final.jar:4.1.49.Final]
        at io.netty.util.concurrent.FastThreadLocalRunnable.run(FastThreadLocalRunnable.java:30) [netty-all-4.1.49.Final.jar:4.1.49.Final]
        at java.lang.Thread.run(Thread.java:748) [?:1.8.0_302]
Caused by: javax.net.ssl.SSLHandshakeException: Received fatal alert: bad_certificate
        at sun.security.ssl.Alert.createSSLException(Alert.java:131) ~[?:1.8.0_302]
        at sun.security.ssl.Alert.createSSLException(Alert.java:117) ~[?:1.8.0_302]
        at sun.security.ssl.TransportContext.fatal(TransportContext.java:311) ~[?:1.8.0_302]
        at sun.security.ssl.Alert$AlertConsumer.consume(Alert.java:293) ~[?:1.8.0_302]
        at sun.security.ssl.TransportContext.dispatch(TransportContext.java:185) ~[?:1.8.0_302]
        at sun.security.ssl.SSLTransport.decode(SSLTransport.java:152) ~[?:1.8.0_302]
        at sun.security.ssl.SSLEngineImpl.decode(SSLEngineImpl.java:575) ~[?:1.8.0_302]
        at sun.security.ssl.SSLEngineImpl.readRecord(SSLEngineImpl.java:531) ~[?:1.8.0_302]
        at sun.security.ssl.SSLEngineImpl.unwrap(SSLEngineImpl.java:398) ~[?:1.8.0_302]
        at sun.security.ssl.SSLEngineImpl.unwrap(SSLEngineImpl.java:377) ~[?:1.8.0_302]
        at javax.net.ssl.SSLEngine.unwrap(SSLEngine.java:626) ~[?:1.8.0_302]
        at io.netty.handler.ssl.SslHandler$SslEngineType$3.unwrap(SslHandler.java:282) ~[netty-all-4.1.49.Final.jar:4.1.49.Final]
        at io.netty.handler.ssl.SslHandler.unwrap(SslHandler.java:1372) ~[netty-all-4.1.49.Final.jar:4.1.49.Final]
        at io.netty.handler.ssl.SslHandler.decodeJdkCompatible(SslHandler.java:1267) ~[netty-all-4.1.49.Final.jar:4.1.49.Final]
        at io.netty.handler.ssl.SslHandler.decode(SslHandler.java:1314) ~[netty-all-4.1.49.Final.jar:4.1.49.Final]
        at io.netty.handler.codec.ByteToMessageDecoder.decodeRemovalReentryProtection(ByteToMessageDecoder.java:501) ~[netty-all-4.1.49.Final.jar:4.1.49.Final]
        at io.netty.handler.codec.ByteToMessageDecoder.callDecode(ByteToMessageDecoder.java:440) ~[netty-all-4.1.49.Final.jar:4.1.49.Final]
        ... 17 more

It looks like it's complaining that the certificate presented by filebeat is invalid, though it might instead be complaining about one of the certificate files configured in the Logstash input section?

I tried debugging by manually running this on one of the filebeat servers:

$ curl -v --cacert logstash-ca.crt https://elk1:5045

And got this output:

* About to connect() to elk1 port 5045 (#0)
*   Trying 10.xxx.xxx.xx1...
* Connected to elk1 (10.xxx.xxx.xx1) port 5045 (#0)
* Initializing NSS with certpath: sql:/etc/pki/nssdb
*   CAfile: logstash-ca.crt
  CApath: none
* Server certificate:
*       subject: CN=instance
*       start date: Oct 15 20:35:44 2021 GMT
*       expire date: Oct 14 20:35:44 2024 GMT
*       common name: instance
*       issuer: CN=Elastic Certificate Tool Autogenerated CA
* NSS error -8182 (SEC_ERROR_BAD_SIGNATURE)
* Peer's certificate has an invalid signature.
* Closing connection 0
curl: (60) Peer's certificate has an invalid signature.

This is how the various certificate files were generated:

openssl pkcs12 -in elk-certificates.p12 -nocerts -nodes | sed -ne '/-BEGIN PRIVATE KEY-/,/-END PRIVATE KEY-/p' > logstash-ca.key

openssl pkcs12 -in elk-certificates.p12 -cacerts -nokeys -chain | sed -ne '/-BEGIN CERTIFICATE-/,/-END CERTIFICATE-/p' > logstash-ca.crt

openssl pkcs12 -in elk-certificates.p12 -clcerts -nokeys | sed -ne '/-BEGIN CERTIFICATE-/,/-END CERTIFICATE-/p' > logstash.crt

/usr/share/elasticsearch/bin/elasticsearch-certutil cert --ca-cert logstash-ca.crt --ca-key logstash-ca.key --dns elk1,elk2,elk3 --pem
(instance.crt from the resulting zip renamed to logstash-instance.crt)

openssl pkcs8 -in logstash.key -topk8 -nocrypt -out logstash.pkcs8.key
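Given the bad_certificate alert, one thing worth ruling out is a mismatch between the key and certificate handed to the beats input. If they belong together, the public key extracted from each will be identical (file names as above):

```shell
# If the key matches the certificate, these two digests are identical
openssl x509 -in logstash-instance.crt -noout -pubkey | openssl sha256
openssl pkey -in logstash.pkcs8.key -pubout | openssl sha256
```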

I don't know how to narrow it down from here.

In SSL terminology, an alert is a message from the other side of the connection telling you that something is wrong. A fatal alert is one where the problem is so bad that the other side of the connection refuses to continue.

In this case, if Logstash is printing out:

Received fatal alert: bad_certificate

Then it means that the other side of the connection (presumably filebeat) doesn't trust Logstash's certificate and is closing the connection.

Assuming Logstash is correctly configured with the certificate that you intend it to use (which it probably is, but that's not something I can tell), the only way you can fix this problem is on the filebeat side: Logstash cannot force filebeat to trust it; that's a decision filebeat has to make.

There are likely two issues here:

  1. It looks like /etc/filebeat/logstash-ca.crt isn't the real CA certificate for Logstash. Your naming scheme suggests it should be, but the error message you reported suggests otherwise. You could use openssl verify to debug that.
  2. You don't want to give a copy of your logstash.pkcs8.key to filebeat. That's the private key for Logstash. It's private - you don't just hand out copies to other processes. There is no need for filebeat to have ssl.certificate and ssl.key, you can just remove them from your config.
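The openssl verify check mentioned in point 1 would look something like this, with both files copied to one machine (paths shortened):

```shell
# Prints "logstash-instance.crt: OK" only if the certificate chains
# to the CA that filebeat is being told to trust
openssl verify -CAfile logstash-ca.crt logstash-instance.crt
```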

I converted the stack CA to .crt using this command:

$ sudo openssl pkcs12 -in /usr/share/elasticsearch/elk-stack-ca.p12 -clcerts -nokeys -out /usr/share/elasticsearch/elk-stack-ca.crt

Updated logstash's main.conf:

input {
  beats {
    port => 5045
    ssl => true
    ssl_certificate_authorities => ["/etc/logstash/elk-stack-ca.crt"]
    ssl_certificate => "/etc/logstash/logstash-instance.crt"
    ssl_key => "/etc/logstash/logstash.pkcs8.key"
  }
}
output {
  if [fields][log_for] {
    elasticsearch {
      ssl => true
      ssl_certificate_verification => true
      cacert => '/etc/logstash/logstash.pem'
      hosts => [ "elk1:9200", "elk2:9200", "elk3:9200" ]
      user => "logstash_local"
      password => "REDACTED"
      index => "logstash-%{[fields][log_for]}-%{+YYYY.MM.dd}"
    }
  }
}

Updated filebeat.yml, and removed the other cert fields (the guides I read showed those fields being populated, though):

filebeat.config.inputs: {enabled: true, path: /etc/filebeat/inputs.d/*.yml}
output.logstash:
  hosts: ['elk1:5045', 'elk2:5045', 'elk3:5045']
  ssl.certificate_authorities: [/etc/filebeat/elk-stack-ca.crt]

I'm still getting the same "Received fatal alert: bad_certificate" in the logstash log.

Is there a good step-by-step guide somewhere detailing how to set this up?
The best one I've found so far is this one (the one I have been trying to follow):

But it apparently has issues, as well.

I'm willing to do this all over again if there is a decent step-by-step somewhere.

I've been working through the blog post linked above one more time, and got to this command:

openssl pkcs8 -in logstash.key -topk8 -nocrypt -out logstash.pkcs8.key

And realized that the filename logstash.key had not been mentioned anywhere previously in the post. I wonder if this is where I got tripped up. I think I just put logstash-ca.key here, assuming it was a typo, but don't know whether that is correct.

I am attempting to use this guide instead, now:

So now I have elasticsearch.yml like this:

cluster.name: elkdev
node.name: elk1
network.host: 0.0.0.0
http.port: 9200
node.roles: [ master, data ]
thread_pool.write.queue_size: 3000
discovery.seed_hosts: ["elk1", "elk2", "elk3"]
cluster.initial_master_nodes: ["elk1", "elk2", "elk3"]
path.data: /data/elasticsearch
path.logs: /var/log/elasticsearch
xpack.security.enabled: true
xpack.security.http.ssl.enabled: true
xpack.security.transport.ssl.enabled: true
xpack.security.http.ssl.key: /etc/elasticsearch/elk1.key
xpack.security.http.ssl.certificate: /etc/elasticsearch/elk1.crt
xpack.security.http.ssl.certificate_authorities: /etc/elasticsearch/ca.crt
xpack.security.transport.ssl.key: /etc/elasticsearch/elk1.key
xpack.security.transport.ssl.certificate: /etc/elasticsearch/elk1.crt
xpack.security.transport.ssl.certificate_authorities: /etc/elasticsearch/ca.crt

But it seems like now the cluster isn't starting up...I've gone backward. :frowning:

Seeing warnings like this in the log:

[2021-10-19T15:54:42,979][WARN ][o.e.x.c.s.t.n.SecurityNetty4Transport] [elk1] client did not trust this server's certificate, closing connection Netty4TcpChannel{localAddress=/10.xxx.xxx.xx1:9300, remoteAddress=/10.xxx.xxx.xx3:44262}
[2021-10-19T15:54:43,409][WARN ][o.e.x.c.s.t.n.SecurityNetty4Transport] [elk1] client did not trust this server's certificate, closing connection Netty4TcpChannel{localAddress=/10.xxx.xxx.xx1:9300, remoteAddress=/10.xxx.xxx.xx2:35484}

As well as a bunch of these:

[2021-10-19T15:54:42,977][WARN ][o.e.t.TcpTransport       ] [elk1] exception caught on transport layer [Netty4TcpChannel{localAddress=/10.xxx.xxx.xx1:9300, remoteAddress=/10.xxx.xxx.xx3:44291}], closing connection
io.netty.handler.codec.DecoderException: javax.net.ssl.SSLHandshakeException: Insufficient buffer remaining for AEAD cipher fragment (2). Needs to be more than tag size (16)
...

So frustrating.

I stopped all services except Elasticsearch on the elk servers, and found some more error messages in the log generated just by the Elasticsearch nodes trying to talk to each other:

[2021-10-19T16:45:39,380][WARN ][o.e.x.c.s.t.n.SecurityNetty4Transport] [elk1] client did not trust this server's certificate, closing connection Netty4TcpChannel{localAddress=/10.xxx.xxx.xx1:9300, remoteAddress=/10.xxx.xxx.xx2:39466}
[2021-10-19T16:45:39,380][WARN ][o.e.c.s.DiagnosticTrustManager] [elk1] failed to establish trust with server at [10.xxx.xxx.xx2]; the server provided a certificate with subject name [CN=elk2] and fingerprint [44d9a3f1607460b3aa9ac046ecb502cf99593a77]; the certificate has subject alternative names [DNS:elk2]; the certificate is issued by [CN=Elastic Certificate Tool Autogenerated CA] but the server did not provide a copy of the issuing certificate in the certificate chain; the issuing certificate with fingerprint [8f38143729878cad57aed57655db5e58b96cf076] is trusted in this ssl context ([xpack.security.transport.ssl])
java.security.cert.CertificateException: No subject alternative names matching IP address 10.xxx.xxx.xx2 found
[2021-10-20T08:49:40,742][WARN ][o.e.d.HandshakingTransportAddressConnector] [elk1] [connectToRemoteMasterNode[10.xxx.xxx.xx2:9300]] completed handshake with [{elk2}{z0BPC1qZS0218vDUDIFpHg}{L9D1F-28Sj6parXS_Si14w}{10.xxx.xxx.xx2}{10.xxx.xxx.xx2:9300}{dm}{xpack.installed=true, transform.node=false}] but followup connection failed
org.elasticsearch.transport.ConnectTransportException: [elk2][10.xxx.xxx.xx2:9300] general node connection failure
...

If I test using curl:

curl --cacert ca.crt 'https://elastic:REDACTED@elk1:9200/_cat/nodes?v'

I don't see a certificate issue, but the user validation fails (I assume because the cluster is not actually running?)

Although, I suppose this might just mean that HTTP is working and the issue is in the transport layer, since communication between the nodes doesn't use HTTP.

FWIW, if I curl to port 9300:

curl --cacert ca.crt 'https://elastic:REDACTED@elk1:9300/_cat/nodes?v'

I get:

curl: (58) NSS: client certificate not found (nickname not specified)

Although, if I specify a client certificate (duh), I get what I would expect when connecting to a binary port with curl:

$ curl --cacert ca.crt --cert elk1.crt --key elk1.key 'https://elastic:REDACTED@elk1:9300/_cat/nodes?v'
This is not an HTTP port

I added these to elasticsearch.yml, and now the cluster is coming up. I have no idea why the default (full) does not work.

xpack.security.transport.ssl.verification_mode: certificate
xpack.security.http.ssl.verification_mode: certificate
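For what it's worth, full mode additionally checks that the name or IP being connected to appears in the certificate's subject alternative names. The DiagnosticTrustManager warning above ("No subject alternative names matching IP address ... found") suggests the nodes reach each other by IP while the certificates only carry DNS names, so full fails. Dropping to certificate skips that check; the stricter alternative would be regenerating the certificates with the node IPs included (addresses below are placeholders):

```shell
# Hypothetical IPs; substitute the real node addresses so that
# verification_mode "full" can match the peer address against the SANs
/usr/share/elasticsearch/bin/elasticsearch-certutil cert \
  --ca elk-stack-ca.p12 \
  --dns elk1,elk2,elk3 \
  --ip 10.0.0.1,10.0.0.2,10.0.0.3
```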

Next issue: Logstash isn't connecting to the cluster.

Here is my logstash.yml:

path.data: /var/lib/logstash
pipeline.ordered: auto
path.logs: /var/log/logstash
xpack.monitoring.enabled: true
xpack.monitoring.elasticsearch.username: logstash_local
xpack.monitoring.elasticsearch.password: REDACTED
xpack.monitoring.elasticsearch.hosts: [ "https://elk1:9200", "https://elk2:9200", "https://elk3:9200" ]
xpack.monitoring.elasticsearch.ssl.certificate_authority: /etc/logstash/ca.crt
xpack.monitoring.elasticsearch.sniffing: false
xpack.monitoring.collection.interval: 10s
xpack.monitoring.collection.pipeline.details.enabled: true

And main.conf:

input {
  beats {
    port => 5045
    ssl => true
    ssl_certificate_authorities => ["/etc/logstash/ca.crt"]
    ssl_certificate => "/etc/logstash/elk1.crt"
    ssl_key => "/etc/logstash/elk1.pkcs8.key"
  }
}
output {
  if [fields][log_for] {
    elasticsearch {
      ssl => true
      ssl_certificate_verification => true
      cacert => '/etc/logstash/ca.crt'
      hosts => [ "elk1:9200", "elk2:9200", "elk3:9200" ]
      user => "logstash_local"
      password => "REDACTED"
      index => "logstash-%{[fields][log_for]}-%{+YYYY.MM.dd}"
    }
  }
}

I am seeing this in the logstash log:

[2021-10-20T10:10:34,857][INFO ][logstash.runner          ] Starting Logstash {"logstash.version"=>"7.9.0", "jruby.version"=>"jruby 9.2.12.0 (2.5.7) 2020-07-01 db01a49ba6 OpenJDK 64-Bit Server VM 25.302-b08 on 1.8.0_302-b08 +indy +jit [linux-x86_64]"}
[2021-10-20T10:10:37,441][INFO ][logstash.monitoring.internalpipelinesource] Monitoring License OK
[2021-10-20T10:10:37,445][INFO ][logstash.monitoring.internalpipelinesource] Validated license for monitoring. Enabling monitoring pipeline.
[2021-10-20T10:10:39,644][INFO ][org.reflections.Reflections] Reflections took 564 ms to scan 1 urls, producing 22 keys and 45 values
[2021-10-20T10:10:39,905][WARN ][logstash.outputs.elasticsearchmonitoring][.monitoring-logstash] ** WARNING ** Detected UNSAFE options in elasticsearch output configuration!
** WARNING ** You have enabled encryption but DISABLED certificate verification.
** WARNING ** To make sure your data is secure change :ssl_certificate_verification to true
[2021-10-20T10:10:39,967][INFO ][logstash.outputs.elasticsearchmonitoring][.monitoring-logstash] Elasticsearch pool URLs updated {:changes=>{:removed=>[], :added=>[https://logstash_local:xxxxxx@elk1:9200/, https://logstash_local:xxxxxx@elk2:9200/, https://logstash_local:xxxxxx@elk3:9200/]}}
[2021-10-20T10:10:39,967][INFO ][logstash.outputs.elasticsearch][main] Elasticsearch pool URLs updated {:changes=>{:removed=>[], :added=>[https://logstash_local:xxxxxx@elk1:9200/, https://logstash_local:xxxxxx@elk2:9200/, https://logstash_local:xxxxxx@elk3:9200/]}}
[2021-10-20T10:10:40,033][WARN ][logstash.outputs.elasticsearch][main] Restored connection to ES instance {:url=>"https://logstash_local:xxxxxx@elk1:9200/"}
[2021-10-20T10:10:40,042][INFO ][logstash.outputs.elasticsearch][main] ES Output version determined {:es_version=>7}
[2021-10-20T10:10:40,045][WARN ][logstash.outputs.elasticsearchmonitoring][.monitoring-logstash] Restored connection to ES instance {:url=>"https://logstash_local:xxxxxx@elk1:9200/"}
[2021-10-20T10:10:40,047][WARN ][logstash.outputs.elasticsearch][main] Detected a 6.x and above cluster: the `type` event field won't be used to determine the document _type {:es_version=>7}
[2021-10-20T10:10:40,051][INFO ][logstash.outputs.elasticsearchmonitoring][.monitoring-logstash] ES Output version determined {:es_version=>7}
[2021-10-20T10:10:40,053][WARN ][logstash.outputs.elasticsearchmonitoring][.monitoring-logstash] Detected a 6.x and above cluster: the `type` event field won't be used to determine the document _type {:es_version=>7}
[2021-10-20T10:10:40,126][WARN ][logstash.outputs.elasticsearch][main] Restored connection to ES instance {:url=>"https://logstash_local:xxxxxx@elk2:9200/"}
[2021-10-20T10:10:40,127][WARN ][logstash.outputs.elasticsearchmonitoring][.monitoring-logstash] Restored connection to ES instance {:url=>"https://logstash_local:xxxxxx@elk2:9200/"}
[2021-10-20T10:10:40,197][WARN ][logstash.outputs.elasticsearchmonitoring][.monitoring-logstash] Restored connection to ES instance {:url=>"https://logstash_local:xxxxxx@elk3:9200/"}
[2021-10-20T10:10:40,204][WARN ][logstash.outputs.elasticsearch][main] Restored connection to ES instance {:url=>"https://logstash_local:xxxxxx@elk3:9200/"}
[2021-10-20T10:10:40,250][INFO ][logstash.outputs.elasticsearchmonitoring][.monitoring-logstash] New Elasticsearch output {:class=>"LogStash::Outputs::ElasticSearchMonitoring", :hosts=>["https://elk1:9200", "https://elk2:9200", "https://elk3:9200"]}
[2021-10-20T10:10:40,251][INFO ][logstash.outputs.elasticsearch][main] New Elasticsearch output {:class=>"LogStash::Outputs::ElasticSearch", :hosts=>["//elk1:9200", "//elk2:9200", "//elk3:9200"]}
[2021-10-20T10:10:40,268][WARN ][logstash.javapipeline    ][.monitoring-logstash] 'pipeline.ordered' is enabled and is likely less efficient, consider disabling if preserving event order is not necessary
[2021-10-20T10:10:40,306][INFO ][logstash.outputs.elasticsearch][main] Using a default mapping template {:es_version=>7, :ecs_compatibility=>:disabled}
[2021-10-20T10:10:40,378][INFO ][logstash.outputs.elasticsearch][main] Attempting to install template {:manage_template=>{"index_patterns"=>"logstash-*", "version"=>60001, "settings"=>{"index.refresh_interval"=>"5s", "number_of_shards"=>1}, "mappings"=>{"dynamic_templates"=>[{"message_field"=>{"path_match"=>"message", "match_mapping_type"=>"string", "mapping"=>{"type"=>"text", "norms"=>false}}}, {"string_fields"=>{"match"=>"*", "match_mapping_type"=>"string", "mapping"=>{"type"=>"text", "norms"=>false, "fields"=>{"keyword"=>{"type"=>"keyword", "ignore_above"=>256}}}}}], "properties"=>{"@timestamp"=>{"type"=>"date"}, "@version"=>{"type"=>"keyword"}, "geoip"=>{"dynamic"=>true, "properties"=>{"ip"=>{"type"=>"ip"}, "location"=>{"type"=>"geo_point"}, "latitude"=>{"type"=>"half_float"}, "longitude"=>{"type"=>"half_float"}}}}}}}
[2021-10-20T10:10:40,386][INFO ][logstash.javapipeline    ][.monitoring-logstash] Starting pipeline {:pipeline_id=>".monitoring-logstash", "pipeline.workers"=>1, "pipeline.batch.size"=>2, "pipeline.batch.delay"=>50, "pipeline.max_inflight"=>2, "pipeline.sources"=>["monitoring pipeline"], :thread=>"#<Thread:0x540ee26 run>"}
[2021-10-20T10:10:40,391][INFO ][logstash.javapipeline    ][main] Starting pipeline {:pipeline_id=>"main", "pipeline.workers"=>20, "pipeline.batch.size"=>125, "pipeline.batch.delay"=>50, "pipeline.max_inflight"=>2500, "pipeline.sources"=>["/etc/logstash/conf.d/main.conf"], :thread=>"#<Thread:0x353dc422@/usr/share/logstash/logstash-core/lib/logstash/java_pipeline.rb:122 run>"}
[2021-10-20T10:10:41,331][INFO ][logstash.javapipeline    ][.monitoring-logstash] Pipeline Java execution initialization time {"seconds"=>0.93}
[2021-10-20T10:10:41,373][INFO ][logstash.javapipeline    ][main] Pipeline Java execution initialization time {"seconds"=>0.98}
[2021-10-20T10:10:41,405][INFO ][logstash.inputs.beats    ][main] Beats inputs: Starting input listener {:address=>"0.0.0.0:5045"}
[2021-10-20T10:10:41,430][INFO ][logstash.javapipeline    ][.monitoring-logstash] Pipeline started {"pipeline.id"=>".monitoring-logstash"}
[2021-10-20T10:10:42,279][ERROR][logstash.agent           ] Failed to execute action {:id=>:main, :action_type=>LogStash::ConvergeResult::FailedAction, :message=>"Could not execute action: PipelineAction::Create<main>, action_result: false", :backtrace=>nil}
[2021-10-20T10:10:42,554][INFO ][logstash.agent           ] Successfully started Logstash API endpoint {:port=>9600}
[2021-10-20T10:10:43,448][INFO ][logstash.javapipeline    ] Pipeline terminated {"pipeline.id"=>".monitoring-logstash"}
[2021-10-20T10:10:43,520][INFO ][logstash.runner          ] Logstash shut down.

Side question: What is the "You have enabled encryption but DISABLED certificate verification." warning about? I clearly have ssl_certificate_verification set to true in main.conf.
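Worth noting: the warning is logged by `logstash.outputs.elasticsearchmonitoring` under the `.monitoring-logstash` pipeline, so it presumably comes from the monitoring connection configured in logstash.yml, not from main.conf. A hedged sketch of the relevant 7.x settings (hostnames from the thread; the other values are placeholders):

```yaml
# logstash.yml – internal monitoring connection (7.x); values are placeholders
xpack.monitoring.enabled: true
xpack.monitoring.elasticsearch.hosts: ["https://elk1:9200", "https://elk2:9200", "https://elk3:9200"]
xpack.monitoring.elasticsearch.username: "logstash_system"
xpack.monitoring.elasticsearch.password: "REDACTED"
# Without a CA configured here, some setups fall back to disabling
# verification, which would trigger exactly that warning.
xpack.monitoring.elasticsearch.ssl.certificate_authority: "/etc/logstash/ca.crt"
xpack.monitoring.elasticsearch.ssl.verification_mode: certificate
```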

I set log.level to debug and checked the log again. Found more useful information:

File does not contain valid private key: /etc/logstash/elk1.pkcs8.key

Turns out that in my deployment script, the "node.key" file was erroneously being copied to both "node.key" and "node.pkcs8.key", so the PKCS#8 file never actually contained a PKCS#8 key. I fixed that, and now logstash is running and successfully talking to Elasticsearch.
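For reference, the PKCS#8 conversion the deployment script should have performed can be sketched with openssl (filenames follow the thread; generating a throwaway key here just makes the example self-contained):

```shell
# Generate a test RSA key (in practice, use the key produced for the node)
openssl genrsa -out node.key 2048

# Convert it to unencrypted PKCS#8, the format the Logstash beats input expects
openssl pkcs8 -topk8 -inform PEM -in node.key -out node.pkcs8.key -nocrypt

head -n1 node.pkcs8.key   # prints "-----BEGIN PRIVATE KEY-----"
```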

Three out of four components down...but filebeat is now failing to connect to logstash!

Here is my filebeat.yml:

filebeat.config.inputs:
  enabled: true
  path: /etc/filebeat/inputs.d/*.yml
output.logstash:
  hosts: ['elk1:5045', 'elk2:5045', 'elk3:5045']
  ssl.certificate_authorities: [/etc/filebeat/ca.crt]

I'm seeing this error in the filebeat output:

Oct 20 13:38:29 devserver filebeat[44154]: 2021-10-20T13:38:29.490-0500 ERROR [publisher_pipeline_output] pipeline/output.go:154 Failed to connect to failover(backoff(async(tcp://elk1:5045)),backoff(async(tcp://elk2:5045)),backoff(async(tcp://elk3.dev.oati.local:5045))): remote error: tls: bad certificate

If I test via the following curl command:

curl -v --cacert /etc/filebeat/ca.crt https://elk1:5045

I get:

* About to connect() to elk1 port 5045 (#0)
*   Trying 10.xxx.xxx.xx1...
* Connected to elk1 (10.xxx.xxx.xx1) port 5045 (#0)
* Initializing NSS with certpath: sql:/etc/pki/nssdb
*   CAfile: /etc/filebeat/ca.crt
  CApath: none
* NSS: client certificate not found (nickname not specified)
* NSS error -12271 (SSL_ERROR_BAD_CERT_ALERT)
* SSL peer cannot verify your certificate.
* Closing connection 0
curl: (58) NSS: client certificate not found (nickname not specified)

So now I'm basically back to where I was before starting over again. :frowning:

It seems like logstash is requiring a client certificate from filebeat, but none of the guides I'm seeing mention anything about this.

I removed ssl_certificate_authorities from the beats input part of the logstash config, and now it works.
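For reference, the beats input now looks roughly like this (a hedged sketch; the file names are assumptions based on the key path in the error above):

```
input {
  beats {
    port => 5045
    ssl => true
    ssl_certificate => "/etc/logstash/elk1.crt"
    ssl_key => "/etc/logstash/elk1.pkcs8.key"
    # Re-adding ssl_certificate_authorities (together with
    # ssl_verify_mode => "force_peer") would make logstash demand
    # a client certificate from filebeat again.
  }
}
```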

The next step is to use our actual CA to generate the certificates and add client cert validation for the filebeat->logstash communication, but that will be a task for later. :slight_smile:

A lot to digest; however, the problem always seems to come back to the actual certificates you have created and are using, not the ELK stack or Beats.

This indicates a certificate issue.

Where are you obtaining the certificates from? A central authority within your organisation, or an external one?

The certs were generated by a CA that elasticsearch-certutil auto-generated.

It's actually working now...I edited the last comment, maybe after you read it. The issue was that I don't have a client cert for filebeat, but having the ssl_certificate_authorities set in the logstash beats input section meant a filebeat client cert was required.

For now I removed that, but eventually I need to generate all of these certs from our corporate CA, including filebeat client certs. Everything is being deployed via Ansible, and I think we have some automation set up for cert requests, but I need to get some guidance from another dev who is on holiday at the moment.

Anyway, thanks to those who responded above. Just the act of working through it in the forum forced me to figure it out. :slight_smile: