ELK with SSL, possible issues?

Hello all!

I am fairly new to the ELK stack and am running a Proof of Concept. Getting it up and running over HTTP was fairly easy, following the guides. However, as I want SSL enabled on all communications, I am now struggling quite a bit :slight_smile:

So long story short, here is my problem.
The host is a VM running Ubuntu 18.04 LTS. The Elasticsearch log is getting spammed with the following warning (I know it is just a warning, but I don't like having it there).

[2020-01-23T08:22:34,260][WARN ][o.e.h.n.Netty4HttpServerTransport] [primo] caught exception while handling client http traffic, closing connection [id: 0xbab5aee4, L:0.0.0.0/0.0.0.0:9200 ! R:/127.0.0.1:38978]
io.netty.handler.codec.DecoderException: io.netty.handler.ssl.NotSslRecordException: not an SSL/TLS record: 474554202f5f636c75737465722f6865616c74682f5f616c6c3f6c6f63616c3d747275652674696d656f75743d36307320485454502f312e310d0a436f6e74656e742d4c656e6774683a20300d0a486f73743a203132372e302e302e313a393230300d0a436f6e6e656374696f6e3a204b6565702d416c6976650d0a557365722d4167656e743a204170616368652d48747470436c69656e742f342e352e3420284a6176612f312e382e305f323332290d0a4163636570742d456e636f64696e673a20677a69702c6465666c6174650d0a0d0a
        at io.netty.handler.codec.ByteToMessageDecoder.callDecode(ByteToMessageDecoder.java:472) ~[netty-codec-4.1.32.Final.jar:4.1.32.Final]
        at io.netty.handler.codec.ByteToMessageDecoder.channelRead(ByteToMessageDecoder.java:278) ~[netty-codec-4.1.32.Final.jar:4.1.32.Final]
        at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:362) [netty-transport-4.1.32.Final.jar:4.1.32.Final]
        at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:348) [netty-transport-4.1.32.Final.jar:4.1.32.Final]
        at io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:340) [netty-transport-4.1.32.Final.jar:4.1.32.Final]
        at io.netty.channel.DefaultChannelPipeline$HeadContext.channelRead(DefaultChannelPipeline.java:1434) [netty-transport-4.1.32.Final.jar:4.1.32.Final]
        at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:362) [netty-transport-4.1.32.Final.jar:4.1.32.Final]
        at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:348) [netty-transport-4.1.32.Final.jar:4.1.32.Final]
        at io.netty.channel.DefaultChannelPipeline.fireChannelRead(DefaultChannelPipeline.java:965) [netty-transport-4.1.32.Final.jar:4.1.32.Final]
        at io.netty.channel.nio.AbstractNioByteChannel$NioByteUnsafe.read(AbstractNioByteChannel.java:163) [netty-transport-4.1.32.Final.jar:4.1.32.Final]
        at io.netty.channel.nio.NioEventLoop.processSelectedKey(NioEventLoop.java:656) [netty-transport-4.1.32.Final.jar:4.1.32.Final]
        at io.netty.channel.nio.NioEventLoop.processSelectedKeysPlain(NioEventLoop.java:556) [netty-transport-4.1.32.Final.jar:4.1.32.Final]
        at io.netty.channel.nio.NioEventLoop.processSelectedKeys(NioEventLoop.java:510) [netty-transport-4.1.32.Final.jar:4.1.32.Final]
        at io.netty.channel.nio.NioEventLoop.run(NioEventLoop.java:470) [netty-transport-4.1.32.Final.jar:4.1.32.Final]
        at io.netty.util.concurrent.SingleThreadEventExecutor$5.run(SingleThreadEventExecutor.java:909) [netty-common-4.1.32.Final.jar:4.1.32.Final]
        at java.lang.Thread.run(Thread.java:748) [?:1.8.0_232]
Caused by: io.netty.handler.ssl.NotSslRecordException: not an SSL/TLS record: 474554202f5f636c75737465722f6865616c74682f5f616c6c3f6c6f63616c3d747275652674696d656f75743d36307320485454502f312e310d0a436f6e74656e742d4c656e6774683a20300d0a486f73743a203132372e302e302e313a393230300d0a436f6e6e656374696f6e3a204b6565702d416c6976650d0a557365722d4167656e743a204170616368652d48747470436c69656e742f342e352e3420284a6176612f312e382e305f323332290d0a4163636570742d456e636f64696e673a20677a69702c6465666c6174650d0a0d0a
        at io.netty.handler.ssl.SslHandler.decodeJdkCompatible(SslHandler.java:1182) ~[netty-handler-4.1.32.Final.jar:4.1.32.Final]
        at io.netty.handler.ssl.SslHandler.decode(SslHandler.java:1247) ~[netty-handler-4.1.32.Final.jar:4.1.32.Final]
        at io.netty.handler.codec.ByteToMessageDecoder.decodeRemovalReentryProtection(ByteToMessageDecoder.java:502) ~[netty-codec-4.1.32.Final.jar:4.1.32.Final]
        at io.netty.handler.codec.ByteToMessageDecoder.callDecode(ByteToMessageDecoder.java:441) ~[netty-codec-4.1.32.Final.jar:4.1.32.Final]
        ... 15 more

The configuration in elasticsearch.yml is as follows:

cluster.name: test
node.name: primo
path.data: /var/lib/elasticsearch
path.logs: /var/log/elasticsearch
network.host: "localhost"
http.port: 9200
discovery.zen.ping.unicast.hosts: ["localhost"]
discovery.zen.minimum_master_nodes: 1
xpack.security.enabled: true
xpack.security.transport.ssl.enabled: false
xpack.security.transport.ssl.verification_mode: certificate
xpack.security.transport.ssl.key: /etc/elasticsearch/config/certs/key.key
xpack.security.transport.ssl.certificate: /etc/elasticsearch/config/certs/cert.pem
xpack.security.http.ssl.enabled: true
xpack.security.http.ssl.key:  /etc/elasticsearch/config/certs/key.key
xpack.security.http.ssl.certificate: /etc/elasticsearch/config/certs/cert.pem

The certificates are self-signed, so I haven't specified a CA. I am not sure whether this is an issue.
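
For reference, and assuming the self-signed certificate can act as its own CA, I think the extra lines would look something like this (reusing my existing paths; I haven't added them yet):

xpack.security.http.ssl.certificate_authorities: [ "/etc/elasticsearch/config/certs/cert.pem" ]
xpack.security.transport.ssl.certificate_authorities: [ "/etc/elasticsearch/config/certs/cert.pem" ]
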
It is up and running, but as it is a single-node cluster the status is showing as yellow:

curl -XGET -k -u '<user>:<password>' 'https://localhost:9200/_cluster/health?pretty'
{
  "cluster_name" : "test",
  "status" : "yellow",
  "timed_out" : false,
  "number_of_nodes" : 1,
  "number_of_data_nodes" : 1,
  "active_primary_shards" : 75,
  "active_shards" : 75,
  "relocating_shards" : 0,
  "initializing_shards" : 0,
  "unassigned_shards" : 55,
  "delayed_unassigned_shards" : 0,
  "number_of_pending_tasks" : 0,
  "number_of_in_flight_fetch" : 0,
  "task_max_waiting_in_queue_millis" : 0,
  "active_shards_percent_as_number" : 57.692307692307686
}
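
As an aside, my understanding is that the yellow status on a single node just means the replica shards have nowhere to be allocated. Something like the following should turn it green by dropping replicas on the existing indices; I haven't run it yet and _all is deliberately broad, so treat it as a sketch:

curl -XPUT -k -u '<user>:<password>' -H 'Content-Type: application/json' 'https://localhost:9200/_all/_settings?pretty' -d '{ "index.number_of_replicas": 0 }'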

curl -XGET -k -u '<user>:<password>' 'https://localhost:9200/?pretty'
{
  "name" : "primo",
  "cluster_name" : "test",
  "cluster_uuid" : "9I9BmTr1SVaKV9GoSYSGwA",
  "version" : {
    "number" : "6.8.6",
    "build_flavor" : "default",
    "build_type" : "deb",
    "build_hash" : "3d9f765",
    "build_date" : "2019-12-13T17:11:52.013738Z",
    "build_snapshot" : false,
    "lucene_version" : "7.7.2",
    "minimum_wire_compatibility_version" : "5.6.0",
    "minimum_index_compatibility_version" : "5.0.0"
  },
  "tagline" : "You Know, for Search"
}

Based on my research over the last few days, this problem is usually caused by another tool or system trying to connect to Elasticsearch via HTTP instead of HTTPS, which is why I stopped both Kibana and Logstash completely. At the moment the only things running on the machine are Elasticsearch and nginx (pointing to the Kibana GUI). The warnings are still showing up constantly.
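
If I understand the cause correctly, any plain-HTTP request against the TLS-enabled port should reproduce exactly this warning in the log, for example:

curl http://localhost:9200/

while the HTTPS equivalent (with -k because of the self-signed certificate) goes through cleanly:

curl -XGET -k -u '<user>:<password>' 'https://localhost:9200/'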

Any pointers on why these warnings keep appearing and where to look further would be a great help, as it is clearly this machine trying to talk to itself.

Also, I was not able to find a setting that would disable clustering in Elasticsearch, and at the moment this is my primary suspect.
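
The closest thing I came across is single-node discovery, which I believe is available in 6.8, though I am not sure it has any bearing on these warnings:

discovery.type: single-node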

That is correct: it is something on this machine sending plain-HTTP requests to Elasticsearch. The only solution is for you to work out what is sending these requests and stop it. Here is the content of the request, if it helps:

GET /_cluster/health/_all?local=true&timeout=60s HTTP/1.1
Content-Length: 0
Host: 127.0.0.1:9200
Connection: Keep-Alive
User-Agent: Apache-HttpClient/4.5.4 (Java/1.8.0_232)
Accept-Encoding: gzip,deflate

So it's something using Apache-HttpClient/4.5.4 (Java/1.8.0_232) that is asking for the cluster health. The log message also tells you the address and port that the client used: R:/127.0.0.1:38978. That means it's on the same machine. Maybe you can catch the culprit by using sudo netstat -antp to find the process ID that is using the port. Note that the port may well change between requests, so you might have to keep trying until you find a match.
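
If the connections are too short-lived to catch by hand, a quick loop along these lines might help (just a sketch; ss -tnp would also work if you prefer it over netstat):

while true; do
  sudo netstat -antp | grep ':9200' | grep ESTABLISHED
  sleep 1
done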

I don't think this is Elasticsearch, by the way. Elasticsearch doesn't talk to itself much over HTTP, and this cluster health API call is kinda strange (why say _all, and why use local=true?). Also, I think we jumped from HttpClient 4.5.2 in 6.8 to 4.5.7 in 7.0, so there isn't a version of Elasticsearch that uses 4.5.4:

$ git diff 6.8..v7.0.0 -- buildSrc/version.properties | grep httpclient
-httpclient        = 4.5.2
+httpclient        = 4.5.7

Hello David,

Thank you very much for the prompt reply.

I was dead stuck on this. Mentioning Apache-HttpClient was a great pointer, and I found the culprit: as this is a PoC machine, there was an instance of another service using Elasticsearch. I disabled that one and all looks good at the moment. Now I can go back to configuring the .NET Core app to send the messages properly.

Again, thanks a great deal.

Cheers!
