Unable to upgrade ECK since v2.7.0

Hi,

I tried to upgrade ECK from version 2.7.0. Unfortunately, ECK version 2.8.0 and up are unable to manage my Elastic cluster. I upgraded Elastic a couple of time with ECK version 2.7.0 and it actually runs version 8.9.2.
If I upgrade ECK to a new version, a node is removed an a new one is provisioned. Then, it completely stops and I can see the following error in ECK logs:

{"log.level":"info","@timestamp":"2023-11-22T15:05:38.659Z","log.logger":"elasticsearch-controller","message":"Elasticsearch cannot be reached yet, re-queuing","service.version":"2.10.0+59c1e727","service.type":"eck","ecs.version":"1.4.0","iteration":"1242","namespace":"elasticsearch","es_name":"es-logging","namespace":"elasticsearch","es_name":"es-logging"}

On the Elastic side, I have another error as follows:

{"@timestamp":"2023-11-22T15:15:20.320Z", "log.level": "WARN", "message":"caught exception while handling client http traffic, closing connection Netty4HttpChannel{localAddress=/10.42.87.108:9200, remoteAddress=/10.42.124.19:50414}", "ecs.version": "1.2.0","service.name":"ES_ECS","event.dataset":"elasticsearch.server","process.thread.name":"elasticsearch[es-logging-es-logging-2][transport_worker][T#1]","log.logger":"org.elasticsearch.http.AbstractHttpServerTransport","elasticsearch.cluster.uuid":"REDACTED","elasticsearch.node.id":"REDACTED","elasticsearch.node.name":"es-logging-es-logging-2","elasticsearch.cluster.name":"es-logging","error.type":"io.netty.handler.codec.DecoderException","error.message":"javax.net.ssl.SSLHandshakeException: Received fatal alert: bad_certificate","error.stack_trace":"io.netty.handler.codec.DecoderException: javax.net.ssl.SSLHandshakeException: Received fatal alert: bad_certificate\n\tat io.netty.codec@4.1.94.Final/io.netty.handler.codec.ByteToMessageDecoder.callDecode(ByteToMessageDecoder.java:499)\n\tat io.netty.codec@4.1.94.Final/io.netty.handler.codec.ByteToMessageDecoder.channelRead(ByteToMessageDecoder.java:290)\n\tat io.netty.transport@4.1.94.Final/io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:444)\n\tat io.netty.transport@4.1.94.Final/io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:420)\n\tat io.netty.transport@4.1.94.Final/io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:412)\n\tat io.netty.transport@4.1.94.Final/io.netty.channel.DefaultChannelPipeline$HeadContext.channelRead(DefaultChannelPipeline.java:1410)\n\tat io.netty.transport@4.1.94.Final/io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:440)\n\tat io.netty.transport@4.1.94.Final/io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:420)\n\tat io.netty.transport@4.1.94.Final/io.netty.channel.DefaultChannelPipeline.fireChannelRead(DefaultChannelPipeline.java:919)\n\tat io.netty.transport@4.1.94.Final/io.netty.channel.nio.AbstractNioByteChannel$NioByteUnsafe.read(AbstractNioByteChannel.java:166)\n\tat io.netty.transport@4.1.94.Final/io.netty.channel.nio.NioEventLoop.processSelectedKey(NioEventLoop.java:788)\n\tat io.netty.transport@4.1.94.Final/io.netty.channel.nio.NioEventLoop.processSelectedKeysPlain(NioEventLoop.java:689)\n\tat io.netty.transport@4.1.94.Final/io.netty.channel.nio.NioEventLoop.processSelectedKeys(NioEventLoop.java:652)\n\tat io.netty.transport@4.1.94.Final/io.netty.channel.nio.NioEventLoop.run(NioEventLoop.java:562)\n\tat io.netty.common@4.1.94.Final/io.netty.util.concurrent.SingleThreadEventExecutor$4.run(SingleThreadEventExecutor.java:997)\n\tat io.netty.common@4.1.94.Final/io.netty.util.internal.ThreadExecutorMap$2.run(ThreadExecutorMap.java:74)\n\tat java.base/java.lang.Thread.run(Thread.java:1623)\nCaused by: javax.net.ssl.SSLHandshakeException: Received fatal alert: bad_certificate\n\tat java.base/sun.security.ssl.Alert.createSSLException(Alert.java:130)\n\tat java.base/sun.security.ssl.Alert.createSSLException(Alert.java:117)\n\tat java.base/sun.security.ssl.TransportContext.fatal(TransportContext.java:365)\n\tat java.base/sun.security.ssl.Alert$AlertConsumer.consume(Alert.java:287)\n\tat java.base/sun.security.ssl.TransportContext.dispatch(TransportContext.java:204)\n\tat java.base/sun.security.ssl.SSLTransport.decode(SSLTransport.java:172)\n\tat java.base/sun.security.ssl.SSLEngineImpl.decode(SSLEngineImpl.java:736)\n\tat java.base/sun.security.ssl.SSLEngineImpl.readRecord(SSLEngineImpl.java:691)\n\tat java.base/sun.security.ssl.SSLEngineImpl.unwrap(SSLEngineImpl.java:506)\n\tat java.base/sun.security.ssl.SSLEngineImpl.unwrap(SSLEngineImpl.java:482)\n\tat java.base/javax.net.ssl.SSLEngine.unwrap(SSLEngine.java:679)\n\tat io.netty.handler@4.1.94.Final/io.netty.handler.ssl.SslHandler$SslEngineType$3.unwrap(SslHandler.java:297)\n\tat io.netty.handler@4.1.94.Final/io.netty.handler.ssl.SslHandler.unwrap(SslHandler.java:1353)\n\tat io.netty.handler@4.1.94.Final/io.netty.handler.ssl.SslHandler.decodeJdkCompatible(SslHandler.java:1246)\n\tat io.netty.handler@4.1.94.Final/io.netty.handler.ssl.SslHandler.decode(SslHandler.java:1295)\n\tat io.netty.codec@4.1.94.Final/io.netty.handler.codec.ByteToMessageDecoder.decodeRemovalReentryProtection(ByteToMessageDecoder.java:529)\n\tat io.netty.codec@4.1.94.Final/io.netty.handler.codec.ByteToMessageDecoder.callDecode(ByteToMessageDecoder.java:468)\n\t... 16 more\n"}

My cluster definition is the following:

apiVersion: elasticsearch.k8s.elastic.co/v1
kind: Elasticsearch
metadata:
  name: es-logging
  namespace: elasticsearch
spec:
  http:
    service:
      spec:
        type: LoadBalancer
    tls:
      certificate:
        secretName: elastic-cert
  nodeSets:
    - config:
        node.roles:
          - master
          - data
          - ingest
          - ml
          - transform
          - remote_cluster_client
      count: 3
      name: logging
      podTemplate:
        spec:
          initContainers:
            - command:
                - sh
                - '-c'
                - sysctl -w vm.max_map_count=262144
              name: sysctl
              securityContext:
                privileged: true
                runAsUser: 0
      volumeClaimTemplates:
        - metadata:
            name: elasticsearch-data
          spec:
            accessModes:
              - ReadWriteOnce
            resources:
              requests:
                storage: 3Gi
  updateStrategy:
    changeBudget:
      maxSurge: 3
      maxUnavailable: 1
  version: 8.9.2
  volumeClaimDeletePolicy: DeleteOnScaledownOnly

I tried to do another cluster without custom cert (Removed spec.http.tls.certificate.secretName property) and ECK can manage it without any trouble.

The cert in use is a wildcard cert. I also tried with a dedicated cert (not working either), with SAN for the following domains:

  • elastic.domain.ext
  • es-logging-es-http.elasticsearch.es.local
  • es-logging-es-http
  • es-logging-es-http.elasticsearch
  • es-logging-es-http.elasticsearch.svc
  • es-logging-es-internal-http.elasticsearch
  • es-logging-es-internal-http.elasticsearch.svc
  • *.es-logging-es-logging.elasticsearch.svc
  • lb.exposed.ip.address

Am I missing something? How can I use a custom cert with a newer version of ECK?

Hello,
Has anyone encountered a similar problem since version 2.8.0 of ECK ?

This topic was automatically closed 28 days after the last reply. New replies are no longer allowed.