ECK Elasticsearch is unavailable and stuck during upgrade process

Hello

I installed an Elasticsearch cluster using ECK (master = 3, client = 2, data = 40).

config:
  indices.queries.cache.size: 20%

The cluster was in the Ready phase with Green health.
Then I changed the cache size to 21%.

  • After 4 minutes, 2 nodes restarted, and for the next hour the cluster showed Health = Unknown and Phase = ApplyingChanges.

I can access the cluster by port-forwarding, and this is the cluster status:

http://localhost:9200/_cluster/health

{"error":{"root_cause":[{"type":"master_not_discovered_exception","reason":null}],"type":"master_not_discovered_exception","reason":null},"status":503}
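For reference, the error body above can be read programmatically when polling the health endpoint. This is only a minimal sketch that parses the response exactly as pasted; `summarize_health` is a hypothetical helper name, not part of any Elasticsearch client.

```python
import json


def summarize_health(body: str) -> str:
    """Return a one-line summary of a _cluster/health response body."""
    doc = json.loads(body)
    if "error" in doc:
        # Error responses carry the exception type plus the HTTP status code.
        return f"{doc['error']['type']} (HTTP {doc.get('status')})"
    # A successful response reports the health color in "status".
    return f"cluster status: {doc.get('status')}"


# The body pasted above, unchanged.
response_body = (
    '{"error":{"root_cause":[{"type":"master_not_discovered_exception",'
    '"reason":null}],"type":"master_not_discovered_exception",'
    '"reason":null},"status":503}'
)

print(summarize_health(response_body))
```

A `master_not_discovered_exception` with status 503 means the node that answered the request has no elected master, so it cannot report cluster health at all.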

I have 3 master nodes and am running in a GKE cluster. Can you please tell me what the possible reason for this could be?

Master nodeset config.

 nodeSets:
  - name: masters-zone-a
    count: 3
    config:
      node.master: true
      node.data: false
      node.ingest: false

      monitor.jvm.gc.overhead.warn: 75
      monitor.jvm.gc.overhead.info: 25
      monitor.jvm.gc.overhead.debug: 10
      bootstrap.memory_lock: false

      # avoid split-brain w/ a minimum consensus of two masters plus a data node
      gateway.expected_master_nodes: ${EXPECTED_MASTER_NODES:2}
      gateway.expected_data_nodes: ${EXPECTED_DATA_NODES:1}
      gateway.recover_after_time: ${RECOVER_AFTER_TIME:5m}
      gateway.recover_after_master_nodes: ${RECOVER_AFTER_MASTER_NODES:2}
      gateway.recover_after_data_nodes: ${RECOVER_AFTER_DATA_NODES:1}
      cluster:
        nodes:
          reconnect_interval: 5s
        routing:
          allocation:
            cluster_concurrent_rebalance: 16
            node_concurrent_incoming_recoveries: 16
            node_concurrent_outgoing_recoveries: 16
          use_adaptive_replica_selection: true
      indices:
        queries:
          cache:
            size: 20%
        recovery:
          max_bytes_per_sec: 2000mb
      search:
        default_search_timeout: 500ms
        low_level_cancellation: false
      transport:
        connect_timeout: 10s
    volumeClaimTemplates:
    - metadata:
        name: elasticsearch-data
      spec:
        accessModes:
        - ReadWriteOnce
        resources:
          requests:
            storage: 5Gi
        volumeMode: Filesystem
    podTemplate:
      spec:
        initContainers:
        - name: sysctl
          securityContext:
            privileged: true
          command: ['sh', '-c', 'sysctl -w vm.max_map_count=262144']
        containers:
        - name: elasticsearch
          env:
          - name: ES_JAVA_OPTS
            value: '-Djava.net.preferIPv4Stack=true -Xms512m -Xmx512m  '
          - name: MINIMUM_MASTER_NODES
            value: "2"
          resources:
            requests:
              cpu: 200m
              memory: 3Gi
            limits:
              cpu: 500m
              memory: 3Gi
          volumeMounts:
          - mountPath: /usr/share/elasticsearch/config/jvm.options
            name: config
            subPath: jvm.options
          - mountPath: /usr/share/elasticsearch/config/log4j2.properties
            name: config
            subPath: log4j2.properties
        volumes:
        - configMap:
            name: elasticsearch-config
          name: config

Logs from the master nodes show the error "failed to connect to node",
caused by "handshake failed", which is in turn caused by "missing authentication token for action".

Hello!

Which version of ECK are you running?
Can you provide the entire Elasticsearch resource manifest? Which version of Elasticsearch are you running?
Can you provide the full logs from the master nodes?

You should probably not set MINIMUM_MASTER_NODES yourself, ECK already handles it "the right way" (which covers some corner cases).
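For context, in Elasticsearch 6.x the documented recommendation for `discovery.zen.minimum_master_nodes` is a strict majority of master-eligible nodes, (N / 2) + 1. The sketch below only illustrates that arithmetic; `zen_quorum` is a hypothetical helper, and it does not show how ECK adjusts the value while nodes are being added or removed.

```python
def zen_quorum(master_eligible_nodes: int) -> int:
    """Majority quorum for Zen discovery: floor(N / 2) + 1."""
    if master_eligible_nodes < 1:
        raise ValueError("need at least one master-eligible node")
    return master_eligible_nodes // 2 + 1


# With the 3 masters from the manifest, the quorum is 2.
for n in (1, 3, 5):
    print(f"{n} master-eligible node(s) -> quorum {zen_quorum(n)}")
```

Because the correct value changes whenever the number of master-eligible nodes changes, a hard-coded setting can become stale during a rolling change, which is one reason to leave it to the operator.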

ECK version 1.2.1
Elasticsearch version is 6.8.8

Elasticsearch manifest

spec:
  version: 6.8.8
  updateStrategy:
    changeBudget:
      maxSurge: 50
      maxUnavailable: 50
  secureSettings:
  - secretName: gcs-credentials
  http:
    service:
      metadata:
        annotations:
          cloud.google.com/load-balancer-type: "Internal"
      spec:
        type: LoadBalancer
        selector:
          elasticsearch.k8s.elastic.co/cluster-name: "elasticsearch-apple"
          elasticsearch.k8s.elastic.co/node-ingest: "true"
  nodeSets:
  - name: masters-zone-a
    count: 3
    config:
      node.master: true
      node.data: false
      node.ingest: false

      monitor.jvm.gc.overhead.warn: 75
      monitor.jvm.gc.overhead.info: 25
      monitor.jvm.gc.overhead.debug: 10

      bootstrap.memory_lock: false

      # see https://github.com/elastic/elasticsearch-definitive-guide/pull/679
      processors: 8

      # avoid split-brain w/ a minimum consensus of two masters plus a data node
      gateway.expected_master_nodes: ${EXPECTED_MASTER_NODES:2}
      gateway.expected_data_nodes: ${EXPECTED_DATA_NODES:1}
      gateway.recover_after_time: ${RECOVER_AFTER_TIME:5m}
      gateway.recover_after_master_nodes: ${RECOVER_AFTER_MASTER_NODES:2}
      gateway.recover_after_data_nodes: ${RECOVER_AFTER_DATA_NODES:1}
      cluster:
        nodes:
          reconnect_interval: 5s
        routing:
          allocation:
            cluster_concurrent_rebalance: 16
            node_concurrent_incoming_recoveries: 16
            node_concurrent_outgoing_recoveries: 16
          use_adaptive_replica_selection: true
      indices:
        queries:
          cache:
            size: 21%
        recovery:
          max_bytes_per_sec: 2000mb
      search:
        default_search_timeout: 500ms
        low_level_cancellation: false
      transport:
        connect_timeout: 10s
    volumeClaimTemplates:
    - metadata:
        name: elasticsearch-data
      spec:
        accessModes:
        - ReadWriteOnce
        resources:
          requests:
            storage: 50Gi
        volumeMode: Filesystem
    podTemplate:
      spec:
        volumes:
        - configMap:
            name: elasticsearch-config
          name: config
        affinity:
          podAntiAffinity:
            requiredDuringSchedulingIgnoredDuringExecution:
            - labelSelector:
                matchLabels:
                  elasticsearch.k8s.elastic.co/cluster-name: elasticsearch-apple
                  elasticsearch.k8s.elastic.co/node-master: "true"
              topologyKey: kubernetes.io/hostname
        initContainers:
        - name: sysctl
          securityContext:
            privileged: true
          command: ['sh', '-c', 'sysctl -w vm.max_map_count=262144']
        containers:
        - name: elasticsearch
          env:
          - name: ES_JAVA_OPTS
            value: '-Djava.net.preferIPv4Stack=true -Xms512m -Xmx512m  '
          - name: MINIMUM_MASTER_NODES
            value: "2"
          resources:
            requests:
              cpu: 200m
              memory: 3Gi
            limits:
              cpu: 500m
              memory: 3Gi
          volumeMounts:
          - mountPath: /usr/share/elasticsearch/config/jvm.options
            name: config
            subPath: jvm.options
          - mountPath: /usr/share/elasticsearch/config/log4j2.properties
            name: config
            subPath: log4j2.properties

I have other node sets as well, but I have not pasted their config because they use the same configuration.

Elasticsearch master logs

W0903 10:56:45.000766 1 [elasticsearch-es-masters-zone-a-1] failed to connect to node {elasticsearch-es-data-all-zone-a-21}{yWnQkfY7TkqAUIGTZb2LdQ}{KzuCWQvPRK2HGcUYQVmgiw}{10.4.191.2}{10.4.191.2:9300}{ml.machine_memory=63328395264, ml.max_open_jobs=20, xpack.installed=true, ml.enabled=true, group=all} (tried [1] times) org.elasticsearch.transport.ConnectTransportException: [elasticsearch-es-data-all-zone-a-21][10.4.191.2:9300] general node connection failure
 at org.elasticsearch.transport.ConnectionManager.connectToNode(ConnectionManager.java:127) ~[elasticsearch-6.7.1.jar:6.7.1]
 at org.elasticsearch.transport.TransportService.connectToNode(TransportService.java:342) ~[elasticsearch-6.7.1.jar:6.7.1]
  at org.elasticsearch.transport.TransportService.connectToNode(TransportService.java:329) ~[elasticsearch-6.7.1.jar:6.7.1]
  at org.elasticsearch.cluster.NodeConnectionsService.validateAndConnectIfNeeded(NodeConnectionsService.java:154) [elasticsearch-6.7.1.jar:6.7.1]
  at org.elasticsearch.cluster.NodeConnectionsService$1.doRun(NodeConnectionsService.java:107) [elasticsearch-6.7.1.jar:6.7.1]
 at org.elasticsearch.common.util.concurrent.ThreadContext$ContextPreservingAbstractRunnable.doRun(ThreadContext.java:751) [elasticsearch-6.7.1.jar:6.7.1]
  at org.elasticsearch.common.util.concurrent.AbstractRunnable.run(AbstractRunnable.java:37) [elasticsearch-6.7.1.jar:6.7.1]
  at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1128) [?:?]
  at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:628) [?:?]
 at java.lang.Thread.run(Thread.java:835) [?:?]
 Caused by: java.lang.IllegalStateException: handshake failed with {elasticsearch-es-data-all-zone-a-21}{yWnQkfY7TkqAUIGTZb2LdQ}{KzuCWQvPRK2HGcUYQVmgiw}{10.4.191.2}{10.4.191.2:9300}{ml.machine_memory=63328395264, ml.max_open_jobs=20, xpack.installed=true, ml.enabled=true, group=all}
  at org.elasticsearch.transport.TransportService.handshake(TransportService.java:417) ~[elasticsearch-6.7.1.jar:6.7.1]
  at org.elasticsearch.transport.TransportService.lambda$connectionValidator$4(TransportService.java:348) ~[elasticsearch-6.7.1.jar:6.7.1]
 at org.elasticsearch.transport.ConnectionManager.connectToNode(ConnectionManager.java:105) ~[elasticsearch-6.7.1.jar:6.7.1]
  ... 9 more
  Caused by: org.elasticsearch.transport.RemoteTransportException: [elasticsearch-es-data-all-zone-a-21][10.4.191.2:9300][internal:transport/handshake]
Caused by: org.elasticsearch.ElasticsearchSecurityException: missing authentication token for action [internal:transport/handshake]
 at org.elasticsearch.xpack.core.security.support.Exceptions.authenticationError(Exceptions.java:18) ~[?:?]
 at org.elasticsearch.xpack.core.security.authc.DefaultAuthenticationFailureHandler.createAuthenticationError(DefaultAuthenticationFailureHandler.java:163) ~[?:?]
  at org.elasticsearch.xpack.core.security.authc.DefaultAuthenticationFailureHandler.missingToken(DefaultAuthenticationFailureHandler.java:118) ~[?:?]
 at org.elasticsearch.xpack.security.authc.AuthenticationService$AuditableTransportRequest.anonymousAccessDenied(AuthenticationService.java:650) ~[?:?]
 at org.elasticsearch.xpack.security.authc.AuthenticationService$Authenticator.lambda$handleNullToken$19(AuthenticationService.java:466) ~[?:?]
 at org.elasticsearch.xpack.security.authc.AuthenticationService$Authenticator.handleNullToken(AuthenticationService.java:471) ~[?:?]
 at org.elasticsearch.xpack.security.authc.AuthenticationService$Authenticator.consumeToken(AuthenticationService.java:355) ~[?:?]
  at org.elasticsearch.xpack.security.authc.AuthenticationService$Authenticator.lambda$extractToken$9(AuthenticationService.java:326) ~[?:?]
 at org.elasticsearch.xpack.security.authc.AuthenticationService$Authenticator.extractToken(AuthenticationService.java:344) ~[?:?]
  at org.elasticsearch.xpack.security.authc.AuthenticationService$Authenticator.lambda$checkForApiKey$3(AuthenticationService.java:287) ~[?:?]
 at org.elasticsearch.action.ActionListener$1.onResponse(ActionListener.java:61) ~[elasticsearch-6.7.1.jar:6.7.1]
 at org.elasticsearch.xpack.security.authc.ApiKeyService.authenticateWithApiKeyIfPresent(ApiKeyService.java:345) ~[?:?]
 at org.elasticsearch.xpack.security.authc.AuthenticationService$Authenticator.checkForApiKey(AuthenticationService.java:268) ~[?:?]
  at org.elasticsearch.xpack.security.authc.AuthenticationService$Authenticator.lambda$authenticateAsync$0(AuthenticationService.java:251) ~[?:?]
  at org.elasticsearch.action.ActionListener$1.onResponse(ActionListener.java:61) ~[elasticsearch-6.7.1.jar:6.7.1]
 at org.elasticsearch.xpack.security.authc.TokenService.getAndValidateToken(TokenService.java:326) ~[?:?]
 at org.elasticsearch.xpack.security.authc.AuthenticationService$Authenticator.lambda$authenticateAsync$2(AuthenticationService.java:247) ~[?:?]
  at org.elasticsearch.xpack.security.authc.AuthenticationService$Authenticator.lambda$lookForExistingAuthentication$6(AuthenticationService.java:305) ~[?:?]
  at org.elasticsearch.xpack.security.authc.AuthenticationService$Authenticator.lookForExistingAuthentication(AuthenticationService.java:316) ~[?:?]
 at org.elasticsearch.xpack.security.authc.AuthenticationService$Authenticator.authenticateAsync(AuthenticationService.java:243) ~[?:?]
 at org.elasticsearch.xpack.security.authc.AuthenticationService$Authenticator.access$000(AuthenticationService.java:195) ~[?:?]
  at org.elasticsearch.xpack.security.authc.AuthenticationService.authenticate(AuthenticationService.java:138) ~[?:?]
  at org.elasticsearch.xpack.security.transport.ServerTransportFilter$NodeProfile.inbound(ServerTransportFilter.java:133) ~[?:?]
 at org.elasticsearch.xpack.security.transport.SecurityServerTransportInterceptor$ProfileSecuredRequestHandler.messageReceived(SecurityServerTransportInterceptor.java:306) ~[?:?]
  at org.elasticsearch.transport.RequestHandlerRegistry.processMessageReceived(RequestHandlerRegistry.java:66) ~[elasticsearch-6.7.1.jar:6.7.1]
  at org.elasticsearch.transport.TcpTransport$RequestHandler.doRun(TcpTransport.java:1087) ~[elasticsearch-6.7.1.jar:6.7.1]
  at org.elasticsearch.common.util.concurrent.AbstractRunnable.run(AbstractRunnable.java:37) ~[elasticsearch-6.7.1.jar:6.7.1]
  at org.elasticsearch.common.util.concurrent.EsExecutors$DirectExecutorService.execute(EsExecutors.java:192) ~[elasticsearch-6.7.1.jar:6.7.1]
 at org.elasticsearch.transport.TcpTransport.handleRequest(TcpTransport.java:1046) ~[elasticsearch-6.7.1.jar:6.7.1]
 at org.elasticsearch.transport.TcpTransport.messageReceived(TcpTransport.java:932) ~[elasticsearch-6.7.1.jar:6.7.1]
  at org.elasticsearch.transport.TcpTransport.inboundMessage(TcpTransport.java:763) ~[elasticsearch-6.7.1.jar:6.7.1]
 at org.elasticsearch.transport.netty4.Netty4MessageChannelHandler.channelRead(Netty4MessageChannelHandler.java:53) ~[?:?]
  at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:362) ~[?:?]
 at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:348) ~[?:?]
 at io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:340) ~[?:?]
 at io.netty.handler.codec.ByteToMessageDecoder.fireChannelRead(ByteToMessageDecoder.java:323) ~[?:?]
 at io.netty.handler.codec.ByteToMessageDecoder.channelRead(ByteToMessageDecoder.java:297) ~[?:?]
 at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:362) ~[?:?]
 at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:348) ~[?:?]
 at io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:340) ~[?:?]
 at io.netty.handler.logging.LoggingHandler.channelRead(LoggingHandler.java:241) ~[?:?]
 at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:362) ~[?:?]
 at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:348) ~[?:?]
 at io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:340) ~[?:?]
 at io.netty.handler.ssl.SslHandler.unwrap(SslHandler.java:1436) ~[?:?]
 at io.netty.handler.ssl.SslHandler.decodeJdkCompatible(SslHandler.java:1203) ~[?:?]
  at io.netty.handler.ssl.SslHandler.decode(SslHandler.java:1247) ~[?:?]
 at io.netty.handler.codec.ByteToMessageDecoder.decodeRemovalReentryProtection(ByteToMessageDecoder.java:502) ~[?:?]
  at io.netty.handler.codec.ByteToMessageDecoder.callDecode(ByteToMessageDecoder.java:441) ~[?:?]
  at io.netty.handler.codec.ByteToMessageDecoder.channelRead(ByteToMessageDecoder.java:278) ~[?:?]
 at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:362) ~[?:?]
 at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:348) ~[?:?]
 at io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:340) ~[?:?]
 at io.netty.channel.DefaultChannelPipeline$HeadContext.channelRead(DefaultChannelPipeline.java:1434) ~[?:?]
  at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:362) ~[?:?]
 at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:348) ~[?:?]
 at io.netty.channel.DefaultChannelPipeline.fireChannelRead(DefaultChannelPipeline.java:965) ~[?:?]
 at io.netty.channel.nio.AbstractNioByteChannel$NioByteUnsafe.read(AbstractNioByteChannel.java:163) ~[?:?]
  at io.netty.channel.nio.NioEventLoop.processSelectedKey(NioEventLoop.java:656) ~[?:?]
  at io.netty.channel.nio.NioEventLoop.processSelectedKeysPlain(NioEventLoop.java:556) ~[?:?]
  at io.netty.channel.nio.NioEventLoop.processSelectedKeys(NioEventLoop.java:510) ~[?:?]
 at io.netty.channel.nio.NioEventLoop.run(NioEventLoop.java:470) ~[?:?]
 at io.netty.util.concurrent.SingleThreadEventExecutor$5.run(SingleThreadEventExecutor.java:909) ~[?:?]
 at java.lang.Thread.run(Thread.java:835) ~[?:?]

Is it possible some of your nodes are configured with xpack security disabled? Or did you disable it then re-enable it at some point?
Can you try manually deleting the failing master Pods (they should be recreated automatically - with the right config specified in the ES spec)?
I see you are mounting your own jvm.options. Can you share its content?

Content of jvm.options:

-XX:+UseG1GC
-XX:InitiatingHeapOccupancyPercent=75
-Des.networkaddress.cache.ttl=60
-Des.networkaddress.cache.negative.ttl=10
-XX:+AlwaysPreTouch
-Xss1m
-Djava.awt.headless=true
-Dfile.encoding=UTF-8
-Djna.nosys=true
-XX:-OmitStackTraceInFastThrow
-Dio.netty.noUnsafe=true
-Dio.netty.noKeySetOptimization=true
-Dio.netty.recycler.maxCapacityPerThread=0
-Dlog4j.shutdownHookEnabled=false
-Dlog4j2.disable.jmx=true
-Djava.io.tmpdir=${ES_TMPDIR}
-XX:+HeapDumpOnOutOfMemoryError
-XX:ErrorFile=logs/hs_err_pid%p.log
9-:-Xlog:gc*,gc+age=trace,safepoint:file=${loggc}:utctime,pid,tags:filecount=32,filesize=64m
9-:-Djava.locale.providers=COMPAT

Since this is a development environment, I shall reinstall Elasticsearch from scratch with the default xpack security config and verify again.

When I installed it again, I saw logs like this:

2020-09-04T14:41:27.158Z	INFO	transport	No tls certificate found in secret	{"service.version": "1.2.1-b5316231", "namespace": "es-test", "pod_name": "elasticsearch-test-es-data-zone-a-9"}

I re-installed Elasticsearch with all default xpack settings. These are the logs from a master node when I update some settings:

caught exception while handling client http traffic, closing connection [id: 0xd768d6bb, L:0.0.0.0/0.0.0.0:9200 ! R:/10.4.109.1:37772] io.netty.handler.codec.DecoderException: io.netty.handler.ssl.NotSslRecordException: not an SSL/TLS record: 474554202f20485454502f312e310d0a486f73743a2031302e342e3130392e31393a393230300d0a557365722d4167656e743a2044617461646f67204167656e742f372e32322e300d0a4163636570742d456e636f64696e673a20677a69702c206465666c6174650d0a4163636570743a202a2f2a0d0a436f6e6e656374696f6e3a206b6565702d616c6976650d0a0d0a
  at io.netty.handler.codec.ByteToMessageDecoder.callDecode(ByteToMessageDecoder.java:472) ~[netty-codec-4.1.32.Final.jar:4.1.32.Final]
  at io.netty.handler.codec.ByteToMessageDecoder.channelRead(ByteToMessageDecoder.java:278) ~[netty-codec-4.1.32.Final.jar:4.1.32.Final]
  at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:362) [netty-transport-4.1.32.Final.jar:4.1.32.Final]
  at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:348) [netty-transport-4.1.32.Final.jar:4.1.32.Final]
  at io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:340) [netty-transport-4.1.32.Final.jar:4.1.32.Final]
  at io.netty.channel.DefaultChannelPipeline$HeadContext.channelRead(DefaultChannelPipeline.java:1434) [netty-transport-4.1.32.Final.jar:4.1.32.Final]
  at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:362) [netty-transport-4.1.32.Final.jar:4.1.32.Final]
  at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:348) [netty-transport-4.1.32.Final.jar:4.1.32.Final]
  at io.netty.channel.DefaultChannelPipeline.fireChannelRead(DefaultChannelPipeline.java:965) [netty-transport-4.1.32.Final.jar:4.1.32.Final]
  at io.netty.channel.nio.AbstractNioByteChannel$NioByteUnsafe.read(AbstractNioByteChannel.java:163) [netty-transport-4.1.32.Final.jar:4.1.32.Final]
  at io.netty.channel.nio.NioEventLoop.processSelectedKey(NioEventLoop.java:656) [netty-transport-4.1.32.Final.jar:4.1.32.Final]
  at io.netty.channel.nio.NioEventLoop.processSelectedKeysPlain(NioEventLoop.java:556) [netty-transport-4.1.32.Final.jar:4.1.32.Final]
  at io.netty.channel.nio.NioEventLoop.processSelectedKeys(NioEventLoop.java:510) [netty-transport-4.1.32.Final.jar:4.1.32.Final]
  at io.netty.channel.nio.NioEventLoop.run(NioEventLoop.java:470) [netty-transport-4.1.32.Final.jar:4.1.32.Final]
  at io.netty.util.concurrent.SingleThreadEventExecutor$5.run(SingleThreadEventExecutor.java:909) [netty-common-4.1.32.Final.jar:4.1.32.Final]
  at java.lang.Thread.run(Thread.java:835) [?:?]
Caused by: io.netty.handler.ssl.NotSslRecordException: not an SSL/TLS record: 474554202f20485454502f312e310d0a486f73743a2031302e342e3130392e31393a393230300d0a557365722d4167656e743a2044617461646f67204167656e742f372e32322e300d0a4163636570742d456e636f64696e673a20677a69702c206465666c6174650d0a4163636570743a202a2f2a0d0a436f6e6e656374696f6e3a206b6565702d616c6976650d0a0d0a
  at io.netty.handler.ssl.SslHandler.decodeJdkCompatible(SslHandler.java:1182) ~[netty-handler-4.1.32.Final.jar:4.1.32.Final]
  at io.netty.handler.ssl.SslHandler.decode(SslHandler.java:1247) ~[netty-handler-4.1.32.Final.jar:4.1.32.Final]
  at io.netty.handler.codec.ByteToMessageDecoder.decodeRemovalReentryProtection(ByteToMessageDecoder.java:502) ~[netty-codec-4.1.32.Final.jar:4.1.32.Final]
  at io.netty.handler.codec.ByteToMessageDecoder.callDecode(ByteToMessageDecoder.java:441) ~[netty-codec-4.1.32.Final.jar:4.1.32.Final]
  ... 15 more
 

The "No tls certificate found in secret" message in the operator logs is not a problem; it's just a regular log line emitted when certificates are created for the first time.

The last log looks like something is trying to reach Elasticsearch using HTTP instead of HTTPS.
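The hex string in the NotSslRecordException is the raw payload that arrived on port 9200, so decoding it shows what the client sent. A minimal sketch, assuming the payload is plain ASCII (the hex is copied from the log above and split per header line only for readability):

```python
# Raw bytes reported by Netty as "not an SSL/TLS record", as hex.
payload_hex = (
    "474554202f20485454502f312e310d0a"
    "486f73743a2031302e342e3130392e31393a393230300d0a"
    "557365722d4167656e743a2044617461646f67204167656e742f372e32322e300d0a"
    "4163636570742d456e636f64696e673a20677a69702c206465666c6174650d0a"
    "4163636570743a202a2f2a0d0a"
    "436f6e6e656374696f6e3a206b6565702d616c6976650d0a"
    "0d0a"
)

# Decode to text; a TLS handshake would not decode to readable ASCII headers.
request = bytes.fromhex(payload_hex).decode("ascii")

for line in request.splitlines():
    print(line)
```

The decoded request line and the User-Agent header identify the sender: a plain-HTTP `GET /` from a Datadog Agent, which points at a monitoring check configured with `http://` rather than `https://`.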

Sorry, but I'm a bit confused by your setup, especially if you modified the xpack security settings.

How about you just create an Elasticsearch cluster by following the quickstart (https://www.elastic.co/guide/en/cloud-on-k8s/current/k8s-quickstart.html) and we work from there? Are you able to spin it up? Is a rolling upgrade successful if you just change one config setting (e.g. node.attr.foo: bar)?


I have not modified the xpack settings.

I have found that I was using a custom image built on v6.7.1, which is not supported by ECK. I shall try with the latest version now.

"I have found that I was using custom image which was built on v6.7.1"

Hah, that would make sense. We only support Elasticsearch version 6.8+, since it has TLS support with the Basic license.
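In hindsight the mismatch was visible in the master logs: the manifest declares version 6.8.8, but the stack frames reference elasticsearch-6.7.1.jar. A minimal sketch of how such a check could look, assuming the jar name in each frame reflects the running version; `jar_versions` and `is_supported` are hypothetical helpers, and the (6, 8, 0) minimum simply encodes the "6.8+" statement above.

```python
import re

# Matches the jar name that Elasticsearch prints in stack frames.
JAR_VERSION = re.compile(r"elasticsearch-(\d+\.\d+\.\d+)\.jar")


def jar_versions(log_text: str) -> set:
    """Collect every Elasticsearch jar version mentioned in a log excerpt."""
    return set(JAR_VERSION.findall(log_text))


def is_supported(version: str, minimum=(6, 8, 0)) -> bool:
    """Compare a dotted version against the assumed minimum of 6.8.0."""
    return tuple(int(part) for part in version.split(".")) >= minimum


# Two frames copied from the master log earlier in the thread.
sample = (
    " at org.elasticsearch.transport.ConnectionManager.connectToNode"
    "(ConnectionManager.java:127) ~[elasticsearch-6.7.1.jar:6.7.1]\n"
    " at org.elasticsearch.transport.TransportService.connectToNode"
    "(TransportService.java:342) ~[elasticsearch-6.7.1.jar:6.7.1]\n"
)

declared = "6.8.8"  # spec.version from the manifest
for found in sorted(jar_versions(sample)):
    print(f"running {found}, declared {declared}, supported: {is_supported(found)}")
```

When a custom image is used, the version in `spec.version` is only a declaration, so comparing it with what the logs report is a quick sanity check.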