Elasticsearch 8.7, "master_not_discovered_exception" error

I upgraded Elasticsearch from 7.17 to 8.7. The Elasticsearch service is running, but the node is not able to discover the other nodes.

  1. Ran: curl -XGET "localhost:9200/_cluster/state?filter_path=version,nodes,metadata.cluster_coordination&pretty"

output:
{
"error" : {
"root_cause" : [
{
"type" : "master_not_discovered_exception",
"reason" : null
}
],
"type" : "master_not_discovered_exception",
"reason" : null
},
"status" : 503
}
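
A monitoring script could detect this state from the response body instead of by eye. The following is only a minimal sketch, assuming the body has the shape shown above; the function name `is_master_missing` is hypothetical and not part of any Elasticsearch client.

```python
import json

# Response body copied from the curl output above.
RESPONSE_BODY = """
{
  "error": {
    "root_cause": [
      {"type": "master_not_discovered_exception", "reason": null}
    ],
    "type": "master_not_discovered_exception",
    "reason": null
  },
  "status": 503
}
"""


def is_master_missing(body: str) -> bool:
    """Return True if the body is a master_not_discovered_exception error."""
    doc = json.loads(body)
    error = doc.get("error")
    if not isinstance(error, dict):
        # A successful cluster state response has no "error" object.
        return False
    return error.get("type") == "master_not_discovered_exception"


print(is_master_missing(RESPONSE_BODY))
```

A `True` result only says that the node answering the request has not found an elected master; it does not say why.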

  1. In error logs:
[2023-04-06T08:40:56,693][WARN ][o.e.d.PeerFinder         ] [elkdata-0] address [x.x.x.x:9300], node [null], requesting [false] discovery result: [elkmaster-1][x.x.x.x:9300][internal:transport/handshake]: missing authentication credentials for action [internal:transport/handshake]
[2023-04-06T08:40:56,693][WARN ][o.e.d.PeerFinder         ] [elkdata-0] address [x.x.x.x:9300], node [null], requesting [false] discovery result: [elkmaster-0][x.x.x.x:9300][internal:transport/handshake]: missing authentication credentials for action [internal:transport/handshake]
[2023-04-06T08:40:56,695][WARN ][o.e.d.HandshakingTransportAddressConnector] [elkdata-0] handshake to [x.x.x.x:9300] failed
org.elasticsearch.transport.RemoteTransportException: [elkmaster-2][10.5.0.7:9300][internal:transport/handshake]
Caused by: org.elasticsearch.ElasticsearchSecurityException: missing authentication credentials for action [internal:transport/handshake]
        at org.elasticsearch.xpack.core.security.support.Exceptions.authenticationError(Exceptions.java:18) ~[?:?]
        at org.elasticsearch.xpack.core.security.authc.DefaultAuthenticationFailureHandler.createAuthenticationError(DefaultAuthenticationFailureHandler.java:175) ~[?:?]
        at org.elasticsearch.xpack.core.security.authc.DefaultAuthenticationFailureHandler.missingToken(DefaultAuthenticationFailureHandler.java:130) ~[?:?]
        at org.elasticsearch.xpack.security.authc.AuthenticationService$AuditableTransportRequest.anonymousAccessDenied(AuthenticationService.java:338) ~[?:?]
        at org.elasticsearch.xpack.security.authc.AuthenticatorChain.handleNullToken(AuthenticatorChain.java:320) ~[?:?]
        at org.elasticsearch.xpack.security.authc.AuthenticatorChain.lambda$doAuthenticate$1(AuthenticatorChain.java:131) ~[?:?]
        at org.elasticsearch.action.ActionListener$1.onResponse(ActionListener.java:136) ~[elasticsearch-8.7.0.jar:?]
        at org.elasticsearch.xpack.core.common.IteratingActionListener.onResponse(IteratingActionListener.java:132) ~[?:?]
        at org.elasticsearch.xpack.security.authc.AuthenticatorChain.lambda$getAuthenticatorConsumer$5(AuthenticatorChain.java:165) ~[?:?]
        at org.elasticsearch.xpack.core.common.IteratingActionListener.onResponse(IteratingActionListener.java:135) ~[?:?]
        at org.elasticsearch.xpack.security.authc.AuthenticatorChain.lambda$getAuthenticatorConsumer$5(AuthenticatorChain.java:165) ~[?:?]
        at org.elasticsearch.xpack.core.common.IteratingActionListener.onResponse(IteratingActionListener.java:135) ~[?:?]
        at org.elasticsearch.xpack.security.authc.AuthenticatorChain.lambda$getAuthenticatorConsumer$5(AuthenticatorChain.java:165) ~[?:?]
        at org.elasticsearch.xpack.core.common.IteratingActionListener.onResponse(IteratingActionListener.java:135) ~[?:?]
        at org.elasticsearch.xpack.security.authc.AuthenticatorChain.lambda$getAuthenticatorConsumer$5(AuthenticatorChain.java:165) ~[?:?]
        at org.elasticsearch.xpack.core.common.IteratingActionListener.run(IteratingActionListener.java:117) ~[?:?]
        at org.elasticsearch.xpack.security.authc.AuthenticatorChain.doAuthenticate(AuthenticatorChain.java:143) ~[?:?]
        at org.elasticsearch.xpack.security.authc.AuthenticatorChain.authenticateAsync(AuthenticatorChain.java:104) ~[?:?]
        at org.elasticsearch.xpack.security.authc.AuthenticationService.authenticate(AuthenticationService.java:199) ~[?:?]
        at org.elasticsearch.xpack.security.transport.ServerTransportFilter$NodeProfile.inbound(ServerTransportFilter.java:128) ~[?:?]
        at org.elasticsearch.xpack.security.transport.SecurityServerTransportInterceptor$ProfileSecuredRequestHandler.messageReceived(SecurityServerTransportInterceptor.java:415) ~[?:?]
        at org.elasticsearch.transport.RequestHandlerRegistry.processMessageReceived(RequestHandlerRegistry.java:67) ~[elasticsearch-8.7.0.jar:?]
        at org.elasticsearch.transport.InboundHandler.handleRequest(InboundHandler.java:261) ~[elasticsearch-8.7.0.jar:?]
        at org.elasticsearch.transport.InboundHandler.messageReceived(InboundHandler.java:109) ~[elasticsearch-8.7.0.jar:?]
        at org.elasticsearch.transport.InboundHandler.inboundMessage(InboundHandler.java:88) ~[elasticsearch-8.7.0.jar:?]
        at org.elasticsearch.transport.TcpTransport.inboundMessage(TcpTransport.java:743) ~[elasticsearch-8.7.0.jar:?]
        at org.elasticsearch.transport.InboundPipeline.forwardFragments(InboundPipeline.java:147) ~[elasticsearch-8.7.0.jar:?]
        at org.elasticsearch.transport.InboundPipeline.doHandleBytes(InboundPipeline.java:119) ~[elasticsearch-8.7.0.jar:?]
        at org.elasticsearch.transport.InboundPipeline.handleBytes(InboundPipeline.java:84) ~[elasticsearch-8.7.0.jar:?]
        at org.elasticsearch.transport.netty4.Netty4MessageChannelHandler.channelRead(Netty4MessageChannelHandler.java:71) ~[?:?]
        at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:379) ~[?:?]
        at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:365) ~[?:?]
        at io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:357) ~[?:?]
        at io.netty.handler.logging.LoggingHandler.channelRead(LoggingHandler.java:280) ~[?:?]
        at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:379) ~[?:?]
        at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:365) ~[?:?]
        at io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:357) ~[?:?]
        at io.netty.handler.codec.MessageToMessageDecoder.channelRead(MessageToMessageDecoder.java:103) ~[?:?]
        at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:379) ~[?:?]
        at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:365) ~[?:?]
        at io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:357) ~[?:?]
        at io.netty.channel.DefaultChannelPipeline$HeadContext.channelRead(DefaultChannelPipeline.java:1410) ~[?:?]
        at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:379) ~[?:?]
        at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:365) ~[?:?]
        at io.netty.channel.DefaultChannelPipeline.fireChannelRead(DefaultChannelPipeline.java:919) ~[?:?]
        at io.netty.channel.nio.AbstractNioByteChannel$NioByteUnsafe.read(AbstractNioByteChannel.java:166) ~[?:?]
        at io.netty.channel.nio.NioEventLoop.processSelectedKey(NioEventLoop.java:719) ~[?:?]
        at io.netty.channel.nio.NioEventLoop.processSelectedKeysPlain(NioEventLoop.java:620) ~[?:?]
        at io.netty.channel.nio.NioEventLoop.processSelectedKeys(NioEventLoop.java:583) ~[?:?]
        at io.netty.channel.nio.NioEventLoop.run(NioEventLoop.java:493) ~[?:?]
        at io.netty.util.concurrent.SingleThreadEventExecutor$4.run(SingleThreadEventExecutor.java:986) ~[?:?]
        at io.netty.util.internal.ThreadExecutorMap$2.run(ThreadExecutorMap.java:74) ~[?:?]
        at java.lang.Thread.run(Thread.java:1589) ~[?:?]
[2023-04-06T08:40:56,696][WARN ][o.e.d.PeerFinder         ] [elkdata-0] address [x.x.x.x:9300], node [null], requesting [false] discovery result: [elkmaster-2][x.x.x.x:9300][internal:transport/handshake]: missing authentication credentials for action [internal:transport/handshake]
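
To see at a glance which peers reject the handshake and with what reason, the PeerFinder warnings can be grouped per remote node. This is a rough sketch, assuming the log lines keep exactly the layout shown above; the regular expression and the helper name `failures_by_peer` are made up for illustration.

```python
import re
from collections import defaultdict

# Three PeerFinder warnings copied from the log above.
SAMPLE_LOG = """\
[2023-04-06T08:40:56,693][WARN ][o.e.d.PeerFinder         ] [elkdata-0] address [x.x.x.x:9300], node [null], requesting [false] discovery result: [elkmaster-1][x.x.x.x:9300][internal:transport/handshake]: missing authentication credentials for action [internal:transport/handshake]
[2023-04-06T08:40:56,693][WARN ][o.e.d.PeerFinder         ] [elkdata-0] address [x.x.x.x:9300], node [null], requesting [false] discovery result: [elkmaster-0][x.x.x.x:9300][internal:transport/handshake]: missing authentication credentials for action [internal:transport/handshake]
[2023-04-06T08:40:56,696][WARN ][o.e.d.PeerFinder         ] [elkdata-0] address [x.x.x.x:9300], node [null], requesting [false] discovery result: [elkmaster-2][x.x.x.x:9300][internal:transport/handshake]: missing authentication credentials for action [internal:transport/handshake]
"""

# Captures the local node, the remote node, the transport action and the reason.
PEER_FINDER = re.compile(
    r"\[o\.e\.d\.PeerFinder\s*\]\s+\[(?P<local>[^\]]+)\]"
    r".*discovery result: \[(?P<remote>[^\]]+)\]\[[^\]]+\]"
    r"\[(?P<action>[^\]]+)\]: (?P<reason>.*)$"
)


def failures_by_peer(log_text):
    """Map each remote node name to the set of failure reasons logged for it."""
    result = defaultdict(set)
    for line in log_text.splitlines():
        match = PEER_FINDER.search(line)
        if match:
            result[match.group("remote")].add(match.group("reason"))
    return dict(result)


for peer, reasons in sorted(failures_by_peer(SAMPLE_LOG).items()):
    print(peer, "->", "; ".join(sorted(reasons)))
```

For the lines above, all three master nodes show the same reason, which suggests the failure is not specific to one peer.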

  1. Here is elasticsearch.yml
    cluster.name: "xxx"
    node.name: "elkdata-0"
    path.logs: /var/log/elasticsearch
    path.data: /data/elasticsearch/data
    discovery.seed_hosts: ["elkmaster-0","elkmaster-1","elkmaster-2"]
    cluster.initial_master_nodes: ["elkmaster-0","elkmaster-1","elkmaster-2"]
    node.roles: [data]
    network.host: [site, local]
    http.port: 9200
    node.attr.fault_domain: 1
    node.attr.update_domain: 1
    cluster.routing.allocation.awareness.attributes: fault_domain,update_domain
    xpack.license.self_generated.type: trial
    xpack.security.enabled: false
    bootstrap.memory_lock: true

Also, I checked whether the master nodes can be reached from this server using telnet, and I am able to telnet to all 3 master nodes.
Here is also a log line from one of the master nodes:
[2023-04-06T05:11:56,171][WARN ][o.e.t.TcpTransport ] [elkmaster-0] invalid internal transport message format, got (ff,f4,ff,fd), [Netty4TcpChannel{localAddress=/x.x.x.x:9300, remoteAddress=/x.x.x.x:49400, profile=default}], closing connection

This is a very weird error message. Are you using binaries downloaded from elastic.co or did you build Elasticsearch yourself? Are you using any plugins?

I am using binaries downloaded from elastic.co. Not using any plugins as of now.
I deployed the cluster from Azure Marketplace and had to upgrade it to 8.7. I successfully upgraded from 7.11 to 7.17 first and am facing this issue when upgrading from 7.17 to 8.7.

Also, I started the cluster upgrade with the data nodes, and the upgrade is failing on the very first data node. The other nodes are still at 7.17.

Very strange. I can see no way to hit this line ...

... if the node sees xpack.security.enabled: false in its config. I suspect you are not looking at the correct config file.
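
One way to check which settings a running node actually loaded is to ask the node itself through the nodes info API (`GET /_nodes/_local/settings?flat_settings=true`) rather than reading a file on disk. The sketch below assumes the node answers plain HTTP on localhost:9200 without credentials and responds to this request while no master is elected; both are assumptions, and the helper names are hypothetical.

```python
import json
import urllib.request


def fetch_local_node_info(base_url="http://localhost:9200"):
    """Fetch the explicitly configured settings of the local node (flat keys)."""
    url = base_url + "/_nodes/_local/settings?flat_settings=true"
    with urllib.request.urlopen(url, timeout=10) as resp:
        return json.load(resp)


def security_enabled_by_node(nodes_info):
    """Map node name to its xpack.security.enabled value (None if not set)."""
    result = {}
    for node in nodes_info.get("nodes", {}).values():
        settings = node.get("settings", {})
        result[node.get("name")] = settings.get("xpack.security.enabled")
    return result


# Illustrative response shape only; the node id is invented.
EXAMPLE_INFO = {
    "nodes": {
        "hypothetical-node-id": {
            "name": "elkdata-0",
            "settings": {"xpack.security.enabled": "false"},
        }
    }
}

print(security_enabled_by_node(EXAMPLE_INFO))
```

Against a live node, `security_enabled_by_node(fetch_local_node_info())` would show the value the process is really using; `None` means the setting was not set explicitly and the version's default applies.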

Okay, when running from the command line, I get this error:

ERROR: Skipping security auto configuration because it appears that the node is not starting up for the first time. The node might already be part of a cluster and this auto setup utility is designed to configure Security for new clusters only.

I verified the config file used and it is the correct one.
Could it be because the data node I upgraded and the master nodes are on different versions?
This is the version matrix as of now:
data-0 : 8.7
data-1 : 7.17
data-2 : 7.17
master-0 : 7.17
master-1 : 7.17
master-2 : 7.17
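
For a larger cluster, the same matrix can be grouped by major version to show which nodes are ahead of the rest. This is a small sketch that only reads the `name : version` lines above; it makes no claim about whether the version mix is the cause of the error, and the helper names are hypothetical.

```python
# Version matrix copied from the post above.
MATRIX = """\
data-0 : 8.7
data-1 : 7.17
data-2 : 7.17
master-0 : 7.17
master-1 : 7.17
master-2 : 7.17
"""


def parse_matrix(text):
    """Parse 'node : version' lines into a dict of node name to version string."""
    versions = {}
    for line in text.splitlines():
        if ":" not in line:
            continue
        node, version = (part.strip() for part in line.split(":", 1))
        versions[node] = version
    return versions


def nodes_by_major(versions):
    """Group node names by the major component of their version."""
    groups = {}
    for node, version in versions.items():
        major = int(version.split(".")[0])
        groups.setdefault(major, []).append(node)
    return groups


for major, nodes in sorted(nodes_by_major(parse_matrix(MATRIX)).items()):
    print(major, nodes)
```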