ElasticsearchSecurityException when security is enabled on the master nodes but not the data nodes

I have an ES 7.16.2 cluster running with dedicated master nodes and separate data nodes.
If/when -

  1. xpack.security.enabled is set to true on the master nodes
  2. some of the data nodes have xpack security disabled
  3. anonymous access is enabled on all the nodes with the anonymous role having all access to index / cluster operations -
anonymous_role:
  cluster: [ 'all' ]
  indices:
    - names: [ '*' ]
      privileges: [ 'all' ]

and

xpack.security.authc.anonymous.roles: anonymous_role
xpack.security.authc.anonymous.authz_exception: true

then I see these error logs on the data nodes -

[<timestamp>] [WARN][o.e.c.a.s.ShardStateAction] [<data-node-host-name>] unexpected failure while sending request [internal:cluster/shard/failure] to [{<master-node-host-name}{hash}{hash}{IP}{IP:Port}{m}{zone=z, xpack.installed=true, transform.node=false}] for shard entry [shard id [[xyz][1]], allocation id [hash], primary term [0], message [shard failure, reason [primary shard [[xyz][1], node[hash], [P], s[STARTED], a[id=hash]] was demoted while failing replica shard]], failure [ElasticsearchSecurityException[action [internal:cluster/shard/failure] is unauthorized for user [_anonymous] with roles [anonymous_role]]], markAsStale [true]]
Caused by: org.elasticsearch.ElasticsearchSecurityException: action [internal:cluster/shard/failure] is unauthorized for user [_anonymous] with roles [anonymous_role]
	at org.elasticsearch.xpack.core.security.support.Exceptions.authorizationError(Exceptions.java:34) ~[?:?]
	at org.elasticsearch.xpack.security.authz.AuthorizationService.denialException(AuthorizationService.java:928) ~[?:?]
	at org.elasticsearch.xpack.security.authz.AuthorizationService.denialException(AuthorizationService.java:870) ~[?:?]
	at org.elasticsearch.xpack.security.authz.AuthorizationService.authorizeAction(AuthorizationService.java:471) ~[?:?]
	at org.elasticsearch.xpack.security.authz.AuthorizationService.maybeAuthorizeRunAs(AuthorizationService.java:371) ~[?:?]
	at org.elasticsearch.xpack.security.authz.AuthorizationService.lambda$authorize$1(AuthorizationService.java:256) ~[?:?]
	at org.elasticsearch.action.ActionListener$1.onResponse(ActionListener.java:136) ~[elasticsearch-7.16.2.jar:7.16.2]
	at org.elasticsearch.action.support.ContextPreservingActionListener.onResponse(ContextPreservingActionListener.java:31) ~[elasticsearch-7.16.2.jar:7.16.2]
	at org.elasticsearch.xpack.security.authz.RBACEngine.lambda$resolveAuthorizationInfo$1(RBACEngine.java:138) ~[?:?]
	at org.elasticsearch.action.ActionListener$1.onResponse(ActionListener.java:136) ~[elasticsearch-7.16.2.jar:7.16.2]
	at org.elasticsearch.xpack.security.authz.store.CompositeRolesStore.roles(CompositeRolesStore.java:173) ~[?:?]
	at org.elasticsearch.xpack.security.authz.store.CompositeRolesStore.getRoles(CompositeRolesStore.java:279) ~[?:?]
	at org.elasticsearch.xpack.security.authz.RBACEngine.getRoles(RBACEngine.java:144) ~[?:?]
	at org.elasticsearch.xpack.security.authz.RBACEngine.resolveAuthorizationInfo(RBACEngine.java:127) ~[?:?]
	at org.elasticsearch.xpack.security.authz.AuthorizationService.authorize(AuthorizationService.java:258) ~[?:?]
	at org.elasticsearch.xpack.security.transport.ServerTransportFilter$NodeProfile.lambda$inbound$1(ServerTransportFilter.java:136) ~[?:?]
	at org.elasticsearch.action.ActionListener$1.onResponse(ActionListener.java:136) ~[elasticsearch-7.16.2.jar:7.16.2]
	at org.elasticsearch.action.ActionListener$MappedActionListener.onResponse(ActionListener.java:101) ~[elasticsearch-7.16.2.jar:7.16.2]
	at org.elasticsearch.xpack.security.authc.AuthenticatorChain.authenticateAsync(AuthenticatorChain.java:102) ~[?:?]
	at org.elasticsearch.xpack.security.authc.AuthenticationService.authenticate(AuthenticationService.java:199) ~[?:?]
	at org.elasticsearch.xpack.security.transport.ServerTransportFilter$NodeProfile.inbound(ServerTransportFilter.java:128) ~[?:?]
	at org.elasticsearch.xpack.security.transport.SecurityServerTransportInterceptor$ProfileSecuredRequestHandler.messageReceived(SecurityServerTransportInterceptor.java:415) ~[?:?]
	at org.elasticsearch.transport.RequestHandlerRegistry.processMessageReceived(RequestHandlerRegistry.java:67) ~[elasticsearch-7.16.2.jar:7.16.2]
	at org.elasticsearch.transport.InboundHandler.handleRequest(InboundHandler.java:249) [elasticsearch-7.16.2.jar:7.16.2]
	at org.elasticsearch.transport.InboundHandler.messageReceived(InboundHandler.java:106) [elasticsearch-7.16.2.jar:7.16.2]
	at org.elasticsearch.transport.InboundHandler.inboundMessage(InboundHandler.java:88) [elasticsearch-7.16.2.jar:7.16.2]
	at org.elasticsearch.transport.TcpTransport.inboundMessage(TcpTransport.java:743) [elasticsearch-7.16.2.jar:7.16.2]
	at org.elasticsearch.transport.InboundPipeline.forwardFragments(InboundPipeline.java:147) [elasticsearch-7.16.2.jar:7.16.2]
	at org.elasticsearch.transport.InboundPipeline.doHandleBytes(InboundPipeline.java:119) [elasticsearch-7.16.2.jar:7.16.2]
	at org.elasticsearch.transport.InboundPipeline.handleBytes(InboundPipeline.java:84) [elasticsearch-7.16.2.jar:7.16.2]
	at org.elasticsearch.transport.netty4.Netty4MessageChannelHandler.channelRead(Netty4MessageChannelHandler.java:71) [transport-netty4-client-7.16.2.jar:7.16.2]
	at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:379) [netty-transport-4.1.66.Final.jar:4.1.66.Final]
	at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:365) [netty-transport-4.1.66.Final.jar:4.1.66.Final]
	at io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:357) [netty-transport-4.1.66.Final.jar:4.1.66.Final]
	at io.netty.handler.logging.LoggingHandler.channelRead(LoggingHandler.java:280) [netty-handler-4.1.66.Final.jar:4.1.66.Final]
	at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:379) [netty-transport-4.1.66.Final.jar:4.1.66.Final]
	at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:365) [netty-transport-4.1.66.Final.jar:4.1.66.Final]
	at io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:357) [netty-transport-4.1.66.Final.jar:4.1.66.Final]
	at io.netty.handler.codec.MessageToMessageDecoder.channelRead(MessageToMessageDecoder.java:103) [netty-codec-4.1.66.Final.jar:4.1.66.Final]
	at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:379) [netty-transport-4.1.66.Final.jar:4.1.66.Final]
	at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:365) [netty-transport-4.1.66.Final.jar:4.1.66.Final]
	at io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:357) [netty-transport-4.1.66.Final.jar:4.1.66.Final]
	at io.netty.handler.ssl.SslHandler.unwrap(SslHandler.java:1374) [netty-handler-4.1.66.Final.jar:4.1.66.Final]
	at io.netty.handler.ssl.SslHandler.decodeJdkCompatible(SslHandler.java:1237) [netty-handler-4.1.66.Final.jar:4.1.66.Final]
	at io.netty.handler.ssl.SslHandler.decode(SslHandler.java:1286) [netty-handler-4.1.66.Final.jar:4.1.66.Final]
	at io.netty.handler.codec.ByteToMessageDecoder.decodeRemovalReentryProtection(ByteToMessageDecoder.java:507) [netty-codec-4.1.66.Final.jar:4.1.66.Final]
	at io.netty.handler.codec.ByteToMessageDecoder.callDecode(ByteToMessageDecoder.java:446) [netty-codec-4.1.66.Final.jar:4.1.66.Final]
	at io.netty.handler.codec.ByteToMessageDecoder.channelRead(ByteToMessageDecoder.java:276) [netty-codec-4.1.66.Final.jar:4.1.66.Final]
	at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:379) [netty-transport-4.1.66.Final.jar:4.1.66.Final]
	at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:365) [netty-transport-4.1.66.Final.jar:4.1.66.Final]
	at io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:357) [netty-transport-4.1.66.Final.jar:4.1.66.Final]
	at io.netty.channel.DefaultChannelPipeline$HeadContext.channelRead(DefaultChannelPipeline.java:1410) [netty-transport-4.1.66.Final.jar:4.1.66.Final]
	at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:379) [netty-transport-4.1.66.Final.jar:4.1.66.Final]
	at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:365) [netty-transport-4.1.66.Final.jar:4.1.66.Final]
	at io.netty.channel.DefaultChannelPipeline.fireChannelRead(DefaultChannelPipeline.java:919) [netty-transport-4.1.66.Final.jar:4.1.66.Final]
	at io.netty.channel.nio.AbstractNioByteChannel$NioByteUnsafe.read(AbstractNioByteChannel.java:166) [netty-transport-4.1.66.Final.jar:4.1.66.Final]
	at io.netty.channel.nio.NioEventLoop.processSelectedKey(NioEventLoop.java:719) [netty-transport-4.1.66.Final.jar:4.1.66.Final]
	at io.netty.channel.nio.NioEventLoop.processSelectedKeysPlain(NioEventLoop.java:620) [netty-transport-4.1.66.Final.jar:4.1.66.Final]
	at io.netty.channel.nio.NioEventLoop.processSelectedKeys(NioEventLoop.java:583) [netty-transport-4.1.66.Final.jar:4.1.66.Final]
	at io.netty.channel.nio.NioEventLoop.run(NioEventLoop.java:493) [netty-transport-4.1.66.Final.jar:4.1.66.Final]
	at io.netty.util.concurrent.SingleThreadEventExecutor$4.run(SingleThreadEventExecutor.java:986) [netty-common-4.1.66.Final.jar:4.1.66.Final]
	at io.netty.util.internal.ThreadExecutorMap$2.run(ThreadExecutorMap.java:74) [netty-common-4.1.66.Final.jar:4.1.66.Final]
	at java.lang.Thread.run(Thread.java:833) [?:?]

and also these -

[<timestamp>] [WARN][o.e.i.e.Engine] [<data-node-host-name>] [xyz][1] failed engine [primary shard [[xyz][1], node[hash], [P], s[STARTED], a[id=hash]] was demoted while failing replica shard]
[<timestamp>] org.elasticsearch.ElasticsearchSecurityException: action [internal:cluster/shard/failure] is unauthorized for user [_anonymous] with roles [anonymous_role]

On the master nodes, I see these errors -

[<timestamp>][WARN][o.e.c.r.a.AllocationService] [<master-node-host-name>] failing shard [failed shard, shard [xyz][1], node[hash], [P], s[STARTED], a[id=hash], message [master {<master-node-host-name>}{hash}{hash}{IP}{IP:Port}{m}{zone=z, xpack.installed=true, transform.node=false} has not removed previously failed shard. resending shard failure], failure [null], markAsStale [true]]
[<timestamp>][WARN][o.e.c.r.a.AllocationService] [<master-node-host-name>] [index-name][507] marking unavailable shards as stale: [hash]
[<timestamp>][WARN][o.e.x.s.a.AuthorizationService] [<master-node-host-name>] denying access as action [internal:cluster/shard/failure] is not an index or cluster action

I have these questions -

  1. Why is the anonymous user assumed for inter-node communication? Shouldn't it be used only for HTTP access?
  2. Why is the all privilege on indices & cluster not enough for an action? What other privilege would allow that action [internal:cluster/shard/failure] that was being denied for the anonymous role? I couldn't find anything related in the elastic docs.

You cannot have security enabled on only some nodes; it has to be enabled on every node in the cluster.

The errors you are seeing are a consequence of trying to run in an unsupported configuration.
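
On the second question above, the master's log line "denying access as action [internal:cluster/shard/failure] is not an index or cluster action" hints at why the 'all' privileges do not help: role privileges appear to cover only cluster and index actions, while internal actions fall outside both. A minimal Python sketch of that reading follows. The prefix rule and the helper names classify_action and grantable_by_role are assumptions made for illustration, not the actual Elasticsearch authorization code.

```python
# Hypothetical helpers: classify an Elasticsearch action name by its prefix.
# The prefix-based rule is a simplification assumed for this sketch.

def classify_action(action: str) -> str:
    """Return 'cluster', 'index', 'internal', or 'unknown' for an action name."""
    if action.startswith("cluster:"):
        return "cluster"
    if action.startswith("indices:"):
        return "index"
    if action.startswith("internal:"):
        return "internal"
    return "unknown"


def grantable_by_role(action: str) -> bool:
    """In this sketch, only cluster and index actions can be matched by role privileges."""
    return classify_action(action) in ("cluster", "index")


# The two action names below are taken from the log lines in this thread.
for name in ("internal:cluster/shard/failure", "indices:data/write/bulk"):
    print(name, "->", classify_action(name), "| grantable by a role:", grantable_by_role(name))
```

Under this reading, no role (anonymous or otherwise) can be granted the internal action, which matches the denial seen when a node without security sends it to a master with security enabled.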

I know we cannot have security enabled on only some nodes, but won't that be the case while I am enabling security (for the first time) in a live cluster by restarting nodes one by one?

Is it safe to say that if I enable security on all the data nodes first, and then enable it on the master nodes, I won't see these errors? Or is it all because of some config that I'm using incorrectly?

"The errors you are seeing are a consequence of trying to run in an unsupported configuration."

Can you please elaborate a bit? What unsupported configuration can cause these ElasticsearchSecurityExceptions?

Enabling security requires a full cluster restart (it has to be applied to all nodes at the same time) and cannot be done through a rolling restart.

The unsupported configuration is having some nodes with security enabled and some nodes without security enabled. This will not work.

When you enable or disable security you need to do a full cluster restart. Nodes with a different configuration will stop working until you restart them with the updated configuration; the order does not matter.

A master node with security enabled will not allow any data node without security enabled to connect, and a data node with security enabled will not connect to a master node without security, so a full cluster restart is required.
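
Before doing the full cluster restart, it may help to confirm that every node's configuration agrees on xpack.security.enabled. Below is a minimal Python sketch of such a check. It works on settings that have already been loaded into dictionaries; the node names, the helper functions, and the choice to treat a missing key as disabled are all assumptions of this example, not part of Elasticsearch.

```python
from typing import Dict, List

SECURITY_KEY = "xpack.security.enabled"


def security_enabled(settings: Dict[str, object]) -> bool:
    """Interpret the setting; a missing key is treated as disabled (assumption of this sketch)."""
    value = settings.get(SECURITY_KEY, False)
    return str(value).strip().lower() == "true"


def group_nodes_by_security(node_settings: Dict[str, Dict[str, object]]) -> Dict[bool, List[str]]:
    """Split node names into those with security enabled (True) and disabled (False)."""
    groups: Dict[bool, List[str]] = {True: [], False: []}
    for node, settings in node_settings.items():
        groups[security_enabled(settings)].append(node)
    return groups


def is_consistent(node_settings: Dict[str, Dict[str, object]]) -> bool:
    """A cluster is consistent here when all nodes fall into the same group."""
    groups = group_nodes_by_security(node_settings)
    return not (groups[True] and groups[False])


# Hypothetical cluster resembling the one described in this thread.
cluster = {
    "master-1": {SECURITY_KEY: True},
    "data-1": {SECURITY_KEY: "true"},
    "data-2": {},  # setting absent, so counted as disabled in this sketch
}

groups = group_nodes_by_security(cluster)
print("consistent:", is_consistent(cluster))
print("security disabled on:", groups[False])
```

Any node listed as disabled would need its configuration updated before all nodes are restarted together.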

I had missed the point that we need a full cluster restart to enable security.

I now have a cluster with security enabled on all the nodes (although not through a full cluster restart; I enabled it node by node in no particular order). But I still got these errors for some bulk requests after security was enabled and in place on the whole cluster -

"error":{"root_cause":[{"type":"security_exception","reason":"action [indices:data/write/bulk] is unauthorized for user [_anonymous] with roles [<anonymous_role>], this action is granted by the index privileges [create_doc,create,delete,index,write,all]"}],"status":403}

Is this also because I enabled security without doing a full cluster restart? [All the nodes have xpack.security.enabled: true now, and the anonymous role has access to all index / cluster operations]

What could have caused these errors now? Also, this doesn't happen on every bulk request. It happened for only a few requests.

I checked the logs and found that these errors were caused by an internal issue in my configuration management system, so please ignore my last response. I'm assuming that everything will work fine as long as all the nodes in the cluster have security enabled, and that I won't see these errors again. Thank you!
