Hi, I'm facing ALLOCATION_FAILED and I want to know the reason. My cluster consists of 51 nodes on 3 hosts under Docker Swarm, and I have also configured data tiers.
Can you explain how I can recover the unassigned shards through Dev Tools? It seems that only replica shards are affected.
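For reference, this is what I ran in Kibana Dev Tools to get the explanation below. The second request is only my understanding of the note in the response about targeting a specific shard, with the index/shard/primary values copied from the output. My guess is that I would eventually retry the failed replicas with POST _cluster/reroute?retry_failed=true (the shard already hit 5 failed allocation attempts), but I'd like to understand the CircuitBreakingException in the details first.

# no body: explains a randomly chosen unassigned shard
GET _cluster/allocation/explain

# targeting the specific replica shard, as suggested by the note in the response
GET _cluster/allocation/explain
{
  "index": "logstash-ebm-sgu-srv40990kab-b12-2022.06.18",
  "shard": 0,
  "primary": false
}

Here is the response: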
{
"note" : "No shard was specified in the explain API request, so this response explains a randomly chosen unassigned shard. There may be other unassigned shards in this cluster which cannot be assigned for different reasons. It may not be possible to assign this shard until one of the other shards is assigned correctly. To explain the allocation of other shards (whether assigned or unassigned) you must specify the target shard in the request to this API.",
"index" : "logstash-ebm-sgu-srv40990kab-b12-2022.06.18",
"shard" : 0,
"primary" : false,
"current_state" : "unassigned",
"unassigned_info" : {
"reason" : "ALLOCATION_FAILED",
"at" : "2022-06-18T15:16:33.007Z",
"failed_allocation_attempts" : 5,
"details" : """failed shard on node [eRk6tLc3RFG0zYOsnFPFUw]: failed recovery, failure org.elasticsearch.indices.recovery.RecoveryFailedException: [logstash-ebm-sgu-srv40990kab-b12-2022.06.18][0]: Recovery failed from {es_data_ssd_5_1}{XMyLrsVFSJiKSs-yUhZkgA}{-gwnp-iZTYKaMlpgb6zBFQ}{10.0.9.102}{10.0.9.102:9300}{hs}{rack_id=rack_one, xpack.installed=true} into {es_data_ssd_4_3}{eRk6tLc3RFG0zYOsnFPFUw}{4nWdvQE6QnCP3XEAcVj--Q}{10.0.9.146}{10.0.9.146:9300}{hs}{xpack.installed=true, rack_id=rack_three}
at org.elasticsearch.indices.recovery.PeerRecoveryTargetService$RecoveryResponseHandler.handleException(PeerRecoveryTargetService.java:816)
at org.elasticsearch.transport.TransportService$ContextRestoreResponseHandler.handleException(TransportService.java:1349)
at org.elasticsearch.transport.TransportService$ContextRestoreResponseHandler.handleException(TransportService.java:1349)
at org.elasticsearch.transport.InboundHandler.lambda$handleException$3(InboundHandler.java:397)
at org.elasticsearch.common.util.concurrent.ThreadContext$ContextPreservingRunnable.run(ThreadContext.java:717)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1136)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:635)
at java.lang.Thread.run(Thread.java:833)
Caused by: org.elasticsearch.transport.RemoteTransportException: [es_data_ssd_5_1][10.0.9.102:9300][internal:index/shard/recovery/start_recovery]
Caused by: org.elasticsearch.common.breaker.CircuitBreakingException: [parent] Data too large, data for [internal:index/shard/recovery/start_recovery] would be [8513229092/7.9gb], which is larger than the limit of [8160437862/7.5gb], real usage: [8513228144/7.9gb], new bytes reserved: [948/948b], usages [fielddata=2149465577/2gb, request=0/0b, inflight_requests=1298/1.2kb, model_inference=0/0b, eql_sequence=0/0b]
at org.elasticsearch.indices.breaker.HierarchyCircuitBreakerService.checkParentLimit(HierarchyCircuitBreakerService.java:440)
at org.elasticsearch.common.breaker.ChildMemoryCircuitBreaker.addEstimateBytesAndMaybeBreak(ChildMemoryCircuitBreaker.java:108)
at org.elasticsearch.transport.InboundAggregator.checkBreaker(InboundAggregator.java:215)
at org.elasticsearch.transport.InboundAggregator.finishAggregation(InboundAggregator.java:119)
at org.elasticsearch.transport.InboundPipeline.forwardFragments(InboundPipeline.java:147)
at org.elasticsearch.transport.InboundPipeline.doHandleBytes(InboundPipeline.java:121)
at org.elasticsearch.transport.InboundPipeline.handleBytes(InboundPipeline.java:86)
at org.elasticsearch.transport.netty4.Netty4MessageChannelHandler.channelRead(Netty4MessageChannelHandler.java:74)
at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:379)
at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:365)
at io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:357)
at io.netty.handler.logging.LoggingHandler.channelRead(LoggingHandler.java:280)
at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:379)
at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:365)
at io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:357)
at io.netty.handler.codec.MessageToMessageDecoder.channelRead(MessageToMessageDecoder.java:103)
at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:379)
at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:365)
at io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:357)
at io.netty.handler.ssl.SslHandler.unwrap(SslHandler.java:1371)
at io.netty.handler.ssl.SslHandler.decodeJdkCompatible(SslHandler.java:1234)
at io.netty.handler.ssl.SslHandler.decode(SslHandler.java:1283)
at io.netty.handler.codec.ByteToMessageDecoder.decodeRemovalReentryProtection(ByteToMessageDecoder.java:510)
at io.netty.handler.codec.ByteToMessageDecoder.callDecode(ByteToMessageDecoder.java:449)
at io.netty.handler.codec.ByteToMessageDecoder.channelRead(ByteToMessageDecoder.java:279)
at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:379)
at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:365)
at io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:357)
at io.netty.channel.DefaultChannelPipeline$HeadContext.channelRead(DefaultChannelPipeline.java:1410)
at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:379)
at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:365)
at io.netty.channel.DefaultChannelPipeline.fireChannelRead(DefaultChannelPipeline.java:919)
at io.netty.channel.nio.AbstractNioByteChannel$NioByteUnsafe.read(AbstractNioByteChannel.java:166)
at io.netty.channel.nio.NioEventLoop.processSelectedKey(NioEventLoop.java:722)
at io.netty.channel.nio.NioEventLoop.processSelectedKeysPlain(NioEventLoop.java:623)
at io.netty.channel.nio.NioEventLoop.processSelectedKeys(NioEventLoop.java:586)
at io.netty.channel.nio.NioEventLoop.run(NioEventLoop.java:496)
at io.netty.util.concurrent.SingleThreadEventExecutor$4.run(SingleThreadEventExecutor.java:986)
at io.netty.util.internal.ThreadExecutorMap$2.run(ThreadExecutorMap.java:74)
at java.lang.Thread.run(Thread.java:833)
""",
"last_allocation_status" : "no_attempt"
},
"can_allocate" : "awaiting_info",
"allocate_explanation" : "cannot allocate because information about existing shard data is still being retrieved from some of the nodes",
"node_allocation_decisions" : [
{
"node_id" : "1YxZtfHETGqHHxfbVI3lHQ",
"node_name" : "es_data_ssd_1_2",
"transport_address" : "10.0.9.117:9300",
"node_attributes" : {
"rack_id" : "rack_two",
"xpack.installed" : "true"
},
"node_decision" : "yes"
},
{
"node_id" : "8RZ_g_qIQjSW7g_jgA0LRA",
"node_name" : "es_data_ssd_3_3",
"transport_address" : "10.0.9.143:9300",
"node_attributes" : {
"rack_id" : "rack_three",
"xpack.installed" : "true"
},
"node_decision" : "yes"
},
{
"node_id" : "NPcmoyidSr2r4GiO07uimw",
"node_name" : "es_data_ssd_4_2",
"transport_address" : "10.0.9.128:9300",
"node_attributes" : {
"rack_id" : "rack_two",
"xpack.installed" : "true"
},
"node_decision" : "yes"
},
{
"node_id" : "WqnNe05MTuuYzY8uwI_P7Q",
"node_name" : "es_data_ssd_5_3",
"transport_address" : "10.0.9.132:9300",
"node_attributes" : {
"rack_id" : "rack_three",
"xpack.installed" : "true"
},
"node_decision" : "yes"
},
{
"node_id" : "YEzEpdGpT7iECEXqVTVhDQ",
"node_name" : "es_data_ssd_2_3",
"transport_address" : "10.0.9.136:9300",
"node_attributes" : {
"rack_id" : "rack_three",
"xpack.installed" : "true"
},
"node_decision" : "yes"
},
{
"node_id" : "azCWBxpOTnC9Tj5vZSw1yw",
"node_name" : "es_data_ssd_1_3",
"transport_address" : "10.0.9.140:9300",
"node_attributes" : {
"rack_id" : "rack_three",
"xpack.installed" : "true"
},
"node_decision" : "yes"
},
{
"node_id" : "eRk6tLc3RFG0zYOsnFPFUw",
"node_name" : "es_data_ssd_4_3",
"transport_address" : "10.0.9.146:9300",
"node_attributes" : {
"rack_id" : "rack_three",
"xpack.installed" : "true"
},
"node_decision" : "yes"
},
{
"node_id" : "fjWTi2hUQxO3tEzT5zwGog",
"node_name" : "es_data_ssd_3_2",
"transport_address" : "10.0.9.115:9300",
"node_attributes" : {
"rack_id" : "rack_two",
"xpack.installed" : "true"
},
"node_decision" : "yes"
},
{
"node_id" : "ifoM86KLRxarQqBVnAoa4A",
"node_name" : "es_data_ssd_5_2",
"transport_address" : "10.0.9.119:9300",
"node_attributes" : {
"rack_id" : "rack_two",
"xpack.installed" : "true"
},
"node_decision" : "yes"
},
{
"node_id" : "wgCiK7OqTbG6XUPTtv-_gg",
"node_name" : "es_data_ssd_2_2",
"transport_address" : "10.0.9.118:9300",
"node_attributes" : {
"rack_id" : "rack_two",
"xpack.installed" : "true"
},