I have several ES clusters that move shards periodically (from hot nodes to cold nodes). Recently I noticed that some shards are moved very slowly compared to others. Here is one of the slow clusters:
$ curl -XGET "http://es-host.info:9201/_cat/recovery/athena_prod_log_2_20221224_20?v&h=i,s,t,ty,st,shost,thost,f,fp,b,bp,to,top"
i s t ty st shost thost f fp b bp to top
athena_prod_log_2_20221224_20 0 7.4h peer done 10.67.133.50 10.67.244.223 136 100.0% 19144292604 100.0% 1082795 100.0%
athena_prod_log_2_20221224_20 1 9.6h peer done 10.67.224.46 10.67.244.30 142 100.0% 19166214997 100.0% 1080489 100.0%
athena_prod_log_2_20221224_20 2 19ms empty_store done n/a 10.67.159.241 0 0.0% 0 0.0% 0 100.0%
athena_prod_log_2_20221224_20 2 15.3h peer translog 10.67.159.241 10.67.244.49 112 100.0% 19053587025 100.0% 1111107 40.6%
athena_prod_log_2_20221224_20 3 3.8h peer done 10.67.224.217 10.67.244.250 118 100.0% 19181647268 100.0% 1094604 100.0%
athena_prod_log_2_20221224_20 4 15.3h peer translog 10.67.159.151 10.67.244.47 148 100.0% 19079947851 100.0% 1058303 31.5%
athena_prod_log_2_20221224_20 4 69ms empty_store done n/a 10.67.159.151 0 0.0% 0 0.0% 0 100.0%
I have set indices.recovery.max_bytes_per_sec to 40mb, and cluster.routing.allocation.node_concurrent_recoveries to 4.
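For reference, I applied those settings via the cluster settings API, roughly like this (persistent scope shown here for illustration):

curl -XPUT "http://es-host.info:9201/_cluster/settings" -H 'Content-Type: application/json' -d'
{
  "persistent": {
    "indices.recovery.max_bytes_per_sec": "40mb",
    "cluster.routing.allocation.node_concurrent_recoveries": 4
  }
}'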
Shards 2 and 4 have stayed unfinished for 15+ hours, and the value of top (translog_ops_percentage) increases very slowly.
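To observe this I just re-run the same _cat/recovery call on an interval, e.g.:

watch -n 60 'curl -s "http://es-host.info:9201/_cat/recovery/athena_prod_log_2_20221224_20?v&h=i,s,t,st,top"'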
I went to the thost nodes (10.67.244.49 and 10.67.244.47) to print the jstack info, and the only RUNNABLE generic thread on each looks like this:
# ssh 10.67.244.49 && jstack
"elasticsearch[elasticsearch-athena-prod-log-2-co-676803-g662b.hy][generic][T#3]" #44 daemon prio=5 os_prio=0 tid=0x00007f06e8002000 nid=0x3d6 runnable [0x00007f06cf9d9000]
java.lang.Thread.State: RUNNABLE
at org.apache.lucene.store.DataInput.readVInt(DataInput.java:125)
at org.apache.lucene.codecs.blocktree.SegmentTermsEnumFrame.loadBlock(SegmentTermsEnumFrame.java:154)
at org.apache.lucene.codecs.blocktree.SegmentTermsEnum.seekExact(SegmentTermsEnum.java:507)
at org.elasticsearch.common.lucene.uid.PerThreadIDVersionAndSeqNoLookup.lookupSeqNo(PerThreadIDVersionAndSeqNoLookup.java:167)
at org.elasticsearch.common.lucene.uid.VersionsAndSeqNoResolver.loadDocIdAndSeqNo(VersionsAndSeqNoResolver.java:164)
at org.elasticsearch.index.engine.InternalEngine.compareOpToLuceneDocBasedOnSeqNo(InternalEngine.java:728)
at org.elasticsearch.index.engine.InternalEngine.planIndexingAsNonPrimary(InternalEngine.java:1022)
at org.elasticsearch.index.engine.InternalEngine.indexingStrategyForOperation(InternalEngine.java:1041)
at org.elasticsearch.index.engine.InternalEngine.index(InternalEngine.java:928)
at org.elasticsearch.index.shard.IndexShard.index(IndexShard.java:826)
at org.elasticsearch.index.shard.IndexShard.applyIndexOperation(IndexShard.java:793)
at org.elasticsearch.index.shard.IndexShard.applyTranslogOperation(IndexShard.java:1377)
at org.elasticsearch.index.shard.IndexShard.applyTranslogOperation(IndexShard.java:1364)
at org.elasticsearch.indices.recovery.RecoveryTarget.lambda$indexTranslogOperations$2(RecoveryTarget.java:363)
at org.elasticsearch.indices.recovery.RecoveryTarget$$Lambda$3352/2023333997.get(Unknown Source)
at org.elasticsearch.action.ActionListener.completeWith(ActionListener.java:197)
at org.elasticsearch.indices.recovery.RecoveryTarget.indexTranslogOperations(RecoveryTarget.java:338)
at org.elasticsearch.indices.recovery.PeerRecoveryTargetService$TranslogOperationsRequestHandler.messageReceived(PeerRecoveryTargetService.java:512)
at org.elasticsearch.indices.recovery.PeerRecoveryTargetService$TranslogOperationsRequestHandler.messageReceived(PeerRecoveryTargetService.java:472)
at org.elasticsearch.transport.TransportRequestHandler.messageReceived(TransportRequestHandler.java:30)
at org.elasticsearch.xpack.security.transport.SecurityServerTransportInterceptor$ProfileSecuredRequestHandler$1.doRun(SecurityServerTransportInterceptor.java:250)
at org.elasticsearch.common.util.concurrent.AbstractRunnable.run(AbstractRunnable.java:37)
at org.elasticsearch.xpack.security.transport.SecurityServerTransportInterceptor$ProfileSecuredRequestHandler.messageReceived(SecurityServerTransportInterceptor.java:308)
at org.elasticsearch.transport.RequestHandlerRegistry.processMessageReceived(RequestHandlerRegistry.java:66)
at org.elasticsearch.transport.TcpTransport$RequestHandler.doRun(TcpTransport.java:1087)
at org.elasticsearch.common.util.concurrent.ThreadContext$ContextPreservingAbstractRunnable.doRun(ThreadContext.java:751)
at org.elasticsearch.common.util.concurrent.AbstractRunnable.run(AbstractRunnable.java:37)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
at java.lang.Thread.run(Thread.java:745)
# ssh 10.67.244.47 && jstack
"elasticsearch[elasticsearch-athena-prod-log-2-co-676803-8t8z9.hy][generic][T#3]" #41 daemon prio=5 os_prio=0 tid=0x00007fc15c002800 nid=0x229 runnable [0x00007fc161afa000]
java.lang.Thread.State: RUNNABLE
at org.apache.lucene.store.DataInput.readVInt(DataInput.java:125)
at org.apache.lucene.codecs.blocktree.SegmentTermsEnumFrame.loadBlock(SegmentTermsEnumFrame.java:154)
at org.apache.lucene.codecs.blocktree.SegmentTermsEnum.seekExact(SegmentTermsEnum.java:507)
at org.elasticsearch.common.lucene.uid.PerThreadIDVersionAndSeqNoLookup.lookupSeqNo(PerThreadIDVersionAndSeqNoLookup.java:167)
at org.elasticsearch.common.lucene.uid.VersionsAndSeqNoResolver.loadDocIdAndSeqNo(VersionsAndSeqNoResolver.java:164)
at org.elasticsearch.index.engine.InternalEngine.compareOpToLuceneDocBasedOnSeqNo(InternalEngine.java:728)
at org.elasticsearch.index.engine.InternalEngine.planIndexingAsNonPrimary(InternalEngine.java:1022)
at org.elasticsearch.index.engine.InternalEngine.indexingStrategyForOperation(InternalEngine.java:1041)
at org.elasticsearch.index.engine.InternalEngine.index(InternalEngine.java:928)
at org.elasticsearch.index.shard.IndexShard.index(IndexShard.java:826)
at org.elasticsearch.index.shard.IndexShard.applyIndexOperation(IndexShard.java:793)
at org.elasticsearch.index.shard.IndexShard.applyTranslogOperation(IndexShard.java:1377)
at org.elasticsearch.index.shard.IndexShard.applyTranslogOperation(IndexShard.java:1364)
at org.elasticsearch.indices.recovery.RecoveryTarget.lambda$indexTranslogOperations$2(RecoveryTarget.java:363)
at org.elasticsearch.indices.recovery.RecoveryTarget$$Lambda$3301/731621683.get(Unknown Source)
at org.elasticsearch.action.ActionListener.completeWith(ActionListener.java:197)
at org.elasticsearch.indices.recovery.RecoveryTarget.indexTranslogOperations(RecoveryTarget.java:338)
at org.elasticsearch.indices.recovery.PeerRecoveryTargetService$TranslogOperationsRequestHandler.messageReceived(PeerRecoveryTargetService.java:512)
at org.elasticsearch.indices.recovery.PeerRecoveryTargetService$TranslogOperationsRequestHandler.messageReceived(PeerRecoveryTargetService.java:472)
at org.elasticsearch.transport.TransportRequestHandler.messageReceived(TransportRequestHandler.java:30)
at org.elasticsearch.xpack.security.transport.SecurityServerTransportInterceptor$ProfileSecuredRequestHandler$1.doRun(SecurityServerTransportInterceptor.java:250)
at org.elasticsearch.common.util.concurrent.AbstractRunnable.run(AbstractRunnable.java:37)
at org.elasticsearch.xpack.security.transport.SecurityServerTransportInterceptor$ProfileSecuredRequestHandler.messageReceived(SecurityServerTransportInterceptor.java:308)
at org.elasticsearch.transport.RequestHandlerRegistry.processMessageReceived(RequestHandlerRegistry.java:66)
at org.elasticsearch.transport.TcpTransport$RequestHandler.doRun(TcpTransport.java:1087)
at org.elasticsearch.common.util.concurrent.ThreadContext$ContextPreservingAbstractRunnable.doRun(ThreadContext.java:751)
at org.elasticsearch.common.util.concurrent.AbstractRunnable.run(AbstractRunnable.java:37)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
at java.lang.Thread.run(Thread.java:745)
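For completeness, this is roughly how I captured those dumps on each thost (the pid lookup is my own approximation and may need adjusting for your install):

# find the ES pid and dump the stacks of the generic thread pool
ES_PID=$(pgrep -f org.elasticsearch.bootstrap.Elasticsearch)
jstack "$ES_PID" | grep -A 40 '\[generic\]'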
Thanks to anyone who can give me some advice.
PS: since the bottleneck of the move is the translog_ops stage, I also wonder if there is any way to minimize the translog size.
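I am guessing at something like the following, but I am not sure the retention settings still apply on my version (they are my understanding of the 6.x/7.x behavior, so please correct me if they don't):

# check the current translog size per shard
curl -XGET "http://es-host.info:9201/athena_prod_log_2_20221224_20/_stats/translog?level=shards&pretty"

# force a flush so the translog can be trimmed before the move
curl -XPOST "http://es-host.info:9201/athena_prod_log_2_20221224_20/_flush"

# lower the retention limits (assumption: these index settings exist and are honored on my version)
curl -XPUT "http://es-host.info:9201/athena_prod_log_2_20221224_20/_settings" -H 'Content-Type: application/json' -d'
{
  "index.translog.retention.size": "128mb",
  "index.translog.retention.age": "1h"
}'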