Elasticsearch fails to restart with high CPU usage and the error "failed to recover from translog"

adiwakar · July 2, 2020, 6:23pm

Hi All,
I shall be grateful if you could help me.

I am using ES version 6.4 with a 2-node cluster (I know it's bad; we have plans to change it). One node is in Washington and the other is in Virginia (we have a dedicated link). Everything was working fine until one day I rebooted the ES host in Washington. After the reboot, ES fails to start and I see continuous failure logs (please see below).

I see a similar discussion here

But how do I verify what DavidTurner stated there? And how can I get out of this situation?

ES runs in a Docker container deployed on an ESXi server.

The host has 32 GB RAM, 8 virtual cores, and a 50 GB HDD with 12 GB free. CPU utilization is at 100%.

2020-05-31T06:24:46,134 [WARN][org.elasticsearch.index.engine.Engine.failEngine(Engine.java:1003)] [elasticsearch[RdCXtk2][generic][T#2]est_thread_info][RdCXtk2] [dfc-traffic_2020_05][3]  failed engine [failed to recover from translog]
org.elasticsearch.index.engine.EngineException: failed to recover from translog
	at org.elasticsearch.index.engine.InternalEngine.recoverFromTranslogInternal(InternalEngine.java:409) ~[elasticsearch-6.4.3.jar:6.4.3]
	at org.elasticsearch.index.engine.InternalEngine.recoverFromTranslog(InternalEngine.java:380) ~[elasticsearch-6.4.3.jar:6.4.3]
	at org.elasticsearch.index.engine.InternalEngine.recoverFromTranslog(InternalEngine.java:101) ~[elasticsearch-6.4.3.jar:6.4.3]
	at org.elasticsearch.index.shard.IndexShard.openEngineAndRecoverFromTranslog(IndexShard.java:1333) ~[elasticsearch-6.4.3.jar:6.4.3]
	at org.elasticsearch.index.shard.StoreRecovery.internalRecoverFromStore(StoreRecovery.java:421) ~[elasticsearch-6.4.3.jar:6.4.3]
	at org.elasticsearch.index.shard.StoreRecovery.lambda$recoverFromStore$0(StoreRecovery.java:95) ~[elasticsearch-6.4.3.jar:6.4.3]
	at org.elasticsearch.index.shard.StoreRecovery.executeRecovery(StoreRecovery.java:301) ~[elasticsearch-6.4.3.jar:6.4.3]
	at org.elasticsearch.index.shard.StoreRecovery.recoverFromStore(StoreRecovery.java:93) ~[elasticsearch-6.4.3.jar:6.4.3]
	at org.elasticsearch.index.shard.IndexShard.recoverFromStore(IndexShard.java:1603) ~[elasticsearch-6.4.3.jar:6.4.3]
	at org.elasticsearch.index.shard.IndexShard.lambda$startRecovery$4(IndexShard.java:2055) ~[elasticsearch-6.4.3.jar:6.4.3]
	at org.elasticsearch.common.util.concurrent.ThreadContext$ContextPreservingRunnable.run(ThreadContext.java:624) [elasticsearch-6.4.3.jar:6.4.3]
	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149) [?:1.8.0_201]
	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624) [?:1.8.0_201]
	at java.lang.Thread.run(Thread.java:748) [?:1.8.0_201]
Caused by: org.elasticsearch.index.shard.IllegalIndexShardStateException: CurrentState[CLOSED] operation only allowed when recovering, origin [LOCAL_TRANSLOG_RECOVERY]
	at org.elasticsearch.index.shard.IndexShard.ensureWriteAllowed(IndexShard.java:1480) ~[elasticsearch-6.4.3.jar:6.4.3]
	at org.elasticsearch.index.shard.IndexShard.applyIndexOperation(IndexShard.java:699) ~[elasticsearch-6.4.3.jar:6.4.3]
	at org.elasticsearch.index.shard.IndexShard.applyTranslogOperation(IndexShard.java:1272) ~[elasticsearch-6.4.3.jar:6.4.3]
	at org.elasticsearch.index.shard.IndexShard.runTranslogRecovery(IndexShard.java:1301) ~[elasticsearch-6.4.3.jar:6.4.3]
	at org.elasticsearch.index.engine.InternalEngine.recoverFromTranslogInternal(InternalEngine.java:407) ~[elasticsearch-6.4.3.jar:6.4.3]
	... 13 more
2020-05-31T06:24:46,138 [INFO][org.elasticsearch.node.Node.stop(Node.java:785)] [Thread-2est_thread_info][RdCXtk2]  stopped
2020-05-31T06:24:46,139 [INFO][org.elasticsearch.node.Node.close(Node.java:803)] [Thread-2est_thread_info][RdCXtk2]  closing ...
2020-05-31T06:24:46,136
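
For anyone triaging a log like the one above, here is a minimal Python sketch that pulls out which index and shard failed and the first "Caused by" exception that follows. It is only an illustration: the function name `summarize_failures` and the two regular expressions are my own assumptions based on the line shapes shown in this post, not anything provided by Elasticsearch.

```python
import re

# Assumed pattern for the "[index][shard]  failed engine [reason]" part of the WARN line above.
FAILED_ENGINE = re.compile(
    r"\[(?P<index>[^\[\]]+)\]\[(?P<shard>\d+)\]\s+failed engine \[(?P<reason>[^\]]+)\]"
)
# Assumed pattern for a "Caused by: <exception class>: <message>" stack trace line.
CAUSED_BY = re.compile(r"^Caused by: (?P<exc>[\w.$]+): (?P<msg>.*)$")


def summarize_failures(log_text):
    """Return one dict per 'failed engine' line, with the first cause that follows it."""
    failures = []
    for raw in log_text.splitlines():
        line = raw.strip()
        hit = FAILED_ENGINE.search(line)
        if hit:
            failures.append({
                "index": hit["index"],
                "shard": int(hit["shard"]),
                "reason": hit["reason"],
                "cause": None,
            })
            continue
        cause = CAUSED_BY.match(line)
        # Attach only the first cause seen after the most recent failure line.
        if cause and failures and failures[-1]["cause"] is None:
            failures[-1]["cause"] = (cause["exc"], cause["msg"])
    return failures


# Two lines copied from the log in this post.
SAMPLE = (
    "2020-05-31T06:24:46,134 [WARN][org.elasticsearch.index.engine.Engine.failEngine(Engine.java:1003)] "
    "[elasticsearch[RdCXtk2][generic][T#2]est_thread_info][RdCXtk2] [dfc-traffic_2020_05][3]  "
    "failed engine [failed to recover from translog]\n"
    "Caused by: org.elasticsearch.index.shard.IllegalIndexShardStateException: "
    "CurrentState[CLOSED] operation only allowed when recovering, origin [LOCAL_TRANSLOG_RECOVERY]\n"
)

if __name__ == "__main__":
    for failure in summarize_failures(SAMPLE):
        print(failure)
```

Run against the full container log, this would list every shard that hit the translog recovery failure, which helps show whether the problem is confined to `dfc-traffic_2020_05` shard 3 or affects other shards as well.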

system · July 30, 2020, 6:26pm

This topic was automatically closed 28 days after the last reply. New replies are no longer allowed.
