Hi Team,
I have a 3-node cluster, and one node frequently drops out of the cluster and rejoins. I am attaching the log for reference. Can anyone tell me what the possible cause could be? Below is a screenshot of the log.
Please don't post pictures of text; they are difficult to read, and some people may not even be able to see them.
What version are you on?
Sure, I won't do it again.
I am using 1.7.1.
Can you post the error as well?
[ WARN] 2016-04-11 08:18:54,014 org.elasticsearch.indices.cluster - [node3] [[end_device_events_20160319][1]] marking and sending shard failed due to [failed recovery]
org.elasticsearch.indices.recovery.RecoveryFailedException: [end_device_events_20160319][1]: Recovery failed from [node2][p0b_1ge8Tmu_3l78lRaA-w][node2][inet[/x.x.x.x:9301]]{local=false, master=true} into [node3][W9XjmFo6REWuJLlfg0W-EA][node3][inet[/x.x.x.x:9301]]{local=false, master=true}
at org.elasticsearch.indices.recovery.RecoveryTarget.doRecovery(RecoveryTarget.java:280)
at org.elasticsearch.indices.recovery.RecoveryTarget.access$700(RecoveryTarget.java:70)
at org.elasticsearch.indices.recovery.RecoveryTarget$RecoveryRunner.doRun(RecoveryTarget.java:561)
at org.elasticsearch.common.util.concurrent.AbstractRunnable.run(AbstractRunnable.java:36)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
at java.lang.Thread.run(Thread.java:745)
Caused by: org.elasticsearch.transport.RemoteTransportException: [node2][inet[/x.x.x.x:9301]][internal:index/shard/recovery/start_recovery]
Caused by: org.elasticsearch.index.engine.RecoveryEngineException: [end_device_events_20160319][1] Phase[1] Execution failed
at org.elasticsearch.index.engine.InternalEngine.recover(InternalEngine.java:883)
at org.elasticsearch.index.shard.IndexShard.recover(IndexShard.java:780)
at org.elasticsearch.indices.recovery.RecoverySource.recover(RecoverySource.java:125)
at org.elasticsearch.indices.recovery.RecoverySource.access$200(RecoverySource.java:49)
at org.elasticsearch.indices.recovery.RecoverySource$StartRecoveryTransportRequestHandler.messageReceived(RecoverySource.java:146)
at org.elasticsearch.indices.recovery.RecoverySource$StartRecoveryTransportRequestHandler.messageReceived(RecoverySource.java:132)
at org.elasticsearch.transport.netty.MessageChannelHandler$RequestHandler.doRun(MessageChannelHandler.java:279)
at org.elasticsearch.common.util.concurrent.AbstractRunnable.run(AbstractRunnable.java:36)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
at java.lang.Thread.run(Thread.java:745)
Caused by: org.elasticsearch.indices.recovery.RecoverFilesRecoveryException: [end_device_events_20160319][1] Failed to transfer [16] files with total size of [1.1gb]
at org.elasticsearch.indices.recovery.RecoverySourceHandler.phase1(RecoverySourceHandler.java:430)
at org.elasticsearch.index.engine.InternalEngine.recover(InternalEngine.java:878)
... 10 more
Caused by: org.elasticsearch.transport.ReceiveTimeoutTransportException: [node3][inet[/x.x.x.x:9301]][internal:index/shard/recovery/filesInfo] request_id [31249632] timed out after [900016ms]
at org.elasticsearch.transport.TransportService$TimeoutHandler.run(TransportService.java:529)
... 3 more
Looks like network timeouts.
Are all your nodes in the same datacenter?
Yes, they are all in one datacenter.
I ran a continuous ping between all nodes but didn't see any packet drops.
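A clean ping only shows that the hosts answer ICMP; the recovery requests in the log above travel over TCP to the transport port (9301). As a rough additional check, here is a minimal sketch that times TCP connections to that port from each node. It assumes Python 3 is available on the nodes; the function names `tcp_connect_ms` and `probe` and the host names in the usage comment are hypothetical, not anything from Elasticsearch.

```python
import socket
import time


def tcp_connect_ms(host, port, timeout=5.0):
    """Return the time in milliseconds to open a TCP connection, or None on failure."""
    start = time.monotonic()
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return (time.monotonic() - start) * 1000.0
    except OSError:
        # Covers refused connections, timeouts and name-resolution failures.
        return None


def probe(hosts, port, attempts=3, timeout=5.0):
    """Probe each host several times and collect the connect times per host."""
    return {
        host: [tcp_connect_ms(host, port, timeout) for _ in range(attempts)]
        for host in hosts
    }


# Example usage (hypothetical host names; 9301 is the transport port seen in the log):
# print(probe(["node1", "node2", "node3"], 9301))
```

A `None` entry or a very slow connect would point at something between the nodes (firewall, routing, overloaded host). A fast connect does not rule out trouble with long-running or large transfers, so treat this only as a first pass.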