Elastic cluster issue

vinod_rao · April 11, 2016, 8:47pm

Hi Team,

I have a 3-node cluster, and one node frequently drops out of the cluster and rejoins. I am attaching the log for reference. Can anyone tell me what the possible issue could be? Below is a screenshot of the log.

warkolm · April 11, 2016, 8:55pm

Please don't post pictures of text; they are difficult to read, and some people may not even be able to see them.

What version are you on?

vinod_rao · April 11, 2016, 8:59pm

Sure, I won't do it again.

I am using 1.7.1

warkolm · April 11, 2016, 9:02pm

Can you post the error as well?

vinod_rao · April 11, 2016, 9:03pm

[ WARN] 2016-04-11 08:18:54,014 org.elasticsearch.indices.cluster - [node3] [[end_device_events_20160319][1]] marking and sending shard failed due to [failed recovery]
org.elasticsearch.indices.recovery.RecoveryFailedException: [end_device_events_20160319][1]: Recovery failed from [node2][p0b_1ge8Tmu_3l78lRaA-w][node2][inet[/x.x.x.x:9301]]{local=false, master=true} into [node3][W9XjmFo6REWuJLlfg0W-EA][node3][inet[/x.x.x.x:9301]]{local=false, master=true}
at org.elasticsearch.indices.recovery.RecoveryTarget.doRecovery(RecoveryTarget.java:280)
at org.elasticsearch.indices.recovery.RecoveryTarget.access$700(RecoveryTarget.java:70)
at org.elasticsearch.indices.recovery.RecoveryTarget$RecoveryRunner.doRun(RecoveryTarget.java:561)
at org.elasticsearch.common.util.concurrent.AbstractRunnable.run(AbstractRunnable.java:36)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
at java.lang.Thread.run(Thread.java:745)
Caused by: org.elasticsearch.transport.RemoteTransportException: [node2][inet[/x.x.x.x:9301]][internal:index/shard/recovery/start_recovery]
Caused by: org.elasticsearch.index.engine.RecoveryEngineException: [end_device_events_20160319][1] Phase[1] Execution failed
at org.elasticsearch.index.engine.InternalEngine.recover(InternalEngine.java:883)
at org.elasticsearch.index.shard.IndexShard.recover(IndexShard.java:780)
at org.elasticsearch.indices.recovery.RecoverySource.recover(RecoverySource.java:125)
at org.elasticsearch.indices.recovery.RecoverySource.access$200(RecoverySource.java:49)
at org.elasticsearch.indices.recovery.RecoverySource$StartRecoveryTransportRequestHandler.messageReceived(RecoverySource.java:146)
at org.elasticsearch.indices.recovery.RecoverySource$StartRecoveryTransportRequestHandler.messageReceived(RecoverySource.java:132)
at org.elasticsearch.transport.netty.MessageChannelHandler$RequestHandler.doRun(MessageChannelHandler.java:279)
at org.elasticsearch.common.util.concurrent.AbstractRunnable.run(AbstractRunnable.java:36)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
at java.lang.Thread.run(Thread.java:745)
Caused by: org.elasticsearch.indices.recovery.RecoverFilesRecoveryException: [end_device_events_20160319][1] Failed to transfer [16] files with total size of [1.1gb]
at org.elasticsearch.indices.recovery.RecoverySourceHandler.phase1(RecoverySourceHandler.java:430)
at org.elasticsearch.index.engine.InternalEngine.recover(InternalEngine.java:878)
... 10 more
Caused by: org.elasticsearch.transport.ReceiveTimeoutTransportException: [node3][inet[/x.x.x.x:9301]][internal:index/shard/recovery/filesInfo] request_id [31249632] timed out after [900016ms]
at org.elasticsearch.transport.TransportService$TimeoutHandler.run(TransportService.java:529)
... 3 more
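
For anyone reading a trace like this, the innermost "Caused by" entry is usually the most informative one. The following is only a rough Python sketch, not part of the original post: it pulls the exception chain and the timeout value out of an abbreviated copy of the trace above. The helper names `caused_by_chain` and `timeout_ms` are made up for illustration, and the excerpt omits the stack frames.

```python
import re

# Abbreviated copy of the "Caused by" lines from the trace above
# (stack frames omitted, wrapped lines joined).
TRACE = """\
Caused by: org.elasticsearch.transport.RemoteTransportException: [node2][inet[/x.x.x.x:9301]][internal:index/shard/recovery/start_recovery]
Caused by: org.elasticsearch.index.engine.RecoveryEngineException: [end_device_events_20160319][1] Phase[1] Execution failed
Caused by: org.elasticsearch.indices.recovery.RecoverFilesRecoveryException: [end_device_events_20160319][1] Failed to transfer [16] files with total size of [1.1gb]
Caused by: org.elasticsearch.transport.ReceiveTimeoutTransportException: [node3][inet[/x.x.x.x:9301]][internal:index/shard/recovery/filesInfo] request_id [31249632] timed out after [900016ms]
"""


def caused_by_chain(trace):
    """Return exception class names from 'Caused by:' lines, outermost first."""
    return re.findall(r"^Caused by: ([\w.$]+):", trace, flags=re.MULTILINE)


def timeout_ms(trace):
    """Return the first 'timed out after [Nms]' value as an int, or None."""
    match = re.search(r"timed out after \[(\d+)ms\]", trace)
    return int(match.group(1)) if match else None


if __name__ == "__main__":
    chain = caused_by_chain(TRACE)
    print("root cause:", chain[-1])
    ms = timeout_ms(TRACE)
    print(f"timeout: {ms} ms (~{ms / 60000:.1f} minutes)")
```

Run against this excerpt, the last entry in the chain is the ReceiveTimeoutTransportException on node3, and the timeout works out to roughly 15 minutes, so the file-info request during phase 1 of the recovery went unanswered for that long.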

warkolm · April 11, 2016, 9:05pm

Looks like network timeouts.

Are all your nodes in the same datacenter?

vinod_rao · April 11, 2016, 9:07pm

Yes, they are all in the same datacenter.

vinod_rao · April 11, 2016, 9:09pm

I ran a continuous ping between all nodes, but I didn't see any packet drops.
