I have started getting some timeouts during replication and I am unsure of how to proceed. The index is about 500 million documents, or 45GB, spread over 8 shards, and was created by a JDBC river. The timeout is occurring during index/shard/recovery/prepareTranslog. The 15-minute limit appears to be hard-coded, or I would have tried changing it.
There are several parameters related to index recovery, but I'm not sure how they affect performance. Does anyone have any suggestions?
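For reference, the recovery settings that can be changed at runtime are updated through the cluster settings API. Below is a minimal sketch in Python (assuming a node on localhost:9200 and the third-party requests library); the values are illustrative only, and none of these changes the 15-minute timeout itself:

# Minimal sketch: tune the dynamic shard-recovery settings via the cluster
# settings API. Assumes a node on localhost:9200; values are examples,
# not recommendations, and do not affect the 15-minute internal timeout.
import requests

settings = {
    "transient": {
        # Throttle for copying segment files during recovery (phase 1).
        "indices.recovery.max_bytes_per_sec": "100mb",
        # Number of parallel file streams per recovering shard.
        "indices.recovery.concurrent_streams": 5,
    }
}

resp = requests.put("http://localhost:9200/_cluster/settings", json=settings)
print(resp.json())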
Is it possible to post the full timeout exception?
Do you run the JDBC river with replica level 0 and add replicas later, after the river has completed?
I have seen this in the past and I'm not sure whether it is related to tight resources.
In the next JDBC river version there will be more convenient control of the bulk index settings (automatic replica level 0, refresh disabling, and re-enabling of refresh and replicas afterwards).
Jörg
[2014-04-28 13:40:15,039][WARN ][cluster.action.shard] [eis05] [ds_clearcase-vob-heat-analyzer][2] sending failed shard for [ds_clearcase-vob-heat-analyzer][2], node[QyeTlW2YQbG27zrsdjBBGA], [R], s[INITIALIZING], indexUUID [ms7jQeuMQduNIHCmjxsKjQ], reason [Failed to start shard, message [RecoveryFailedException[[ds_clearcase-vob-heat-analyzer][2]: Recovery failed from [eis09][p8-_fzHeTR22pSlsBsYm8A][eis09.rnditlab.ericsson.se][inet[/137.58.184.239:9300]]{datacenter=PoCC} into [eis05][QyeTlW2YQbG27zrsdjBBGA][eis05.rnditlab.ericsson.se][inet[eis05.rnditlab.ericsson.se/137.58.184.235:9300]]{datacenter=PoCC}]; nested: RemoteTransportException[[eis09][inet[/137.58.184.239:9300]][index/shard/recovery/startRecovery]]; nested: RecoveryEngineException[[ds_clearcase-vob-heat-analyzer][2] Phase[2] Execution failed]; nested: ReceiveTimeoutTransportException[[eis05][inet[/137.58.184.235:9300]][index/shard/recovery/prepareTranslog] request_id [6809886] timed out after [900000ms]]; ]]
[2014-04-28 14:00:11,614][WARN ][indices.cluster] [eis05] [ds_clearcase-vob-heat-analyzer][0] failed to start shard
org.elasticsearch.indices.recovery.RecoveryFailedException: [ds_clearcase-vob-heat-analyzer][0]: Recovery failed from [eis07][Q8ZWgDIXRGiUej1oMoH8Jg][eis07.rnditlab.ericsson.se][inet[/137.58.184.237:9300]]{datacenter=PoCC} into [eis05][QyeTlW2YQbG27zrsdjBBGA][eis05.rnditlab.ericsson.se][inet[eis05.rnditlab.ericsson.se/137.58.184.235:9300]]{datacenter=PoCC}
    at org.elasticsearch.indices.recovery.RecoveryTarget.doRecovery(RecoveryTarget.java:307)
    at org.elasticsearch.indices.recovery.RecoveryTarget.access$300(RecoveryTarget.java:65)
    at org.elasticsearch.indices.recovery.RecoveryTarget$3.run(RecoveryTarget.java:184)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
    at java.lang.Thread.run(Thread.java:744)
Caused by: org.elasticsearch.transport.RemoteTransportException: [eis07][inet[/137.58.184.237:9300]][index/shard/recovery/startRecovery]
Caused by: org.elasticsearch.index.engine.RecoveryEngineException: [ds_clearcase-vob-heat-analyzer][0] Phase[2] Execution failed
    at org.elasticsearch.index.engine.internal.InternalEngine.recover(InternalEngine.java:1098)
    at org.elasticsearch.index.shard.service.InternalIndexShard.recover(InternalIndexShard.java:627)
    at org.elasticsearch.indices.recovery.RecoverySource.recover(RecoverySource.java:117)
    at org.elasticsearch.indices.recovery.RecoverySource.access$1600(RecoverySource.java:61)
    at org.elasticsearch.indices.recovery.RecoverySource$StartRecoveryTransportRequestHandler.messageReceived(RecoverySource.java:337)
    at org.elasticsearch.indices.recovery.RecoverySource$StartRecoveryTransportRequestHandler.messageReceived(RecoverySource.java:323)
    at org.elasticsearch.transport.netty.MessageChannelHandler$RequestHandler.run(MessageChannelHandler.java:270)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
    at java.lang.Thread.run(Thread.java:744)
Caused by: org.elasticsearch.transport.ReceiveTimeoutTransportException: [eis05][inet[/137.58.184.235:9300]][index/shard/recovery/prepareTranslog] request_id [154592652] timed out after [900000ms]
    at org.elasticsearch.transport.TransportService$TimeoutHandler.run(TransportService.java:356)
    ... 3 more
The river has been running for some time, copying new documents from the database into Elasticsearch. The problem, as I see it, is that it is simply too big to copy within the 15-minute limit.
/Michael
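Not a way around the hard-coded limit, but here is a rough sketch of two things that might help narrow this down (assuming access to the cluster on localhost:9200 and the requests library): flushing the index so there are fewer outstanding translog operations before recovery starts, and watching recovery progress with the cat API. Whether a flush actually reduces what phase 2 has to replay depends on when recovery snapshots the translog, so treat this as something to try rather than a guaranteed fix:

# Sketch: flush so pending translog operations are committed to segments,
# then watch recovery progress. Assumes localhost:9200; the index name is
# taken from the log output above.
import requests

BASE = "http://localhost:9200"
INDEX = "ds_clearcase-vob-heat-analyzer"

# Commit in-memory/translog operations to Lucene segments.
print(requests.post(BASE + "/" + INDEX + "/_flush").json())

# Human-readable recovery status, one line per shard copy.
print(requests.get(BASE + "/_cat/recovery", params={"v": "true"}).text)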