I have started getting some timeouts during replication and I am unsure of how to proceed. The index is about 500 million documents, or 45GB, spread over 8 shards, and was created by a JDBC river. The timeout is occurring during index/shard/recovery/prepareTranslog. The 15-minute limit appears to be hard-coded, or I would have tried changing it.
There are several parameters related to index recovery, but I'm not sure how they affect performance. Does anyone have any suggestions?
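For reference, the recovery settings that can be changed at runtime are updated through the cluster settings API. Below is a minimal sketch in Python (assuming a node on localhost:9200 and the third-party requests library); the values are illustrative only, and none of these changes the 15-minute timeout itself:

# Minimal sketch: tune the dynamic shard-recovery settings via the cluster
# settings API. Assumes a node on localhost:9200; values are examples,
# not recommendations, and do not affect the 15-minute internal timeout.
import requests

settings = {
    "transient": {
        # Throttle for copying segment files during recovery (phase 1).
        "indices.recovery.max_bytes_per_sec": "100mb",
        # Number of parallel file streams per recovering shard.
        "indices.recovery.concurrent_streams": 5,
    }
}

resp = requests.put("http://localhost:9200/_cluster/settings", json=settings)
print(resp.json())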
Is it possible to post the full timeout exception?
Do you run the JDBC river with replica level 0 and add replicas later, after the river has completed?
I have seen this in the past and I'm not sure whether it is related to tight resources.
In the next JDBC river version there will be more convenient control of the bulk index settings (automatic replica level 0, refresh disabling, and re-enabling of refresh and replicas afterwards).
Jörg
[2014-04-28 13:40:15,039][WARN ][cluster.action.shard] [eis05] [ds_clearcase-vob-heat-analyzer][2] sending failed shard for [ds_clearcase-vob-heat-analyzer][2], node[QyeTlW2YQbG27zrsdjBBGA], [R], s[INITIALIZING], indexUUID [ms7jQeuMQduNIHCmjxsKjQ], reason [Failed to start shard, message [RecoveryFailedException[[ds_clearcase-vob-heat-analyzer][2]: Recovery failed from [eis09][p8-_fzHeTR22pSlsBsYm8A][eis09.rnditlab.ericsson.se][inet[/137.58.184.239:9300]]{datacenter=PoCC} into [eis05][QyeTlW2YQbG27zrsdjBBGA][eis05.rnditlab.ericsson.se][inet[eis05.rnditlab.ericsson.se/137.58.184.235:9300]]{datacenter=PoCC}]; nested: RemoteTransportException[[eis09][inet[/137.58.184.239:9300]][index/shard/recovery/startRecovery]]; nested: RecoveryEngineException[[ds_clearcase-vob-heat-analyzer][2] Phase[2] Execution failed]; nested: ReceiveTimeoutTransportException[[eis05][inet[/137.58.184.235:9300]][index/shard/recovery/prepareTranslog] request_id [6809886] timed out after [900000ms]]; ]]
[2014-04-28 14:00:11,614][WARN ][indices.cluster] [eis05] [ds_clearcase-vob-heat-analyzer][0] failed to start shard
org.elasticsearch.indices.recovery.RecoveryFailedException: [ds_clearcase-vob-heat-analyzer][0]: Recovery failed from [eis07][Q8ZWgDIXRGiUej1oMoH8Jg][eis07.rnditlab.ericsson.se][inet[/137.58.184.237:9300]]{datacenter=PoCC} into [eis05][QyeTlW2YQbG27zrsdjBBGA][eis05.rnditlab.ericsson.se][inet[eis05.rnditlab.ericsson.se/137.58.184.235:9300]]{datacenter=PoCC}
    at org.elasticsearch.indices.recovery.RecoveryTarget.doRecovery(RecoveryTarget.java:307)
    at org.elasticsearch.indices.recovery.RecoveryTarget.access$300(RecoveryTarget.java:65)
    at org.elasticsearch.indices.recovery.RecoveryTarget$3.run(RecoveryTarget.java:184)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
    at java.lang.Thread.run(Thread.java:744)
Caused by: org.elasticsearch.transport.RemoteTransportException: [eis07][inet[/137.58.184.237:9300]][index/shard/recovery/startRecovery]
Caused by: org.elasticsearch.index.engine.RecoveryEngineException: [ds_clearcase-vob-heat-analyzer][0] Phase[2] Execution failed
    at org.elasticsearch.index.engine.internal.InternalEngine.recover(InternalEngine.java:1098)
    at org.elasticsearch.index.shard.service.InternalIndexShard.recover(InternalIndexShard.java:627)
    at org.elasticsearch.indices.recovery.RecoverySource.recover(RecoverySource.java:117)
    at org.elasticsearch.indices.recovery.RecoverySource.access$1600(RecoverySource.java:61)
    at org.elasticsearch.indices.recovery.RecoverySource$StartRecoveryTransportRequestHandler.messageReceived(RecoverySource.java:337)
    at org.elasticsearch.indices.recovery.RecoverySource$StartRecoveryTransportRequestHandler.messageReceived(RecoverySource.java:323)
    at org.elasticsearch.transport.netty.MessageChannelHandler$RequestHandler.run(MessageChannelHandler.java:270)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
    at java.lang.Thread.run(Thread.java:744)
Caused by: org.elasticsearch.transport.ReceiveTimeoutTransportException: [eis05][inet[/137.58.184.235:9300]][index/shard/recovery/prepareTranslog] request_id [154592652] timed out after [900000ms]
    at org.elasticsearch.transport.TransportService$TimeoutHandler.run(TransportService.java:356)
    ... 3 more
The river has been running for some time, copying new documents from the database into Elasticsearch. The problem, as I see it, is that it is simply too big to copy within the 15-minute limit.
/Michael
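Not a way around the hard-coded limit, but here is a rough sketch of two things that might help narrow this down (assuming access to the cluster on localhost:9200 and the requests library): flushing the index so there are fewer outstanding translog operations before recovery starts, and watching recovery progress with the cat API. Whether a flush actually reduces what phase 2 has to replay depends on when recovery snapshots the translog, so treat this as something to try rather than a guaranteed fix:

# Sketch: flush so pending translog operations are committed to segments,
# then watch recovery progress. Assumes localhost:9200; the index name is
# taken from the log output above.
import requests

BASE = "http://localhost:9200"
INDEX = "ds_clearcase-vob-heat-analyzer"

# Commit in-memory/translog operations to Lucene segments.
print(requests.post(BASE + "/" + INDEX + "/_flush").json())

# Human-readable recovery status, one line per shard copy.
print(requests.get(BASE + "/_cat/recovery", params={"v": "true"}).text)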