Replication timeouts


(Michael Salmon) #1

I have started getting timeouts during replication and am unsure how to
proceed. The index holds about 500 million documents (45GB) spread over
8 shards and was created by a JDBC river. The timeout occurs during
index/shard/recovery/prepareTranslog. The 15-minute limit seems to be
hard-coded; otherwise I would have tried changing it.

There are several parameters related to index recovery, but I'm not sure how
they affect performance. Does anyone have any suggestions?
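
For readers unfamiliar with the recovery parameters mentioned above, here is a minimal sketch of how two of them could be changed at runtime. It assumes the Elasticsearch 1.x setting names `indices.recovery.max_bytes_per_sec` and `indices.recovery.concurrent_streams`; the helper name and the values are placeholders for illustration, not recommendations. These settings throttle the file-copy part of recovery, so it is not certain that they influence the prepareTranslog step at all.

```python
import json


def recovery_settings_body(max_bytes_per_sec="40mb", concurrent_streams=3):
    """Build a body for PUT /_cluster/settings (hypothetical helper).

    Setting names are assumed from Elasticsearch 1.x; the default
    values here are placeholders, not tuned recommendations.
    """
    return {
        "transient": {
            # Upper bound on recovery traffic per node.
            "indices.recovery.max_bytes_per_sec": max_bytes_per_sec,
            # Number of parallel streams used when copying shard files.
            "indices.recovery.concurrent_streams": concurrent_streams,
        }
    }


if __name__ == "__main__":
    # Print the JSON that would be sent to the cluster settings endpoint.
    print(json.dumps(recovery_settings_body(), indent=2))
```

Using `transient` means the change is lost on a full cluster restart, which is convenient while experimenting.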

/Michael

--
You received this message because you are subscribed to the Google Groups "elasticsearch" group.
To unsubscribe from this group and stop receiving emails from it, send an email to elasticsearch+unsubscribe@googlegroups.com.
To view this discussion on the web visit https://groups.google.com/d/msgid/elasticsearch/6c15c79b-b9c5-4f7f-98f1-68b692f86fc6%40googlegroups.com.
For more options, visit https://groups.google.com/d/optout.


(Jörg Prante) #2

Is it possible to post the full timeout exception?

Do you run the JDBC river with replica level 0 and add replicas later,
after the river has completed?

I have seen this in the past, and I'm not sure whether it is related to
tight resources.

The next JDBC river version will offer more convenient control of bulk
index settings (automatic replica level 0, disabling refresh, and
re-enabling refresh and replicas afterwards).
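
Until the river does this automatically, the same steps can be done by hand through the index settings API. The sketch below is only an illustration: the function names, the base URL, and the restore values (1 replica, `1s` refresh) are assumptions, and the index name is taken from the log in this thread.

```python
import json
import urllib.request


def bulk_load_settings():
    """Settings to apply before the bulk load: no replicas, no refresh."""
    return {"index": {"number_of_replicas": 0, "refresh_interval": "-1"}}


def post_load_settings(replicas=1, refresh_interval="1s"):
    """Settings to restore afterwards (values are assumed, adjust as needed)."""
    return {
        "index": {
            "number_of_replicas": replicas,
            "refresh_interval": refresh_interval,
        }
    }


def put_index_settings(base_url, index, body):
    """Send the settings with PUT <base_url>/<index>/_settings."""
    request = urllib.request.Request(
        url=f"{base_url}/{index}/_settings",
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="PUT",
    )
    with urllib.request.urlopen(request) as response:
        return json.loads(response.read().decode("utf-8"))


if __name__ == "__main__":
    # Only print the payloads; calling put_index_settings needs a live node,
    # e.g. put_index_settings("http://localhost:9200",
    #                         "ds_clearcase-vob-heat-analyzer",
    #                         bulk_load_settings())
    print(json.dumps(bulk_load_settings()))
    print(json.dumps(post_load_settings()))
```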

Jörg



(Michael Salmon) #3

[2014-04-28 13:40:15,039][WARN ][cluster.action.shard] [eis05] [ds_clearcase-vob-heat-analyzer][2] sending failed shard for [ds_clearcase-vob-heat-analyzer][2], node[QyeTlW2YQbG27zrsdjBBGA], [R], s[INITIALIZING], indexUUID [ms7jQeuMQduNIHCmjxsKjQ], reason [Failed to start shard, message [RecoveryFailedException[[ds_clearcase-vob-heat-analyzer][2]: Recovery failed from [eis09][p8-_fzHeTR22pSlsBsYm8A][eis09.rnditlab.ericsson.se][inet[/137.58.184.239:9300]]{datacenter=PoCC} into [eis05][QyeTlW2YQbG27zrsdjBBGA][eis05.rnditlab.ericsson.se][inet[eis05.rnditlab.ericsson.se/137.58.184.235:9300]]{datacenter=PoCC}]; nested: RemoteTransportException[[eis09][inet[/137.58.184.239:9300]][index/shard/recovery/startRecovery]]; nested: RecoveryEngineException[[ds_clearcase-vob-heat-analyzer][2] Phase[2] Execution failed]; nested: ReceiveTimeoutTransportException[[eis05][inet[/137.58.184.235:9300]][index/shard/recovery/prepareTranslog] request_id [6809886] timed out after [900000ms]]; ]]
[2014-04-28 14:00:11,614][WARN ][indices.cluster] [eis05] [ds_clearcase-vob-heat-analyzer][0] failed to start shard
org.elasticsearch.indices.recovery.RecoveryFailedException: [ds_clearcase-vob-heat-analyzer][0]: Recovery failed from [eis07][Q8ZWgDIXRGiUej1oMoH8Jg][eis07.rnditlab.ericsson.se][inet[/137.58.184.237:9300]]{datacenter=PoCC} into [eis05][QyeTlW2YQbG27zrsdjBBGA][eis05.rnditlab.ericsson.se][inet[eis05.rnditlab.ericsson.se/137.58.184.235:9300]]{datacenter=PoCC}
    at org.elasticsearch.indices.recovery.RecoveryTarget.doRecovery(RecoveryTarget.java:307)
    at org.elasticsearch.indices.recovery.RecoveryTarget.access$300(RecoveryTarget.java:65)
    at org.elasticsearch.indices.recovery.RecoveryTarget$3.run(RecoveryTarget.java:184)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
    at java.lang.Thread.run(Thread.java:744)
Caused by: org.elasticsearch.transport.RemoteTransportException: [eis07][inet[/137.58.184.237:9300]][index/shard/recovery/startRecovery]
Caused by: org.elasticsearch.index.engine.RecoveryEngineException: [ds_clearcase-vob-heat-analyzer][0] Phase[2] Execution failed
    at org.elasticsearch.index.engine.internal.InternalEngine.recover(InternalEngine.java:1098)
    at org.elasticsearch.index.shard.service.InternalIndexShard.recover(InternalIndexShard.java:627)
    at org.elasticsearch.indices.recovery.RecoverySource.recover(RecoverySource.java:117)
    at org.elasticsearch.indices.recovery.RecoverySource.access$1600(RecoverySource.java:61)
    at org.elasticsearch.indices.recovery.RecoverySource$StartRecoveryTransportRequestHandler.messageReceived(RecoverySource.java:337)
    at org.elasticsearch.indices.recovery.RecoverySource$StartRecoveryTransportRequestHandler.messageReceived(RecoverySource.java:323)
    at org.elasticsearch.transport.netty.MessageChannelHandler$RequestHandler.run(MessageChannelHandler.java:270)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
    at java.lang.Thread.run(Thread.java:744)
Caused by: org.elasticsearch.transport.ReceiveTimeoutTransportException: [eis05][inet[/137.58.184.235:9300]][index/shard/recovery/prepareTranslog] request_id [154592652] timed out after [900000ms]
    at org.elasticsearch.transport.TransportService$TimeoutHandler.run(TransportService.java:356)
    ... 3 more

The river has been running for some time, copying new documents from the
database into Elasticsearch. The problem as I see it is that there is too
much to copy within 15 minutes (the 900000ms shown in the log).
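
To check which recovery step timed out and how long it waited, the relevant fields can be pulled out of the exception message. This is a small sketch with a regular expression written against the message format seen in this thread only; the function name is hypothetical and other Elasticsearch versions may word the message differently.

```python
import re

# Matches "[node][inet[address]][action] request_id [N] timed out after [Mms]".
# Whitespace (including a wrapped line break) is allowed before "request_id".
TIMEOUT_RE = re.compile(
    r"\[(?P<node>[^\[\]]+)\]"
    r"\[inet\[(?P<address>[^\[\]]+)\]\]"
    r"\[(?P<action>[^\[\]]+)\]\s+"
    r"request_id \[(?P<request_id>\d+)\] "
    r"timed out after \[(?P<millis>\d+)ms\]"
)


def parse_receive_timeout(message):
    """Return node, action, request id and timeout in minutes, or None."""
    match = TIMEOUT_RE.search(message)
    if match is None:
        return None
    return {
        "node": match.group("node"),
        "action": match.group("action"),
        "request_id": int(match.group("request_id")),
        "timeout_minutes": int(match.group("millis")) / 60000,
    }


if __name__ == "__main__":
    line = (
        "[eis05][inet[/137.58.184.235:9300]]"
        "[index/shard/recovery/prepareTranslog] "
        "request_id [154592652] timed out after [900000ms]"
    )
    print(parse_receive_timeout(line))
```

For the message above, the action is `index/shard/recovery/prepareTranslog` and 900000ms works out to 15 minutes, which matches the limit described in the first post.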

/Michael



(system) #4