Cannot relocate shard because of unstable network

Rzulf · October 31, 2013, 1:13pm

I have an ES cluster in a data center in France, and I want to attach a node
from a data center in Poland to it. Network statistics between the two
servers are as follows: ping ~35 ms, transfer ~80 Mbit/s.

The problem is that while relocating a shard (120 GB), at a random point I
get closed channel exceptions and timeouts, even though the connection
between the hosts seems to be stable (ping works, and so do other apps).
Suddenly the connection to the node from which I copy the shard is lost, and
the connection to the cluster clients is lost as well. What is strange is
that the other nodes I lost connection with are not aware of it and continue
working as if nothing happened. I have seen that this problem was fixed in
issue https://github.com/elasticsearch/elasticsearch/issues/2733, but would
that help in my situation? If a network error occurs, is the shard
relocation resumed or restarted from scratch? Relocating the shard takes
about 3 hours, so if my network connection cannot sustain such a long
transfer, is there any point in attaching the node to the cluster?

I use ES version 0.20.4.

Regards
Michał

--
You received this message because you are subscribed to the Google Groups "elasticsearch" group.
To unsubscribe from this group and stop receiving emails from it, send an email to elasticsearch+unsubscribe@googlegroups.com.
For more options, visit https://groups.google.com/groups/opt_out.

Rzulf · November 4, 2013, 6:16pm

Anyone? What happens when the connection is temporarily lost during relocation?

Michał


spinscale · November 14, 2013, 9:01am

Hey,

It is not recommended to run a cluster across data centers at the moment.
Also, the Elasticsearch version you are using is really old; you should
upgrade. There are a couple of workarounds for the cross data center problem
that avoid restarting something like a relocation over and over again
because of a persistently unstable network connection:

- Application-level replication. You simply index your data into both
  clusters (Paris and Poland). Using something like an MQ mechanism may make
  sense, so that indexing does not have to wait on the distant data center.
- If there is no need to be realtime, the new Snapshot/Restore API, which
  will come with Elasticsearch 1.0, might be good to keep in mind:
  Snapshot/Restore API - Phase I · Issue #3826 · elastic/elasticsearch · GitHub
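
The application-level replication idea can be sketched roughly as follows. This is only an illustration under assumptions, not something taken from Elasticsearch itself: `DualWriter`, `index_local`, `index_remote` and `retry_delay` are hypothetical names, the two callables stand in for whatever client calls index a document into each cluster, and an in-process queue stands in for a real MQ.

```python
import queue
import threading
import time


class DualWriter:
    """Index into the nearby cluster synchronously and into the distant one
    through a queue, so callers never wait on the slow link.

    index_local / index_remote are hypothetical callables (doc_id, doc) that
    wrap whatever client is used for each cluster.
    """

    def __init__(self, index_local, index_remote, retry_delay=1.0):
        self._index_local = index_local
        self._index_remote = index_remote
        self._retry_delay = retry_delay
        self._pending = queue.Queue()
        worker = threading.Thread(target=self._drain, daemon=True)
        worker.start()

    def index(self, doc_id, doc):
        # Write locally first, then hand the document to the background worker.
        self._index_local(doc_id, doc)
        self._pending.put((doc_id, doc))

    def _drain(self):
        while True:
            doc_id, doc = self._pending.get()
            try:
                self._index_remote(doc_id, doc)
            except Exception:
                # Naive retry: wait a little and put the document back.
                time.sleep(self._retry_delay)
                self._pending.put((doc_id, doc))
            finally:
                self._pending.task_done()

    def flush(self):
        # Block until every queued document has reached the remote side.
        self._pending.join()
```

With a durable message queue in place of `queue.Queue`, documents waiting for the distant cluster would also survive a restart of the indexing application, which this in-memory sketch does not handle.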

--Alex


Rzulf · November 14, 2013, 10:35am

Thanks for the reply. I managed to set up the cluster by setting
network.tcp.keep_alive to true and changing the system keepalive time from
2 hours to 30 minutes (/proc/sys/net/ipv4/tcp_keepalive_time). The cluster
has now been working fine for a few days, and there is no noticeable loss in
performance.
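
For readers unfamiliar with TCP keepalive, here is a minimal sketch of what the two settings amount to at the socket level. It is an illustration only, not Elasticsearch code: `enable_keepalive` and its parameters are hypothetical names, and the per-socket `TCP_KEEPIDLE` option is used here as a stand-in for the system-wide `tcp_keepalive_time` value (1800 seconds is the 30 minutes mentioned above). A plausible reason this helps, though it is an assumption, is that something on the path between the data centers drops connections that stay idle for too long.

```python
import socket


def enable_keepalive(sock, idle_seconds=1800, interval_seconds=60, max_probes=5):
    """Turn on TCP keepalive for one socket and, where supported, tune its timers."""
    # Ask the kernel to send keepalive probes on this connection at all.
    sock.setsockopt(socket.SOL_SOCKET, socket.SO_KEEPALIVE, 1)
    # The TCP_KEEP* constants are platform-specific (available on Linux),
    # so each one is applied only if this platform defines it.
    if hasattr(socket, "TCP_KEEPIDLE"):
        # Idle time before the first probe; per-socket override of tcp_keepalive_time.
        sock.setsockopt(socket.IPPROTO_TCP, socket.TCP_KEEPIDLE, idle_seconds)
    if hasattr(socket, "TCP_KEEPINTVL"):
        # Delay between unanswered probes.
        sock.setsockopt(socket.IPPROTO_TCP, socket.TCP_KEEPINTVL, interval_seconds)
    if hasattr(socket, "TCP_KEEPCNT"):
        # Number of unanswered probes before the connection is declared dead.
        sock.setsockopt(socket.IPPROTO_TCP, socket.TCP_KEEPCNT, max_probes)
    return sock


if __name__ == "__main__":
    s = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    enable_keepalive(s)
    print("keepalive enabled:", s.getsockopt(socket.SOL_SOCKET, socket.SO_KEEPALIVE) != 0)
    s.close()
```

Changing /proc/sys/net/ipv4/tcp_keepalive_time instead affects every socket on the host that has keepalive enabled and does not set its own idle time.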

Michał

