I have an ES cluster in a data center in France, and I want to attach a node to
the cluster from a data center in Poland. Network statistics between the two
servers are as follows: ping ~35 ms, transfer ~80 Mbit/s.
The problem is that while relocating a shard (120 GB), at a random point I get
closed channel exceptions and time-outs, even though the connection between the
hosts seems to be stable (ping works, other apps as well). Suddenly the
connection with the node from which I copy the shard is lost, and the connection
with cluster clients is lost. What is strange is that the other nodes with which
I lost the connection are not aware of that and continue working as if nothing
had happened. I have seen that in issue https://github.com/elasticsearch/elasticsearch/issues/2733 this problem was
fixed, but would it help in my situation? If a network error occurs, would the
shard relocation be resumed or restarted from scratch? Relocating the shard
takes about 3 hours, so if my network connection cannot sustain such a long
transfer, is there any sense in attaching the node to the cluster?
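As a quick sanity check on the "about 3 hours" figure, here is a minimal sketch of the arithmetic. It assumes decimal units (1 GB = 1000 MB) and that the ~80 Mbit/s rate is sustained for the whole transfer; the function name is made up for illustration.

```python
def transfer_seconds(size_gb: float, rate_mbit_s: float) -> float:
    """Ideal time to move size_gb (decimal gigabytes) at a sustained rate_mbit_s."""
    size_megabits = size_gb * 1000 * 8  # GB -> MB -> Mbit
    return size_megabits / rate_mbit_s


seconds = transfer_seconds(120, 80)  # shard size and link rate from the post
print(f"{seconds:.0f} s = {seconds / 3600:.1f} hours")
```

So a single 120 GB shard keeps one connection busy for over three hours even in the ideal case, which is why one dropped channel is so costly here.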
Anyone? What happens when the connection is temporarily lost during relocation?
Michał
(Following up on my original post of Thursday, 31 October 2013, 14:13:35 UTC+1.)
It is not recommended to run a cluster across data centers
at the moment. Also, the Elasticsearch version you are using is really old;
you should upgrade. There are a couple of workarounds for the cross-data-center
problem that avoid restarting something like a relocation over and
over again because of an unstable network connection:
Application-level replication. You simply index your data into both
clusters (Paris and Poland). Using something like an MQ mechanism may
make sense, so that indexing does not have to wait on the distant data
center.
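The application-level replication idea can be sketched roughly as follows. This is only an illustration under stated assumptions: `DualWriter`, `index_local` and `index_remote` are hypothetical names, the two callables stand in for whatever client calls index a document into each cluster, and the in-memory queue stands in for a real MQ (which would also persist pending writes).

```python
import queue
import threading
import time
from typing import Callable, Dict

Doc = Dict[str, object]


class DualWriter:
    """Index into the nearby cluster synchronously; queue writes for the distant one."""

    def __init__(self, index_local: Callable[[Doc], None],
                 index_remote: Callable[[Doc], None], retry_delay: float = 1.0):
        self._index_local = index_local
        self._index_remote = index_remote
        self._retry_delay = retry_delay
        self._pending: "queue.Queue[Doc]" = queue.Queue()
        threading.Thread(target=self._drain, daemon=True).start()

    def index(self, doc: Doc) -> None:
        self._index_local(doc)   # the caller waits only for the nearby cluster
        self._pending.put(doc)   # the distant write happens in the background

    def _drain(self) -> None:
        while True:
            doc = self._pending.get()
            try:
                self._index_remote(doc)
            except Exception:
                # Hypothetical policy: wait, then requeue the document for retry.
                time.sleep(self._retry_delay)
                self._pending.put(doc)
            finally:
                self._pending.task_done()

    def flush(self) -> None:
        """Block until every queued document has reached the distant cluster."""
        self._pending.join()


# Usage with plain lists standing in for the two clusters.
france, poland = [], []
writer = DualWriter(france.append, poland.append)
writer.index({"id": 1, "title": "example"})
writer.flush()
print(len(france), len(poland))
```

The point of the queue is that a slow or briefly unreachable distant data center delays only the background worker, not the indexing request itself.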
Thanks for the reply. I managed to set up the cluster by setting
network.tcp.keep_alive to true and changing the system keepalive time from
2 hours to 30 minutes (/proc/sys/net/ipv4/tcp_keepalive_time). The cluster has
now been working fine for a few days, and there is no noticeable loss in
performance.
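To illustrate what these two settings control at the socket level, here is a small sketch. It is not how Elasticsearch itself applies network.tcp.keep_alive; it only shows the underlying OS mechanism, assuming Linux (the per-socket `TCP_KEEPIDLE` option is skipped where it is unavailable). The helper names are made up for illustration; 1800 seconds corresponds to the 30 minutes mentioned above.

```python
import socket
from pathlib import Path

KEEPALIVE_SYSCTL = Path("/proc/sys/net/ipv4/tcp_keepalive_time")


def system_keepalive_seconds():
    """Return the kernel-wide idle time before the first keepalive probe, or None."""
    try:
        return int(KEEPALIVE_SYSCTL.read_text().strip())
    except (OSError, ValueError):
        return None  # not Linux, or the file is not readable


def enable_keepalive(sock: socket.socket, idle_seconds: int = 1800) -> None:
    """Turn on TCP keepalive for one socket; on Linux also set its idle time."""
    sock.setsockopt(socket.SOL_SOCKET, socket.SO_KEEPALIVE, 1)
    if hasattr(socket, "TCP_KEEPIDLE"):  # Linux-specific per-socket override
        sock.setsockopt(socket.IPPROTO_TCP, socket.TCP_KEEPIDLE, idle_seconds)


sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
enable_keepalive(sock)
print("system keepalive time (s):", system_keepalive_seconds())
print("keepalive enabled:", sock.getsockopt(socket.SOL_SOCKET, socket.SO_KEEPALIVE) != 0)
sock.close()
```

With keepalive on, the kernel sends probes on otherwise idle connections, so a dead peer is noticed and, presumably, idle connections are less likely to be silently dropped along the path.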
I use ES version 0.20.4.
Regards
Michał