How to maximize "stickiness" of nodes in cluster?


(Tony Su) #1

This is some of the test cluster's configuration (not necessarily relevant):
Physical configuration:
Nodes deployed virtually on a single machine
All nodes are master- and data-eligible; the master sometimes moves around
and data is distributed to all nodes.
Node1 - Apps - 4GB RAM - 20GB storage
Node2 - ES1 - 1GB RAM - 20GB storage
Node3 - ES2 - 1GB RAM - 20GB storage
Node4 - ES3 - 1GB RAM - 20GB storage
Node5 - ES4 - 1GB RAM - 20GB storage

Description - Nodes are dropped from a cluster too easily?

For most of the past 48 hours I've been studying something in my lab
cluster that seems to come up regularly on this forum, with a number of
the threads left unanswered.

On my test cluster, I found that once data size reached a particular
threshold, the ability of nodes to remain in the cluster became unstable.
After loading the data 3 times into the cluster, I've been able to
replicate the threshold consistently, so it's a real and replicable
phenomenon. The actual threshold I reached is likely specific to my setup,
but I can see that the same thing could be experienced regularly on other
hardware setups as well.

And, pushed a bit further, I experienced another often-reported issue: a
shard became "orphaned," i.e. both primary and replica wouldn't allocate,
causing the entire cluster to remain in "red" status with no explanation
(the only log entries stated "Failed to execute..."). In the end, this
specific issue was resolved only by deleting the entire index and
re-loading the data. I can see that if there were no archived source from
which the data could be re-input into the cluster, the data might have been
lost altogether, although the new Snapshot/Restore might also be a solution.

Attempted configs with no effect
1. network.node.http_keep-alive. NOTE that no commented-out entry for this
setting exists in the 1.0 elasticsearch.yml although it is described in the
1.0 documentation, so I added it manually. If it really is supposed to work,
then in my case the problem might not have been a networking issue.

2. Periodic ICMP Ping. I found this often seemed to help when joining a
node to the cluster, but it seems to have no effect on a node leaving a
cluster.

Theory:
I have already observed max disk utilization on the host, and I am
speculating that moments of max disk activity are causing the guests to
sometimes become unresponsive. There is probably a response timeout that is
being exceeded.

Looking for a Solution:
I believe that once a node has become part of a cluster, it should not be
so easy for the node to leave.
Individual actions might fail because a response has not been received, but
IMO the node should not be excluded from the cluster so easily, or even
automatically, so that intervention is possible without the cost of
re-joining the cluster (re-building node participation metadata, shard
thrashing).

Process I believe is preferable
When an API call is made, it seems that there is a certain amount of time
that is expected before the call is considered failed. Then the same call
should be attempted against any replicas on record. It seems to me that not
only might a call be re-routed to a replica but the node that failed to
reply might also be dropped immediately. IMO the node should not be dropped
immediately but marked for dropping, and the actual drop should be subject to
specified configuration (or ultimately require manual action).
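
To make the "mark for dropping, then drop later" idea concrete, here is a minimal sketch in Python. It is only an illustration of the proposal, not how Elasticsearch behaves, and every name in it (Membership, grace_period, manual_drop_only, the SUSPECT state) is hypothetical.

```python
from dataclasses import dataclass, field
from enum import Enum


class State(Enum):
    ACTIVE = "active"
    SUSPECT = "suspect"    # marked for dropping, but still a cluster member
    DROPPED = "dropped"


@dataclass
class Membership:
    grace_period: float             # seconds a node may stay SUSPECT (hypothetical setting)
    manual_drop_only: bool = False  # if True, only an explicit drop() removes a node
    state: dict = field(default_factory=dict)
    suspect_since: dict = field(default_factory=dict)

    def join(self, node):
        self.state[node] = State.ACTIVE

    def report_timeout(self, node, now):
        # A missed response only marks the node; it is not removed yet.
        if self.state.get(node) is State.ACTIVE:
            self.state[node] = State.SUSPECT
            self.suspect_since[node] = now

    def report_response(self, node):
        # A suspect node that answers again is restored without re-joining.
        if self.state.get(node) is State.SUSPECT:
            self.state[node] = State.ACTIVE
            self.suspect_since.pop(node, None)

    def tick(self, now):
        # Drop suspects whose grace period has expired, unless drops are manual.
        if self.manual_drop_only:
            return
        for node, since in list(self.suspect_since.items()):
            if now - since >= self.grace_period:
                self.drop(node)

    def drop(self, node):
        self.state[node] = State.DROPPED
        self.suspect_since.pop(node, None)


m = Membership(grace_period=60.0)
m.join("ES1")
m.report_timeout("ES1", now=0.0)
m.tick(now=30.0)
print(m.state["ES1"].name)  # still within the grace period
m.report_response("ES1")
print(m.state["ES1"].name)  # restored, no re-join needed
```

With manual_drop_only=True the node would stay in the suspect state until someone calls drop() explicitly, which corresponds to the "requiring manual action" variant.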

Reason why it should not be so easy to drop a node (and its data)
The node may have the only full copy of valid shard(s) data, so
dropping the node should not be so easy even if the node is completely
unresponsive. The node should remain a fully recognized member of the
cluster until a decision is made to definitely drop.

Note also that although not the fundamental cause, this "easy drop node" is
a major contributing cause of the theoretical scenario posed by Brad
Lhotsky in the thread "Data Loss"
https://groups.google.com/forum/#!topic/elasticsearch/kl60-C63cXY

Counter Opinion and Solutions?

IMO and Thx,
Tony

--
You received this message because you are subscribed to the Google Groups "elasticsearch" group.
To unsubscribe from this group and stop receiving emails from it, send an email to elasticsearch+unsubscribe@googlegroups.com.
To view this discussion on the web visit https://groups.google.com/d/msgid/elasticsearch/0bd557b6-86b3-476b-89a1-3f753d54d71e%40googlegroups.com.
For more options, visit https://groups.google.com/groups/opt_out.


(Jörg Prante) #2

Increase heap memory for the data nodes.

1GB for a data node is too small if you put heavy indexing on it over a
certain period of time. Your description matches what happens when the heap
gets tight and garbage collection (GC) pauses start to dominate.

You could also increase the ping timeout from 5s to something like 30s.
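
A rough way to see why a longer timeout helps: the Python sketch below assumes a simplified model in which a node is dropped once a fixed number of consecutive pings each take longer than the timeout. The function name, the retry count of 3, and the latency figures are made up for illustration and are not taken from Elasticsearch itself.

```python
def would_drop(latencies, ping_timeout, ping_retries=3):
    """Return True if `ping_retries` consecutive pings exceed `ping_timeout`.

    latencies: observed response times in seconds, one per ping, in order.
    """
    consecutive_failures = 0
    for latency in latencies:
        if latency > ping_timeout:
            consecutive_failures += 1
            if consecutive_failures >= ping_retries:
                return True
        else:
            # Any timely answer resets the failure count.
            consecutive_failures = 0
    return False


# Hypothetical latencies: the node stalls for three pings in a row
# (for example during a long GC pause or while the host disk is saturated).
stalled = [0.2, 0.3, 12.0, 9.0, 7.5, 0.4]

print(would_drop(stalled, ping_timeout=5))   # True: three pings in a row exceed 5s
print(would_drop(stalled, ping_timeout=30))  # False: no ping exceeds 30s
```

Under this model the same stall that gets a node dropped at 5s is tolerated at 30s, at the price of noticing a node that is really dead more slowly.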

There are many threads on this mailing list dealing with this kind of
issue.

If you set up VMs on a single host, you cannot test everything. I
recommend setting up ES nodes on multiple hosts, for example to test larger
heaps, network communication with discovery, and network disconnects.

Jörg


