Wouldn't the value increase as you add more nodes?
Indeed, it most certainly will.
And with unicast discovery, each node will need to be told about the new
node. Which is the perfect time to tell it about the newly calculated
minimum number of masters.
"Wouldn't the value increase as you add more nodes?"
It will, which is precisely why the value is not computed automatically.
The value can increase or decrease over time, but the cluster cannot tell
whether that change is deliberate or the result of failures. The suggested
(n/2)+1 formula breaks down if n is constantly changing. With 10 nodes, the
suggested minimum number of master nodes is 6 (10/2 + 1). But if 5 of those
nodes crash, the recomputed value would be 3 (5/2 + 1), and 3 is clearly
not the value we want; we still want 6.
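To make that arithmetic concrete, here is a tiny sketch in Python (nothing
ES-specific is assumed) of the (n/2)+1 calculation and of why recomputing it
from the surviving node count gives the wrong answer:

    def minimum_master_nodes(n):
        """The quorum formula discussed above: floor(n / 2) + 1."""
        return n // 2 + 1

    print(minimum_master_nodes(10))  # 6 -- the right value for a 10-node cluster
    # If 5 of the 10 nodes crash and the value were recomputed from the
    # surviving count, it would drop to 3, even though 6 is still what we want:
    print(minimum_master_nodes(5))   # 3 -- too low; a 3-node half could elect a master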
But today elasticsearch doesn't do this automatically?
Short answer: No.
Long answer: Nooooooooooooooooooooooooooooooooooooooo.
For unicast discovery, you need to tell each node the full list of nodes in
the cluster. As far as I can tell, that list may include the local node so
it's easier to configure the same list everywhere. But it's not automatic.
For multicast discovery, it's automatic. But I've read enough about issues
with multicast to want to stay with unicast.
It's not a matter of unicast vs. multicast or failure vs. non-failure.
Indeed, the node count and minimum master count are independent of the
discovery mechanism.
But the list of hostnames in the cluster is a matter of unicast; ES builds
the list dynamically during multicast.
It's only the cluster admin who knows what the maximum number of nodes
is.
Exactly. So what I was trying to say is: if the cluster admin has to count
all the nodes, they might as well write down the hostnames too, and then
use Chef or Puppet to push that list of hostnames, as a unicast
configuration, to all of the nodes.
And then my cool ES wrapper script picks up the list of hostnames, sets up
the unicast discovery options using those names, automatically calculates
the minimum number of master nodes, and sets that config option too. ES
then starts with the best chance of preventing a split-brain situation.
I hope that is clearer.
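To illustrate, here is a hypothetical sketch of such a wrapper; the file
paths and host-list source are assumptions rather than the actual script,
and the setting names are the ES 1.x zen discovery options:

    # Hypothetical wrapper sketch: derive the unicast host list and
    # minimum_master_nodes from a single list of hostnames (e.g. one pushed
    # out by Chef or Puppet). Paths and file format are illustrative only.
    hosts_file = "/etc/elasticsearch/cluster-hosts.txt"  # one hostname per line (assumption)

    with open(hosts_file) as f:
        hosts = [line.strip() for line in f if line.strip()]

    min_masters = len(hosts) // 2 + 1  # the (n/2)+1 quorum discussed above

    # Append the derived settings to elasticsearch.yml (ES 1.x setting names).
    with open("/etc/elasticsearch/elasticsearch.yml", "a") as conf:
        conf.write("discovery.zen.ping.multicast.enabled: false\n")
        conf.write("discovery.zen.ping.unicast.hosts: [%s]\n"
                   % ", ".join('"%s"' % h for h in hosts))
        conf.write("discovery.zen.minimum_master_nodes: %d\n" % min_masters)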
As a follow-on, one of the uses of ES here is a QA verification tool that
emulates a production server in a very lightweight (though fully
functional) way. To that end, startup includes a preload step that ensures
a certain index is populated, but does not reload it if it already exists.
The tool uses the unicast hostname list to decide: if there is only one
host, the preload step is done automatically when the cluster first starts;
if there are two or more nodes in the cluster, the preload step is not done
automatically (see the sketch below).
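A minimal sketch of that decision, assuming the tool can see the unicast
host list and check whether the index already exists (the function and host
names here are hypothetical):

    # Hypothetical sketch of the preload decision described above: preload the
    # index automatically only on a single-host cluster, and only if missing.
    def should_preload(unicast_hosts, index_exists):
        return len(unicast_hosts) == 1 and not index_exists

    print(should_preload(["qa-es-01"], index_exists=False))        # True: one host, no index yet
    print(should_preload(["es-01", "es-02"], index_exists=False))  # False: preload stays manual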
Unicast is a nightmare for large ES deployments, with provisioning and
failures all the time. I'm used to DHCP/TFTP/PXE in my DC thanks to RedHat
so why should I waste time setting up hostnames or count hosts for ES?
IMHO, ES could still be smart enough to calculate that formula dynamically,
since it knows when servers are being added to the cluster, correct? As for
a node crash, it's still a crash; and if the user wants to decommission a
node, wouldn't the better way be to explicitly run a decommission command
to indicate that the node is no longer part of the cluster? Is there any
problem with this logic?
If the current node count is always assumed to be the maximum count, the
minimum_master_nodes condition of N/2+1 would never be met.
What you surely mean instead is that ES should remember the cluster node
count over time and keep the maximum value as the N in the formula. But
that would mean you must not add spare nodes, which is also irritating, and
it assumes that N/2+1 is the one and only true formula. Yet N/2+1 could be
tightened for stronger split-brain risk mitigation: there is no reason why
minimum_master_nodes should not be set to N, so that a cluster must always
find all nodes before master election.
I avoided multicast and preferred unicast based on many discussions in the
newsgroups and other sites; in particular, the Elasticsearch Preflight
Checklist (http://asquera.de/opensource/2012/11/25/elasticsearch-pre-flight-checklist/).
Within this checklist, the sections entitled DISCOVERY and AVOIDING
SPLIT-BRAINS were the recommendations that I followed. This write-up didn't
say that multicast was bad, but neither had I read before that unicast was
a nightmare.
Whenever you get a chance, if you could describe the provisioning issues
and failures, that would help to shed some more light on this subject.
And it does seem that somebody needs to waste time counting hosts. You
stated (and I agree) that minimum_master_nodes must be set to N/2 + 1, and
N must be specified by the system admin... via counting? Or is there
something else I am not seeing?
Brian
On Wednesday, January 15, 2014 12:15:27 PM UTC-5, Jörg Prante wrote:
Unicast is a nightmare for large ES deployments, with provisioning and
failures all the time. I'm used to DHCP/TFTP/PXE in my DC thanks to RedHat
so why should I waste time setting up hostnames or count hosts for ES?
The preflight checklist is misleading in several statements:
- You cannot "accidentally join" clusters because of multicast; that
happens when using the default cluster name.
- "A lot of chatter with no use" is just pure ignorance of network
technology. Multicast was designed for zero config, and I love ES for zero
config. And there is no chatter if you set up a private network for the ES
hosts; that is simple with a router/switch and DHCP. Whole corporation and
datacenter networks could not live without link-local config, for example.
- "You don't have to include the whole cluster" is misleading: each data
node must always be able to connect to one of the master-eligible nodes, or
the cluster may not work correctly.
- The "avoiding split brain" section is a very bold and wrong headline. It
promises that split-brains can be avoided by an N/2+1 setting.
Unfortunately that is nonsense; the truth is that it lowers the risk
significantly, but there can still be obscure network splits where the two
halves of a split cluster overlap, so the N/2+1 condition can become true
for both halves.
Why do I not count hosts/nodes? There is a difference between the number of
data nodes and master eligible nodes in a cluster. I have three master
eligible nodes and set minimum_master_nodes to 3, and that's it. After that
I can start or stop additional data nodes as much as I like.
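As a sketch of that distinction (the node layout below is an assumption
based on the description above, not the actual cluster): the quorum is
sized from the master-eligible nodes only, so data-only nodes can come and
go without touching the setting.

    # Count only master-eligible nodes when sizing minimum_master_nodes;
    # data-only nodes do not take part in master election (ES 1.x roles:
    # node.master / node.data). The topology below is illustrative.
    nodes = [
        {"name": "master-1", "master_eligible": True,  "data": False},
        {"name": "master-2", "master_eligible": True,  "data": False},
        {"name": "master-3", "master_eligible": True,  "data": False},
        {"name": "data-1",   "master_eligible": False, "data": True},
        {"name": "data-2",   "master_eligible": False, "data": True},
    ]

    master_eligible = sum(1 for n in nodes if n["master_eligible"])
    print(master_eligible // 2 + 1)  # 2 -- the usual N/2+1 quorum for 3 master-eligible nodes
    print(master_eligible)           # 3 -- the stricter "require all of them" setting used above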
IMHO, there is no perfect solution for any of these complex network issues;
however, if we know of a config that reduces the risk significantly, then I
think we should adopt it as the default config.
We use multicast; we're pretty used to multicast in our networks for other
stuff like Varnish cache flushes. I figured unicast vs. multicast was a
networking decision based on how your data center is rigged. No one has
complained about chatter and we've never seen a problem with accidental
joining. We keep the cluster names distinct, even for networks that we
imagine aren't flat from a multicast perspective.
Using quorum consensus (another name for the 'minimum_master_nodes'
approach) as the default is not possible, since the quorum count is only
known by the admin.
There are perfect solutions for consensus, but they are not easy to
implement; see Byzantine fault tolerance.
Some points: just because you know how many (master) nodes you have doesn't
mean you know or should care about their hostnames; EC2, servers are cattle
not pets, etc.
One thing I am not sure about: would it be possible (i.e., safe) to make
the quorum threshold a runtime-configurable value, rather than having to
restart all the nodes for the change to take effect (see the sketch below)?
We'd have to put some code around this for safety, of course (what happens
if you set a number > N, for example...).
Also, can anyone comment on using ZooKeeper for master election (and config
updates?)? I saw a plugin for ZK but haven't had time to test.
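For what it's worth, as far as I know ES 1.x lists
discovery.zen.minimum_master_nodes among the dynamically updatable settings
of the cluster update settings API, so a runtime change should not need a
rolling restart. A rough sketch (host, port, and the new value are
placeholders):

    # Sketch: change the quorum at runtime via the cluster update settings API.
    # The endpoint is the standard _cluster/settings; host/port/value are placeholders.
    import json
    import urllib.request

    body = json.dumps({
        "persistent": {"discovery.zen.minimum_master_nodes": 3}
    }).encode("utf-8")

    req = urllib.request.Request(
        "http://localhost:9200/_cluster/settings",
        data=body,
        headers={"Content-Type": "application/json"},
        method="PUT",
    )
    with urllib.request.urlopen(req) as resp:
        print(resp.read().decode("utf-8"))

The safety concern above still applies, though: nothing stops you from
setting a value larger than the number of master-eligible nodes, which
would leave the cluster unable to elect a master.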
How does minimum_master_nodes differ from action.write_consistency? Would
setting write_consistency to quorum help when minimum_master_nodes is not
set appropriately?