Cluster nodes communication

ray_qi · May 28, 2010, 6:40am

Hi

There is only a single master in the cluster (automatically elected). Since the master node maintains some critical information, I wonder whether it will become a bottleneck when we have thousands of nodes. For master-node communication, would Zookeeper be a better choice?

Thanks

Berkay_Mollamustafao · May 28, 2010, 8:41pm

Hi,

Why do you say there is a master? AFAIK, all nodes are equal.

Regards,
Berkay Mollamustafaoglu
mberkay on yahoo, google and skype


Lukas_Vlcek1 · May 28, 2010, 9:06pm

Hi,

Actually, there is a master node. You can read what Shay explained about the
master node in the Sematext blog interview (check the comments section too):
Solr vs Elasticsearch: Performance Differences & More - Sematext

It reads:
A master in elasticsearch is responsible for handling nodes coming and going
and allocation of shards. Note, the master is not a single point of failure;
if it fails, then another node will be elected as master. Also note that
nodes do not need to communicate with the master on each request, so it's not
a single point of bottleneck.
EOD
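To make the quoted description concrete, here is a minimal sketch in Python. It is a toy in-memory model, not Elasticsearch code: the `ToyCluster` class, its method names, and the "lowest node name wins" election rule are all assumptions made for illustration. It only shows the two properties the quote describes: a new master takes over when the current one leaves, and ordinary requests do not go through the master.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class ToyCluster:
    """Hypothetical in-memory model of a cluster (not the Elasticsearch API)."""
    nodes: set = field(default_factory=set)
    master: Optional[str] = None
    state_updates: int = 0  # cluster-state changes, the master's responsibility

    def _elect(self):
        # Assumed rule, for illustration only: the lowest node name wins.
        self.master = min(self.nodes) if self.nodes else None

    def join(self, node):
        # A node joining is a cluster-state change.
        self.nodes.add(node)
        self.state_updates += 1
        if self.master is None:
            self._elect()

    def leave(self, node):
        # A node leaving is a cluster-state change; if it was the master,
        # another node is elected, so the master is not a single point of failure.
        self.nodes.discard(node)
        self.state_updates += 1
        if node == self.master:
            self._elect()

    def search(self, node, query):
        # A request is served by the node that receives it; the master is not
        # contacted, so state_updates does not change.
        if node not in self.nodes:
            raise ValueError(f"{node} is not part of the cluster")
        return f"{node} handled {query!r}"


cluster = ToyCluster()
for name in ["node-a", "node-b", "node-c"]:
    cluster.join(name)
cluster.search("node-b", "user:kimchy")  # served without involving the master
cluster.leave("node-a")                  # the master leaves; re-election follows
```

In this toy model the master only does work when membership changes, which is the reason given in the quote for why it should not be a per-request bottleneck.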

As for massive deployments (thousands of nodes), I have no experience, but I
think other factors can slow down performance significantly, depending on the
cluster setup, because sharding and replication need to take place. They can
run in the background, but that communication still has to happen between
nodes, and I think it would be much more data-intensive than node-master
communication. As for the cluster setup and other critical info, AFAIK it
should be persisted via the gateway, so if anything goes wrong and the cluster
crashes there should be a way to recover.

Regards,
Lukas


kimchy · May 28, 2010, 10:34pm

Just another point regarding Zookeeper: the Elasticsearch architecture is
similar in its idea to Zookeeper-based cluster management, with several
exceptions:

- The master is dynamically allocated, so there is no preferred number of
  master nodes.
- Setup is sooo much simpler, and did someone say cloud :). Zookeeper is a
  pain to set up in cloud envs: elastic IPs and special nodes are all relics
  of "non cloudy" discovery models. The simpler setup, by the way, also plays
  much nicer when not in the cloud.
- Long-term persistence of the cluster state (i.e. surviving a full cluster
  failure) is maintained through the gateway.
- I would venture a guess (and it's just a guess) that Elasticsearch is more
  scalable than Zookeeper (when it comes to cluster state management).

Note, I am evaluating Zookeeper here only for the subset of features
Elasticsearch would have needed from it. Zookeeper provides many more
features that are irrelevant to Elasticsearch, which makes it cool to use in
other scenarios.
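The gateway point above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: `FileGateway`, the JSON file layout, and the `twitter` index name are hypothetical and do not reflect how Elasticsearch actually implements its gateway. The idea shown is just that the cluster state is written somewhere durable, so that it can be read back after every node has gone down.

```python
import json
import tempfile
from pathlib import Path


class FileGateway:
    """Hypothetical gateway that keeps the cluster state in one JSON file."""

    def __init__(self, path):
        self.path = Path(path)

    def save(self, state):
        # Write to a temporary file first, then rename it over the real file,
        # so a crash in the middle of a write does not leave a half-written state.
        tmp = self.path.with_suffix(".tmp")
        tmp.write_text(json.dumps(state, sort_keys=True))
        tmp.replace(self.path)

    def load(self):
        # A brand-new cluster has nothing to recover yet.
        if not self.path.exists():
            return {"indices": {}}
        return json.loads(self.path.read_text())


def demo():
    with tempfile.TemporaryDirectory() as directory:
        gateway = FileGateway(Path(directory) / "cluster_state.json")
        state = {"indices": {"twitter": {"shards": 5, "replicas": 1}}}
        gateway.save(state)
        del state  # simulate a full cluster failure: the in-memory state is gone
        # A fresh gateway object stands in for a restarted cluster.
        return FileGateway(gateway.path).load()


recovered = demo()
```

After `demo()` runs, `recovered` holds the same index settings that were saved before the simulated failure.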

cheers,
shay.banon


ray_qi · May 29, 2010, 1:43am

Thanks for the prompt replies.

AFAIK, the design of Elasticsearch is perfect; I say that after comparing several solutions at the code level. As for massive deployments (thousands of nodes), I still have some concerns about the performance of node-master, node-node and node-gateway communication. That's why I am looking for other potential options, or for changes that could make it better. Perhaps you genius guys can predict where the performance bottlenecks would be in a massive deployment.

Regards
-Ray

kimchy · June 1, 2010, 8:21pm

Well, I have not yet tested thousands of nodes, but master-node
communication seems to work for Hadoop ...
