Requirements per node role

plaflamme · January 6, 2012, 2:49am

Hi,

A node in a cluster can be configured to serve exclusively as a data,
master or "client" node. By deciding a node's role a priori, I suspect one
should tweak its hardware (more RAM, less CPU, etc.) to fit. What would be
the (relative) recommendations per role?

data nodes (not master eligible, not serving requests):
- lots of RAM for faceting and sorting
- lots of CPU for indexing, analyzing and querying
master nodes (no data, not serving requests):
- seems it's not doing much... lazy master nodes!
client nodes (not master eligible, no data):
- lots of CPU for aggregating and serving lots of parallel requests
- is RAM still important here (is it affected by faceting and sorting?)

If there are several master-only nodes, are they all idle except for one at
any given time?

Anyone have experience in deploying a cluster with "load-balancing" clients
for serving requests?

Thanks,
Philippe

Karussell1 · January 6, 2012, 12:19pm

ES does not have the concept of master vs. slave.

Have a look:

Peter.

plaflamme · January 6, 2012, 2:29pm

Yes, I'm aware that all nodes are equivalent by default (and they elect a
master node themselves), but by changing the default settings, you can make
a node not master eligible (node.master=false or node.client=true) and you
can decide whether a node has data (node.data).

Using node.data=false and node.client=true, you're effectively creating a
node that will not serve indices directly, but will redirect requests to
"data nodes" and aggregate the results. I'm wondering if there are any
advantages in creating such nodes. For example, does this change the
hardware requirements (less RAM, no disk access, etc.)? If so, one could
design a cluster topology that scales data nodes and client nodes
independently.
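
The three roles discussed here come down to two boolean settings. A minimal sketch in Python, assuming only the `node.master` and `node.data` settings named in this thread; the `ROLE_SETTINGS` table and the two helper functions are hypothetical names for illustration, not part of Elasticsearch:

```python
# Hypothetical helper: maps the three roles from this thread to the
# node.master / node.data settings. Not an Elasticsearch API.
ROLE_SETTINGS = {
    "data":   {"node.master": False, "node.data": True},   # holds shards, never master
    "master": {"node.master": True,  "node.data": False},  # master eligible, no shards
    "client": {"node.master": False, "node.data": False},  # routes and merges only
}


def settings_for(role):
    """Return a copy of the settings for a dedicated node of the given role."""
    if role not in ROLE_SETTINGS:
        raise ValueError("unknown role: %r" % (role,))
    return dict(ROLE_SETTINGS[role])


def to_config_lines(settings):
    """Render settings as 'key: value' lines, sorted by key."""
    return ["%s: %s" % (key, str(value).lower())
            for key, value in sorted(settings.items())]
```

Calling `to_config_lines(settings_for("client"))` gives the two lines one would put in the node's configuration to get a node that holds no data and is not master eligible.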

Maybe this only introduces additional complexity, but for cloud-based
solutions such as EC2, it may be interesting to have different types of
nodes to have greater flexibility for choosing instance types.

Thanks,
Philippe

Karussell1 · January 7, 2012, 1:56pm

Why do you think that adding a separate no-data node would be
beneficial? What would be the advantage over sending the queries
directly to a data node? The data node needs to process the query
anyway.

Peter.

kimchy · January 7, 2012, 7:48pm

Having only "load-balancing nodes" (non-master, non-data) will not help
that much. I have seen cases where such a node was run locally, next to the
relevant client code, for HTTP access: the client connected over loopback,
and ES had better network handling for remote access to the other nodes.
"Just" data nodes still serve requests, even when those requests come from
client nodes.

Dedicated master nodes can come in handy in certain situations. For very
large clusters they can help (e.g. 200 data nodes with 3 master-eligible
nodes).

plaflamme · January 7, 2012, 11:30pm

I was wondering if putting several client-only nodes "in front" of the data
nodes would be beneficial by offloading some work from the data nodes. These
nodes would act like load-balancing nodes and might require a different
type of hardware (less RAM, more CPU, no disk access, for example).

The data nodes still have to process the query, but they wouldn't have to
aggregate the results.
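
To make "aggregate the results" concrete, here is a rough sketch of the merge step: each shard returns its own hits sorted by score, and the node that received the request merges them into one global top-k list. This is an illustration only, not how Elasticsearch implements it; `merge_top_k` and the `(score, doc_id)` tuples are made-up names.

```python
import heapq
from itertools import islice


def merge_top_k(shard_hits, k):
    """Merge per-shard hit lists (each sorted by descending score)
    into the global top k, without re-sorting everything."""
    merged = heapq.merge(*shard_hits, key=lambda hit: hit[0], reverse=True)
    return list(islice(merged, k))


# Hypothetical per-shard responses: (score, doc_id), best hit first.
shard_a = [(0.9, "doc-1"), (0.4, "doc-7")]
shard_b = [(0.8, "doc-3"), (0.1, "doc-5")]
top = merge_top_k([shard_a, shard_b], k=3)
```

The merge only looks at hits the shards have already scored and sorted; the per-shard query work stays on the data nodes.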

Thanks,
Philippe

plaflamme · January 7, 2012, 11:33pm

Ok, thanks for the answers.

In your example, how does having 3 "master eligible" nodes help in a 200
node cluster? Is this for avoiding split brain situations?

Thanks,
Philippe

kimchy · January 12, 2012, 10:33am

Yes, it mainly helps in avoiding split brain situations (as it can only
happen between those 3 nodes).
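
The arithmetic behind this can be sketched as follows, assuming a master may only be elected when a majority of the master-eligible nodes can see each other. In Elasticsearch of this era that rule is, as far as I know, configured through `discovery.zen.minimum_master_nodes`, a setting not mentioned in this thread; the function names below are hypothetical.

```python
def majority(master_eligible):
    """Smallest number of master-eligible nodes that forms a majority."""
    return master_eligible // 2 + 1


def can_elect(visible, master_eligible):
    """True if a partition seeing `visible` master-eligible nodes
    has enough of them to elect a master."""
    return visible >= majority(master_eligible)


def split_brain_possible(master_eligible):
    """Check every two-way partition of the master-eligible nodes:
    split brain would need both sides to be able to elect a master."""
    return any(
        can_elect(side, master_eligible)
        and can_elect(master_eligible - side, master_eligible)
        for side in range(master_eligible + 1)
    )
```

With 3 master-eligible nodes the majority is 2, so however the network partitions, at most one side can elect a master, regardless of how many data-only nodes sit on either side.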
