Memory footprint

Given the following cluster configuration:

3 master-only machines, most likely VMs.

9 client-only nodes and 9 data-only nodes: 2 ES instances (CO+DO) per server.
296GB of RAM per physical server.
14 TB per DO machine.

I am trying to figure out how much RAM to allocate to each.
As more documents are indexed, does the master and/or client cache grow? If
so, by how much?

--

The memory consumption of master nodes grows only with the size of the cluster
state, which means it will grow if you add new indices, shards, fields,
aliases or nodes to the cluster. It doesn't grow as you add more data
unless you run queries against these master nodes. The memory consumption
of the nodes depends primarily on the types of queries that you are
executing. So, the best thing to do is to monitor memory usage under a search
load similar to production to see how much memory is typically used.

I'm not sure I understand the benefit of running a client-only node on the
same machine as a data-only node.
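
As a rough illustration of the monitoring suggested above, here is a minimal Python sketch that computes heap usage per node from a node-stats style payload. The payload shape (`nodes` -> `jvm` -> `mem` with `heap_used_in_bytes` and `heap_max_in_bytes`) is an assumption modelled on the nodes stats API and may differ between Elasticsearch versions; the node names and numbers are made up for the example.

```python
# Hypothetical sample shaped like a node-stats response; names and sizes are invented.
SAMPLE_STATS = {
    "nodes": {
        "node-a": {
            "name": "master-1",
            "jvm": {"mem": {"heap_used_in_bytes": 256 * 1024**2,
                            "heap_max_in_bytes": 1024**3}},
        },
        "node-b": {
            "name": "data-1",
            "jvm": {"mem": {"heap_used_in_bytes": 24 * 1024**3,
                            "heap_max_in_bytes": 30 * 1024**3}},
        },
    }
}


def heap_usage_percent(stats):
    """Return {node name: heap used as a percentage of the maximum heap}."""
    usage = {}
    for node in stats["nodes"].values():
        mem = node["jvm"]["mem"]
        usage[node["name"]] = 100.0 * mem["heap_used_in_bytes"] / mem["heap_max_in_bytes"]
    return usage


def nodes_over(stats, threshold_percent):
    """Return the sorted names of nodes whose heap usage exceeds the threshold."""
    return sorted(
        name
        for name, pct in heap_usage_percent(stats).items()
        if pct > threshold_percent
    )


if __name__ == "__main__":
    for name, pct in sorted(heap_usage_percent(SAMPLE_STATS).items()):
        print(f"{name}: {pct:.1f}% heap used")
```

Sampling this regularly while replaying a production-like search load would show which node type actually needs the memory, rather than guessing up front.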

(In reply to Garth's message of Thursday, November 29, 2012 4:23:14 PM UTC-5.)

--

We have MasterData nodes now. We had a split brain and out-of-sync indexes.
Long story short, we are moving away from having the MD combined. I added
the Client in there so that from a Query/Index/Delete perspective, all
requests flow to the Client and not through the Master. The only job for
the Masters is to manage the indexes. Granted, I could move the Clients off
to their own VMs, but I figured that since they don't have a large footprint,
two ES instances (Client, DataOnly) could exist on the same server.

Based on your response with respect to memory consumption on the Master, I
would presume that if all my requests are going to the client then I would
monitor the client. Would I still have to monitor the Master? Does the
client push request information to the masters?

(In reply to Igor Motov's message of Thu, Nov 29, 2012 at 9:26 PM.)

--

I think switching to designated masters is a good move.

All nodes in the cluster communicate with the master node to know which
nodes are still alive and to publish cluster state changes. But this
doesn't happen for each request. Only requests that cause cluster
information to change (mapping updates, for example) will cause such
communication. So, master load should not be significant. However, I would
still continue to monitor masters just because they play such an important
role in the cluster.

Having dedicated client nodes in your model doesn't seem to be necessary.
Data nodes can do everything that clients can do. So, you can remove the
clients and connect directly to the data nodes.

(In reply to Garth's message of Friday, November 30, 2012 9:49:27 AM UTC-5.)

--

Based on the statements below about "non data" nodes, I thought it would be
best to have Clients handle the HTTP requests. Are you saying that I will
have the same performance if I drop the Client and let the DataOnly handle
the HTTP requests?

We can start a whole cluster of data nodes which do not even start an
HTTP transport by setting http.enabled to false. Such nodes will
communicate with one another using the transport
(http://www.elasticsearch.org/guide/reference/modules/transport.html)
module. In front of the cluster we can start one or more “non data” nodes
which will start with HTTP enabled. All HTTP communication will be performed
through these “non data” nodes.

The benefit of using that is first the ability to create smart load
balancers. These “non data” nodes are still part of the cluster, and they
redirect operations exactly to the node that holds the relevant data. The
other benefit is the fact that for scatter / gather based operations (such
as search), these nodes will take part of the processing since they will
start the scatter process, and perform the actual gather processing.
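
The scatter / gather flow described in the excerpt above can be sketched roughly as follows. This is only a conceptual Python illustration, not how Elasticsearch implements search: the shard names, scores and document ids are invented, and `query_shard` and `scatter_gather` are hypothetical helpers standing in for the work done on a data node and on the coordinating ("non data") node respectively.

```python
import heapq

# Hypothetical shard contents: each shard holds (score, doc_id) hits for one query.
SHARDS = {
    "shard-0": [(0.9, "doc-3"), (0.4, "doc-7"), (0.1, "doc-1")],
    "shard-1": [(0.8, "doc-5"), (0.7, "doc-2")],
    "shard-2": [(0.95, "doc-9"), (0.2, "doc-4")],
}


def query_shard(hits, size):
    """Work done where the data lives: return this shard's top `size` hits."""
    return heapq.nlargest(size, hits)


def scatter_gather(shards, size):
    """Work done on the coordinating node: fan the query out, then merge."""
    # Scatter: ask every shard for its own top hits.
    partials = [query_shard(hits, size) for hits in shards.values()]
    # Gather: merge the partial results into one global top `size` list.
    merged = heapq.nlargest(size, (hit for partial in partials for hit in partial))
    return [doc_id for _score, doc_id in merged]


if __name__ == "__main__":
    print(scatter_gather(SHARDS, 3))
```

The point of the sketch is that the merge step costs memory and CPU on whichever node received the request, which is the part a separate client tier would take off the data nodes.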

(In reply to Igor Motov's message of Fri, Nov 30, 2012 at 10:12 AM.)

--

I would expect the solution without client nodes to have equal or faster
performance than the solution with client nodes. The architecture described
in this snippet makes sense when you add a separate tier of client nodes in
front of your cluster. Such an architecture enables a few interesting
things. You can make the set of clients static and "publish" their IPs for
the use of the web tier, for example. Behind this layer of static clients
you can dynamically change your data nodes without worrying about data node
discovery, balancing, and failover - it's all handled for you by
elasticsearch. It would also allow you to move the handling of HTTP traffic
and scatter/gather operations off the data nodes to the client node servers,
and it might also make sense in some hardware setups. However, I couldn't
think of a useful scenario where client nodes are placed on the same servers
as data nodes, and was curious to know why you are doing it.

(In reply to Garth's message of Tuesday, December 4, 2012 6:20:40 PM UTC-5.)

--

I was not going to be given any extra servers. What I wanted to achieve is
the behavior you described for having client nodes installed. Based on your
response, then, I should move the Clients off to either VMs or purely
dedicated servers. I don't see, though, why a client could not do
scatter/gather whether it is on its own dedicated server or on a server that
also has a data node. If it's a client, should it not be able to do what a
client can do no matter where it is installed?

(In reply to Igor Motov's message of Tue, Dec 4, 2012 at 6:56 PM.)

--

You are right, clients should be able to do their work no matter where they
are installed. And I didn't say it's not going to work. I was just trying
to explain why I don't see how it's useful. Node clients are just servers
that cannot be masters and cannot store any data. So, each machine that
runs an elasticsearch server already has a client built into that server,
and I was just curious about what you are trying to achieve by running a
client in a separate process on the same machine.
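
To make the role distinction above concrete, here is a small Python sketch that renders the node settings for each role discussed in this thread. It assumes the roles are controlled by the `node.master` and `node.data` settings (an assumption, not confirmed anywhere in this thread), while `http.enabled` is the setting named in the guide excerpt quoted earlier. `ROLE_SETTINGS` and `render_settings` are hypothetical names used only for illustration.

```python
# Assumed mapping from the roles discussed in the thread to node settings.
ROLE_SETTINGS = {
    "master_only": {"node.master": True, "node.data": False},
    "data_only": {"node.master": False, "node.data": True},
    # A client is a node that cannot be master and stores no data.
    "client_only": {"node.master": False, "node.data": False},
}


def render_settings(role, http_enabled=True):
    """Render the settings for one role as 'key: value' lines, sorted by key."""
    settings = dict(ROLE_SETTINGS[role])
    settings["http.enabled"] = http_enabled
    return "\n".join(
        f"{key}: {str(value).lower()}" for key, value in sorted(settings.items())
    )


if __name__ == "__main__":
    for role in ("master_only", "data_only", "client_only"):
        print(f"# {role}")
        print(render_settings(role))
        print()
```

Seen this way, a data-only node with HTTP left enabled already accepts the same requests a co-located client-only node would, which is the reason the separate client process adds little on the same machine.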

(In reply to Garth's message of Wednesday, December 5, 2012 9:48:25 AM UTC-5.)

--

Thanks for the help. Sorry for being so thick!

(In reply to Igor Motov's message of Thu, Dec 6, 2012 at 10:27 PM.)

--