Hardware requirements for client and master-only nodes

Hi,

There is an excellent question asked about two years ago that was never
properly answered:
https://groups.google.com/forum/#!topic/elasticsearch/dxjpMd4vNXQ

I have the exact same question. I've got a cluster with a lot of data nodes
plus two nodes that act as master + client nodes (no data).

For now I'm using those two nodes for both master (shard/cluster
management) tasks and client tasks (query handling).

I've seen a big performance gain when querying the client nodes, compared
to querying my very busy data nodes directly.

But I'd still like to get your view on the hardware requirements of the
master/client nodes. Is RAM important for serving the query results, or are
most RAM-heavy tasks performed by the data nodes? And similarly, is CPU
important on the client nodes?

Thanks,
Lasse

--
You received this message because you are subscribed to the Google Groups "elasticsearch" group.
To unsubscribe from this group and stop receiving emails from it, send an email to elasticsearch+unsubscribe@googlegroups.com.
To view this discussion on the web visit https://groups.google.com/d/msgid/elasticsearch/35d4a8c8-755c-4f7b-80ef-eab9e0f85d08%40googlegroups.com.
For more options, visit https://groups.google.com/d/optout.

You should really use 3 master nodes if you have a lot of data nodes;
having 3 makes getting a quorum a lot easier.

I've previously run master nodes with 2 vCPUs and 8GB RAM (4GB heap) alongside
40-odd data nodes, with sporadic querying, and had no issues at all. Ultimately
it depends on your use case, but if you are seeing gains with your current
setup, then it makes sense to increase the hardware capabilities of what
you have, compare this to the previous setup, and then make a call.
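
To make the quorum point concrete, here is a minimal sketch of the usual majority arithmetic (more than half of the master-eligible nodes). The function names are made up for illustration; the result is the value one would typically put in the discovery.zen.minimum_master_nodes setting on Elasticsearch versions of this era, assuming that setting applies to your version.

```python
def minimum_master_nodes(master_eligible: int) -> int:
    """Majority quorum: more than half of the master-eligible nodes."""
    if master_eligible < 1:
        raise ValueError("need at least one master-eligible node")
    return master_eligible // 2 + 1


def tolerated_failures(master_eligible: int) -> int:
    """How many master-eligible nodes can be lost while a quorum remains."""
    return master_eligible - minimum_master_nodes(master_eligible)


if __name__ == "__main__":
    for n in (1, 2, 3, 5):
        print(n, minimum_master_nodes(n), tolerated_failures(n))
```

With 2 master-eligible nodes the quorum is 2, so losing either one leaves the cluster unable to elect a master. With 3 the quorum is still 2, so one node can fail.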

Thanks!

And yes I do actually have 3 master nodes!

I was hoping to learn more about the requirements of the client nodes (for
querying only). What work is actually performed by them? Do they simply query
the data nodes and merge the results, or is more heavyweight in-memory
aggregation and sorting done on those nodes, which would need RAM and CPU
power?

As kimchy answered, the real problem is that solitary nodes which hold no
shards and are not master do not help much.

The only motivations would be

  • moving out HTTP connection management, e.g. when clients are slow and
    appear in masses. The hardware requirements are low as long as the network
    bandwidth is OK and there are plenty of sockets/file descriptors available.

  • running spare nodes instead of HTTP load balancing, e.g. in nginx (nginx
    is better at doing this)
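
As a rough illustration of the file descriptor point in the first item above, this sketch reads the open-file limit of the current process on a Unix-like system and compares it with an expected number of client connections. The helper names, the reserve of 1024 descriptors, and the figure of 10000 connections are assumptions for the example, not recommendations from this thread.

```python
import resource


def current_fd_limits():
    """Return the (soft, hard) open-file limits of this process (Unix only)."""
    return resource.getrlimit(resource.RLIMIT_NOFILE)


def has_fd_headroom(soft_limit, expected_connections, reserve=1024):
    """True if the soft limit covers the expected sockets plus a reserve.

    The reserve stands in for files, logs and internal sockets the process
    needs besides client connections; 1024 is an arbitrary placeholder.
    """
    if soft_limit == resource.RLIM_INFINITY:
        return True
    return soft_limit >= expected_connections + reserve


if __name__ == "__main__":
    soft, hard = current_fd_limits()
    print("soft limit:", soft, "hard limit:", hard)
    print("enough for 10000 connections:", has_fd_headroom(soft, 10000))
```

This only inspects the process running the script; the limit that matters is the one in effect for the Elasticsearch process itself.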

When clients demand huge data sets from such nodes, there might be some
load on them regarding result aggregation but that is not a real problem in
comparison to the heavy duty nodes that hold the shards. The real load is
where the shards are.
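
To give a feel for the result aggregation mentioned above, here is a simplified sketch of merging per-shard top hits into one global top-N list. It is an illustration only, not Elasticsearch's actual code; the (score, doc_id) tuples and the function name are invented for the example. It assumes each shard has already sorted its own hits by score, highest first.

```python
import heapq
from itertools import islice


def merge_shard_hits(shard_results, size):
    """Merge per-shard hit lists (each sorted by score, descending)
    and keep only the global top `size` hits."""
    merged = heapq.merge(*shard_results, key=lambda hit: hit[0], reverse=True)
    return list(islice(merged, size))


if __name__ == "__main__":
    shard_a = [(9.1, "a1"), (4.0, "a2")]
    shard_b = [(7.5, "b1"), (6.2, "b2")]
    shard_c = [(8.3, "c1")]
    print(merge_shard_hits([shard_a, shard_b, shard_c], size=3))
```

In this simplified model the merging node holds at most the number of shards times the requested size in hits, regardless of how large the index is, which fits the observation that the heavy lifting happens where the shards are.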

Jörg

Hi, please can you tell me the memory and CPU requirements for client nodes, and what JVM heap size a client node should have?

Suppose I have 64 GB of RAM and 4 logical CPUs. Should I reserve 32 GB for the JVM on the client node and leave the rest for Lucene to use?

--Mohan