Node Type Hardware Skewing

jd1337 · November 24, 2016, 12:08am

Hi Guys,

I'm looking to further my foray into Elasticsearch, as we are pretty happy with our small-scale tests.

As I will be moving into production (running in AWS on EC2), I want to make sure I have architected things correctly. Part of my issue is that I don't really understand what the node descriptions imply for hardware requirements - https://www.elastic.co/guide/en/elasticsearch/reference/current/modules-node.html

For master-only nodes (in a small cluster) I am looking at having 3, starting with t2.medium, as it doesn't appear that the masters do a great deal in a small cluster. When I need to increase my cluster size in the future I can increase the size of the master nodes. If I were to increase the size of these nodes, should they be skewed more towards RAM, CPU or somewhere in between (C4, M4 or R3)?

For data + client nodes I am looking at starting with 2 nodes of r3.xlarge. I don't imagine I would see much benefit from client-only nodes at the moment.

In the future however, would a Data only Node be skewed more towards greater RAM, CPU or somewhere in between (C4, M4 or R3)?

Ditto for Client Only Node.

My workload is heavily skewed towards search / aggregations rather than data ingest.

TL;DR
Should the following node types use M4, R4 or C3 instance types in AWS?
Master
Data
Client

Cheers

warkolm · November 24, 2016, 1:43am

Master nodes would mostly need more memory, and only if the cluster state is large. CPU speed matters more than the number of cores, as the operations a master node performs on the cluster state are single-threaded.
Client nodes need memory and CPU, not disk.
Data nodes need CPU, memory and disk.

So maybe go with m4 for master, c3 for client and r4 for data. But you could probably get away with c3 for data and client.
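
For reference, the three node types discussed here come down to combinations of role settings in elasticsearch.yml. The sketch below is illustrative only: it assumes the Elasticsearch 5.x-style boolean settings `node.master`, `node.data` and `node.ingest` (older releases differ), and the helper name `role_settings` is made up for this example.

```python
# Hypothetical helper: maps the node types from this thread to role settings.
# Assumes Elasticsearch 5.x-style settings; check the node module docs
# for the version you actually run.

ROLE_FLAGS = {
    # Dedicated master: manages cluster state, holds no data.
    "master": {"node.master": True, "node.data": False, "node.ingest": False},
    # Data-only: holds shards and executes searches/aggregations on them.
    "data": {"node.master": False, "node.data": True, "node.ingest": False},
    # Client (coordinating-only): routes requests and merges results.
    "client": {"node.master": False, "node.data": False, "node.ingest": False},
}


def role_settings(node_type):
    """Return elasticsearch.yml lines for one of the node types above."""
    flags = ROLE_FLAGS[node_type]
    return [f"{key}: {str(value).lower()}" for key, value in flags.items()]


if __name__ == "__main__":
    for node_type in ROLE_FLAGS:
        print(f"# {node_type}")
        print("\n".join(role_settings(node_type)))
```

Any node can coordinate requests, so a combined data + client node would simply use the "data" settings.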

jd1337 · November 30, 2016, 3:57am

Is the master CPU usage constant or does it fluctuate? From the AWS docs: "T2 instances are designed to perform as if they have dedicated high speed Intel cores available when your application really needs CPU performance, while protecting you from the variable performance or other common side effects you might typically see from over-subscription in other environments."

Is there a general index / shard / size to master RAM ratio or formula?
I will have millions of documents, hundreds of indexes and thousands of shards. Should I be aiming at 4GB / 8GB / more RAM on the masters?

So at present I'm leaning towards the following (TBD based on the above questions):
Master: t2.medium x3
Client: c3.large x2
Data: r3.xlarge x2
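
On the T2 question above: T2 instances earn CPU credits at a fixed hourly rate and spend them while the CPU is busy, so whether a t2.medium suits a master depends on how its average CPU usage compares with the earn rate. A rough sketch of that arithmetic follows. It assumes one credit equals one vCPU at 100% for one minute, and the t2.medium figures used in the example (24 credits earned per hour, 2 vCPUs) are illustrative values that should be checked against the current AWS documentation. The function names are hypothetical.

```python
def net_credits_per_hour(earn_rate, vcpus, utilisation_pct):
    """Credits gained (positive) or drained (negative) in one hour.

    earn_rate: credits earned per hour (assumed figure, see AWS docs).
    utilisation_pct: average per-vCPU utilisation, 0-100.
    """
    # One credit = one vCPU running at 100% for one minute.
    spent = vcpus * utilisation_pct * 60 / 100
    return earn_rate - spent


def hours_until_empty(balance, net_per_hour):
    """Hours a credit balance lasts; None if the balance is not draining."""
    if net_per_hour >= 0:
        return None
    return balance / -net_per_hour


if __name__ == "__main__":
    # Assumed t2.medium-like figures: 24 credits/hour, 2 vCPUs.
    for pct in (10, 20, 50):
        net = net_credits_per_hour(24, 2, pct)
        print(f"{pct}% average CPU -> net {net:+.1f} credits/hour")
```

If a master's CPU usage is mostly idle with short spikes, the net figure stays positive and the instance keeps a credit reserve for bursts; sustained usage above the break-even point drains it.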
