Sizing master-only nodes

We are planning to shift from an architecture where all nodes are node.data=true and node.master=true to a system with dedicated data and master nodes. Is there any rule of thumb for sizing the master nodes? What I can find online suggests "lightweight" (https://www.elastic.co/guide/en/elasticsearch/reference/1.6/modules-node.html) or "small", but I haven't found any indication of whether this is small in processing or memory, and what this size is small relative to.

Our data consists of many indices, most of which are quite small, with a single shard and a single replica, currently running on 4 nodes, each with 4 CPUs and 34 GB memory.
"number_of_nodes": 4,
"number_of_data_nodes": 4,
"active_primary_shards": 5927,
"active_shards": 11854,

I'd personally go no lower than 4GB of heap.
However, you can also increase the heap to 75% of system memory for a master node, as you don't need to worry about any file system caching.

So you can easily go for 3 x 6GB nodes.
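
To make the 75% guideline concrete, here is a minimal sketch of the arithmetic. The function name `master_heap_mb` and the default fraction are illustrative assumptions, not Elasticsearch settings.

```python
def master_heap_mb(system_ram_mb: int, fraction: float = 0.75) -> int:
    """Return a heap size in MB for a dedicated master node.

    Assumes the node holds no data, so little memory needs to be
    left over for the file system cache.
    """
    if not 0 < fraction < 1:
        raise ValueError("fraction must be between 0 and 1")
    return int(system_ram_mb * fraction)


# A 6 GB master-only node, as suggested above.
print(master_heap_mb(6 * 1024))  # 4608
```

The result is what you would then pass to the JVM as the heap size (for example through `ES_HEAP_SIZE` on the 1.x series).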


Agreed with Mark. You will need to take care of availability in your new configuration. If you go with one dedicated master and it leaves the cluster, restarts, crashes, etc., your cluster will not be available. If you select 2 masters there is a concern of split-brain. Also, the more shards you have, the larger your cluster state will be.
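
The usual way to reason about split-brain is a majority quorum of master-eligible nodes, which is what `discovery.zen.minimum_master_nodes` is normally set to. A small sketch of that calculation, with a hypothetical helper name:

```python
def minimum_master_nodes(master_eligible: int) -> int:
    """Majority quorum: more than half of the master-eligible nodes."""
    if master_eligible < 1:
        raise ValueError("need at least one master-eligible node")
    return master_eligible // 2 + 1


for n in (1, 2, 3):
    quorum = minimum_master_nodes(n)
    # n - quorum is how many masters can be lost while a quorum remains.
    print(n, quorum, n - quorum)
# 1 1 0
# 2 2 0
# 3 2 1
```

With 1 or 2 masters no failure can be tolerated without losing the quorum, which is why 3 is the usual minimum for dedicated masters.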

We are planning to use 3 master-only nodes for our cluster + 4 data nodes spread across two availability zones in the same region on EC2.

Is there a recommended topology of how to place master-only nodes across the two AZs?

Can we live with instance-store based storage for master-only nodes, or should we go for durable EBS-backed nodes?

Any best-practices / reference will be appreciated.
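
To frame the two-AZ question, here is a rough sketch that checks which AZ outages would still leave a majority of master-eligible nodes. The AZ names and the helper `az_losses_survived` are made up for illustration, and it assumes a majority quorum of masters is required.

```python
def az_losses_survived(masters_per_az: dict) -> dict:
    """For each AZ, report whether a master quorum survives losing it."""
    total = sum(masters_per_az.values())
    quorum = total // 2 + 1  # majority of master-eligible nodes
    return {
        az: (total - count) >= quorum
        for az, count in masters_per_az.items()
    }


# 3 master-only nodes over two AZs can only be split 2 + 1.
print(az_losses_survived({"az-a": 2, "az-b": 1}))
# {'az-a': False, 'az-b': True}
```

Under that assumption, losing the AZ that holds two of the three masters leaves the cluster without a quorum, whichever way the nodes are placed.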

@warkolm is there any rationale behind the 4 GB?

What operations does a master node do for it to require 4 GB?
Other than the cluster state becoming huge, I suppose there isn't anything that requires more memory than the basic requirement for running a JVM?

Thanks.

That's just field experience really.

Exactly.
And if you give it 3GB of the 4GB you should have more than enough room. As you allude, if you run into problems with that much heap on a master only node then you have other issues.

Thanks @warkolm

The reason I asked is because I have been using a cluster of t2.micros for master nodes for some time now, with 500 MB heap.

My cluster state sits at about a couple of MB.

So I was wondering if there is a risk involved in using such small machines.

I'd be cautious about the lack of CPU resources; large mapping updates or other things may cause your cluster to become unstable.