We are planning to shift from an architecture where all nodes have node.data=true and node.master=true to one with dedicated data and master nodes. Is there any rule of thumb for sizing the master nodes? What I can find online suggests "lightweight" (https://www.elastic.co/guide/en/elasticsearch/reference/1.6/modules-node.html) or "small", but I haven't found any indication of whether this means small in CPU or in memory, or what the size is small relative to.
Our data consists of many indices, most of which are quite small, with a single shard and a single replica, currently running on 4 nodes, each with 4 CPUs and 34 GB of memory. From _cluster/health:
"number_of_nodes": 4,
"number_of_data_nodes": 4,
"active_primary_shards": 5927,
"active_shards": 11854,
I'd personally go no lower than 4GB of heap.
However, you can also increase the heap to 75% of system memory on a master node, since you don't need to worry about leaving room for filesystem caching.
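For example, a sketch of how that might look; the exact file depends on how you installed Elasticsearch (ES_HEAP_SIZE is the standard knob in ES 1.x):

```
# /etc/default/elasticsearch (deb) or /etc/sysconfig/elasticsearch (rpm)
# 3 GB heap on a 4 GB master node leaves ~1 GB for the OS and JVM overhead
ES_HEAP_SIZE=3g
```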
Agreed with Mark. You will also need to take care of availability in your new configuration. If you go with one dedicated master and that node leaves the cluster, restarts, crashes, etc., your cluster will not be available. If you select two masters, there is a concern of split-brain, which is why three dedicated master-eligible nodes is the usual recommendation (see the sketch below). Also, the more shards you have, the larger your cluster state will be.
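A sketch of the usual quorum setting, assuming three dedicated master-eligible nodes:

```
# elasticsearch.yml on each master-eligible node
# quorum = (master_eligible_nodes / 2) + 1 = (3 / 2) + 1 = 2
discovery.zen.minimum_master_nodes: 2
```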
What operations does a master node perform that would require 4 GB?
Other than the cluster state becoming huge, there isn't anything that requires more memory than the basic requirement for running the JVM, I suppose?
Exactly.
And if you give it 3 GB of the 4 GB you should have more than enough room. As you allude to, if you run into problems with that much heap on a master-only node then you have other issues.
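If you want to gauge how big your cluster state actually is, a rough sketch using the cluster state API (assuming the node listens on localhost:9200):

```
# size of the serialized cluster state in bytes; ~12k shards can make this sizeable
curl -s 'localhost:9200/_cluster/state' | wc -c
```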