Hi there,
I am a bit confused about cluster node settings. There are a lot of topics and articles about node / shard / replica settings, but I often find inconsistencies between my version, those articles and topics, and the documentation.
My version of ES is 5.4. I assume the same settings apply across all 5.x versions.
Cluster settings
There are articles (and documentation) that suggest creating 3 dedicated master-eligible nodes in a cluster as a basic configuration.
I want to configure a cluster for a large number of docs (10 000 000 to 20 000 000), which will be updated and searched. What configuration would you suggest?
3 node.master nodes + 3 node.data nodes (how many data nodes do I need in this scenario?)
I understand that the configuration depends on individual requirements, and I am not even talking about shards and replicas, but my question is general: what should the master-eligible node configuration look like when a large number of docs has to be processed? Combined with data nodes, separated from them, or something else?
Thank you.
The difference is between creating a node that is master-eligible (node.master=true) and creating a node that is only master-eligible (node.master=true, node.data=false, and node.ingest=false). Does that help clarify the situation?
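As a minimal sketch of that distinction in elasticsearch.yml on 5.x: all three role settings default to true, so a plain master-eligible node needs no role settings at all, while a dedicated one switches the other two roles off. The comments are illustrative, not taken from the documentation.

```yaml
# Dedicated ("only") master-eligible node:
# can be elected master, holds no shards, runs no ingest pipelines.
node.master: true
node.data: false
node.ingest: false
```

Leaving all three settings at their defaults gives a node that is master-eligible and also holds data.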
OK, so the difference between master-eligible and only master-eligible is that a master-eligible node is also able to hold data. The documentation says:
While master nodes can also behave as coordinating nodes and route search and indexing requests from clients to data nodes, it is better not to use dedicated master nodes for this purpose. It is important for the stability of the cluster that master-eligible nodes do as little work as possible.
So, let's consider a cluster that consists of 3 master-eligible nodes with the discovery.zen.minimum_master_nodes option set to 2 (this structure is recommended as a basic configuration in many articles and topics). Does that mean these master nodes should be master-eligible or only master-eligible?
I want to clarify the configuration of master nodes and their usage for the situation I described above (large data). I guess it is better to have 3 only master-eligible nodes (untouched, just for the stability of the cluster) and a number of data nodes (how many?). Am I right?
Another configuration could be 3 only master-eligible nodes, a number of data nodes and a number of coordinating nodes. What is your opinion?
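For reference, the value of 2 follows from the quorum rule given in the 5.x documentation, (master-eligible nodes / 2) + 1. A sketch of the corresponding elasticsearch.yml line, assuming exactly 3 master-eligible nodes in the cluster:

```yaml
# Quorum = (number of master-eligible nodes / 2) + 1, with integer division.
# For 3 master-eligible nodes: (3 / 2) + 1 = 2.
discovery.zen.minimum_master_nodes: 2
```

The number would need to be recalculated if master-eligible nodes are added or removed later.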
Thank you.
For the master-eligible nodes to do as little work as possible you need two things:
set the master-eligible nodes to be standalone master-eligible nodes (so only master-eligible)
do not send indexing/search requests to the master nodes
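The other half of that split is the data nodes, which are the ones that should receive client traffic. A minimal sketch of a data-only node in elasticsearch.yml on 5.x, assuming ingest is not needed on these nodes (leave node.ingest at its default if it is):

```yaml
# Data-only node: holds shards and executes indexing and search,
# but can never be elected master.
node.master: false
node.data: true
node.ingest: false
```

Clients would then be pointed at the addresses of these nodes rather than at the dedicated masters.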
Correct: three dedicated master-eligible nodes plus separate data nodes is a sound layout. As for how many data nodes you need, that is a difficult question to answer. It depends entirely on your workload (not just the number of docs, but how often you're updating these docs, how many simultaneous search requests you expect to see, etc.). These are not easy questions to answer, and they require some heavy lifting on your part. Elasticsearch is designed to scale horizontally easily, so you can start on the low side and add nodes as you learn more about your workload and its interaction with Elasticsearch.
Coordinating nodes may or may not be necessary; again, it depends entirely on your workload. They help when the data nodes are under a lot of load from sustained search and indexing requests, because the coordinating nodes can take some of the query load off the data nodes (the data nodes still execute the shard-level searches, but the coordinating node does the work of managing the client interaction and accumulating the shard-level results from the data nodes). My recommendation is this: start without them and see how your cluster performs. If it acts like a cluster under sustained load that you can trace back to search and indexing requests, then add a few coordinating nodes and see if it helps.
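If you do end up adding them, a coordinating-only node on 5.x is simply a node with all three roles switched off. A minimal sketch of its elasticsearch.yml, with illustrative comments:

```yaml
# Coordinating-only node: not master-eligible, holds no data, runs no
# ingest pipelines. It only routes requests and merges shard-level results.
node.master: false
node.data: false
node.ingest: false
```

Search and indexing clients would then talk to these nodes instead of talking to the data nodes directly.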