Here is my cluster setup (version 6.6.1):
5 master nodes:
2 of those, each with 8GB of heap and a 32-core CPU
3 of those, each with 1GB of heap and a 4-core CPU
There are also 5 data nodes and 1 coordinating node.
Logstash on 30 other servers sends data to the coordinating node.
My question is: how does the coordinating node load balance indexing operations among the master nodes?
Does it balance purely on the number of requests, or does it take other parameters into account, such as system load and the CPU and memory usage of the HOST system and/or of the Elasticsearch instance (the node itself)?
Is it aware of the CPU and memory usage of the HOST system where a node runs (not just of the node/the Elasticsearch instance itself)?
As you can see, I have 3 relatively weak master nodes. I am WORRIED that the coordinating node will treat those 3 weak nodes "equally" with the 2 much more powerful master nodes, which would be very bad. I don't want the coordinating node to unknowingly treat the three weak master nodes as if they were as powerful as the other 2 and overload them. Is that the case here?
The coordinating node splits each bulk indexing request into a number of per-shard indexing requests and then sends each per-shard request directly to the primary of that shard. The master node is not involved.
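A rough Python sketch of that splitting step, assuming the simplified routing rule `shard = hash(_routing) % number_of_primary_shards` with the document `_id` as the routing value. `zlib.crc32` stands in for Elasticsearch's internal Murmur3 hash, and the helper names and node names are hypothetical, so the shard numbers it prints will not match a real cluster. The point is only the shape of the work: group by shard, then send each group to the node holding that shard's primary.

```python
import zlib
from collections import defaultdict


def shard_for(routing: str, num_primary_shards: int) -> int:
    # Stand-in hash: Elasticsearch really uses Murmur3 on the routing value;
    # crc32 is used here only so the sketch runs with the standard library.
    return zlib.crc32(routing.encode("utf-8")) % num_primary_shards


def split_bulk(docs, num_primary_shards, primary_node_of):
    """Group bulk items by target shard, keyed to the node holding each primary."""
    per_shard = defaultdict(list)
    for doc_id, source in docs:
        per_shard[shard_for(doc_id, num_primary_shards)].append((doc_id, source))
    # One request per shard, sent straight to that shard's primary.
    # No master node appears anywhere in this path.
    return {shard: (primary_node_of[shard], items) for shard, items in per_shard.items()}


if __name__ == "__main__":
    docs = [(f"doc-{i}", {"field": f"value{i}"}) for i in range(10)]
    # Hypothetical placement of 5 primaries on 5 data nodes.
    primaries = {s: f"data-{s + 1}" for s in range(5)}
    for shard, (node, items) in sorted(split_bulk(docs, 5, primaries).items()):
        print(f"shard {shard} -> {node}: {[d for d, _ in items]}")
```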
The master node is only involved in indexing when creating a new index, updating a mapping, or failing a shard on a node that disconnected from the cluster.
Yes, this is a concern, but not because of indexing. The master node isn't really involved in indexing, but it does have to do some work to keep the cluster balanced, create new indices, etc. There's only ever one elected master at a time, but it changes occasionally due to an election. The election of a new master node completely ignores the resources available to the nodes: it's expected that any of the master-eligible nodes can be elected as the master. There are two scenarios:
The 1GB master nodes can handle the load of being the master in your cluster. Your two 8GB nodes are overprovisioned.
The 8GB master nodes can handle the load of being the master but the 1GB nodes cannot. You need more powerful master nodes.
In either case, it's better to give equal resources to each master node. This will avoid a nasty surprise if a weaker node is elected as master, which can happen without warning.
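Since any master-eligible node may win the election, the capacity to plan for is that of the weakest master-eligible node. A small Python illustration using the heap sizes and core counts from the question; the node names are made-up labels.

```python
# Master-eligible nodes from the question: (name, heap_gb, cpu_cores).
# Node names are hypothetical labels.
MASTER_ELIGIBLE = [
    ("master-a", 8, 32),
    ("master-b", 8, 32),
    ("master-c", 1, 4),
    ("master-d", 1, 4),
    ("master-e", 1, 4),
]


def worst_case_master(nodes):
    """Return the weakest node that could be elected (the election ignores resources)."""
    return min(nodes, key=lambda n: (n[1], n[2]))


name, heap_gb, cores = worst_case_master(MASTER_ELIGIBLE)
print(f"Plan for a master with {heap_gb}GB heap and {cores} cores (e.g. {name})")
```

With this setup the answer is the 1GB/4-core profile, which is why the two 8GB nodes do not raise the guaranteed capacity of the master role.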
Quick question, it might be silly. What is the difference between creating indices and indexing? Let's say I have 3 dedicated master-eligible nodes and 5 data nodes. At all times, one of the 3 is elected as the master. When Logstash sends an index request, what actually happens behind the scenes? What does the master node do and what do the data nodes do? Which ones handle the actual indexing operations?
Also, when we say "shard", is a shard something that is defined before the indexing operation or after?
Right now I am not concerned about searching/querying.
I might have misunderstood what a master node does. Right now I am only concerned with indexing, not searching, so I was thinking of adding more master nodes to handle the indexing operations.
Indexing is how your documents get into an index once it has been created:
```
POST /my-index/_doc/_bulk
{"index":{}}
{"field":"value1",...}
{"index":{}}
{"field":"value2",...}
...
```
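For reference, a Python sketch of building a body like the one above. The bulk API expects newline-delimited JSON: an action line followed by a source line for each document, with a trailing newline at the end. The empty `{"index":{}}` action takes the index from the URL and lets Elasticsearch generate the `_id`. The helper name is hypothetical, and this only builds the string; sending it (for example with `Content-Type: application/x-ndjson`) is left out.

```python
import json


def bulk_body(sources):
    """Build an NDJSON _bulk body: one action line plus one source line per document."""
    lines = []
    for source in sources:
        # Empty action: index comes from the URL, _id is auto-generated.
        lines.append(json.dumps({"index": {}}))
        lines.append(json.dumps(source))
    # The bulk API requires the body to end with a newline.
    return "\n".join(lines) + "\n"


body = bulk_body([{"field": "value1"}, {"field": "value2"}])
print(body, end="")
```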
Creating indices involves the master, but you tend not to do this very often, normally at most a few times a day. Adding documents to an index can happen thousands of times per second.
The node that receives the request from Logstash breaks it up into a few parts and sends each part to the primary of the corresponding shard. The master is mostly uninvolved.
Before. An index is made up of shards. The getting started docs give an overview, or you can look at the output of GET _cat/shards in your cluster to see the individual shards.
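Shards have to exist before indexing because each document's shard is computed from the number of primary shards. The Elasticsearch docs describe this as `shard_num = hash(_routing) % num_primary_shards`, where `_routing` defaults to the document's `_id`. The Python sketch below uses `zlib.crc32` as a stand-in for Elasticsearch's real hash function (an assumption, so the exact numbers differ from a real cluster) and counts how many documents would map to a different shard if the primary count changed from 5 to 6. That is the usual explanation for why the number of primaries is fixed when the index is created.

```python
import zlib


def shard_num(routing: str, num_primary_shards: int) -> int:
    # Stand-in for Elasticsearch's internal Murmur3-based hash of the routing value.
    return zlib.crc32(routing.encode("utf-8")) % num_primary_shards


ids = [f"doc-{i}" for i in range(1000)]
moved = sum(1 for d in ids if shard_num(d, 5) != shard_num(d, 6))
print(f"{moved} of {len(ids)} documents would land on a different shard going from 5 to 6 primaries")
```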
Yes, you are misunderstanding the role of a master node. The master is mostly not involved in indexing.