We have an 11-node Elasticsearch cluster (3 master-eligible nodes + 8 data nodes). We perform all operations (ingesting, searching, etc.) using REST calls. The master nodes are lightweight compared to the data nodes.
For the sake of configuration simplicity, we specify only the master nodes as the connection string in our client code, like so: master-01:9200,master-02:9200,master-03:9200
We also set sniff_at_start=True in our clients, so the clients discover the entire cluster topology.
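To make concrete what sniffing hands to the client, here is a rough sketch that takes a response shaped like `GET _nodes/http` and keeps only the nodes that are not dedicated masters. The function name `coordinating_candidates`, the node IDs, and the IP addresses are made up for illustration. Whether a given client library applies a filter like this on its own depends on the client and version, so treat this as an illustration rather than a description of any client's behavior.

```python
from typing import Any, Dict, List


def coordinating_candidates(nodes_info: Dict[str, Any]) -> List[str]:
    """Return HTTP addresses of nodes that are not dedicated masters."""
    hosts = []
    for node in nodes_info.get("nodes", {}).values():
        roles = node.get("roles", [])
        # Skip dedicated master nodes (their only role is "master").
        if roles == ["master"]:
            continue
        address = node.get("http", {}).get("publish_address")
        if address:
            hosts.append(address)
    return sorted(hosts)


# Hypothetical, trimmed response in the shape of GET _nodes/http.
sample = {
    "nodes": {
        "a1": {"name": "master-01", "roles": ["master"],
               "http": {"publish_address": "10.0.0.1:9200"}},
        "b2": {"name": "data-01", "roles": ["data", "ingest"],
               "http": {"publish_address": "10.0.0.11:9200"}},
        "c3": {"name": "data-02", "roles": ["data", "ingest"],
               "http": {"publish_address": "10.0.0.12:9200"}},
    }
}

print(coordinating_candidates(sample))
```

With a filter of this kind, the seed hosts in the connection string are only used for the first lookup, and requests afterwards go to the remaining nodes.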
I realized the master nodes will still act as coordinating nodes: they handle search requests and execute the reduce phase of each search. Is there a way to configure these nodes to be ONLY master nodes and not participate in searches?
My concern is that, when executing large queries, the results coming back from the data nodes could overwhelm the master node during the reduce phase.
Thanks for the response.
If we should not point to the masters, where should we point instead?
The reason I ask is that we certainly don't want to include all 8 data nodes in the initial connection string. So it seems we would have to pick a few of the 8 data nodes and point to those. Is there no better way than this?
We may add/remove data nodes in the future and don't want to keep adjusting app configuration because of that.
Every node is implicitly a coordinating node. A node that has all three of node.master, node.data and node.ingest set to false acts only as a coordinating node; the coordinating role itself cannot be disabled.
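As a minimal sketch, a coordinating-only node would carry something like the following in its elasticsearch.yml, assuming a version that uses the boolean role settings named above:

```yaml
# elasticsearch.yml for a coordinating-only node (illustrative)
node.master: false   # not master-eligible
node.data: false     # holds no shards
node.ingest: false   # runs no ingest pipelines
```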
If you don't want to use coordinating-only nodes, point to the masters, or list all your data nodes (since they could change), another option is to put a load balancer such as HAProxy or Nginx in front of your data nodes. You would then configure only the load balancer endpoint in your client and maintain the list of data nodes on the load balancer.