Exclude master node from query user operations - how?

sale · January 17, 2022, 10:04pm

We have an 11-node Elasticsearch cluster (3 master-eligible nodes + 8 data nodes). We perform all operations (ingesting, searching, etc.) using REST calls. The master nodes are lightweight compared to the data nodes.

For the sake of configuration simplicity, we specify only the master nodes as the connection string in our client code, like so:
master-01:9200,master-02:9200,master-03:9200

We also set sniff_on_start=True in our clients, so the clients discover the entire cluster topology.

I realized the master nodes will still act as coordinating nodes: they handle searches and execute the reduce phase of each search. Is there a way to configure these nodes to be ONLY master nodes and not participate in searches?
My concern is that, when executing large queries, the results from the data nodes could overwhelm the master node during the reduce phase.

Any suggestion/input will be helpful.

warkolm · January 18, 2022, 3:40am

Not pointing your clients at them is a good starting point.

What client are you using?

sale · January 18, 2022, 3:26pm

Thanks for the response.
If not the masters, where should we point then?
The reason I ask is that we certainly don't want to include all 8 data nodes in the initial connection string. So it seems we should pick a few of the 8 data nodes and point to those. Is there no better way than this?
We may add/remove data nodes in the future and don't want to keep adjusting app configuration because of that.

Since we are still on 5.5, we are using elastic/elasticsearch-py-async (GitHub), a backend for elasticsearch-py based on Python's asyncio module.
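
For reference, here is a minimal sketch of how sniffing can be told to skip dedicated masters. The elasticsearch-py transport accepts a host_info_callback that is called for each sniffed node and may return None to drop it. The function name skip_dedicated_masters, the seed host names in the comment, and the assumption that the sniffed node info carries a "roles" list are illustrative. Whether elasticsearch-py-async on 5.5 honors the callback in the same way should be verified against that client's source.

```python
def skip_dedicated_masters(node_info, host):
    """Sniffing callback sketch: drop nodes whose only role is master.

    node_info is assumed to be one node entry from the nodes-info API,
    host is the connection dict the client built for that node.
    Returning None tells the transport to skip the node.
    """
    roles = node_info.get("roles", [])
    if roles == ["master"]:
        return None
    return host


# Hypothetical usage with the synchronous client (not executed here):
#   from elasticsearch import Elasticsearch
#   es = Elasticsearch(
#       ["data-01:9200", "data-02:9200"],  # hypothetical seed hosts
#       sniff_on_start=True,
#       host_info_callback=skip_dedicated_masters,
#   )
```

With a callback like this, the seed list only has to contain enough nodes to reach the cluster once. Data nodes added later would be picked up by sniffing rather than by editing the application configuration.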

leandrojmp · January 18, 2022, 4:22pm

What you want is a Coordinating Only Node, as described in this part of the documentation.

Every node is implicitly a coordinating node. This means that a node that has all three node.master, node.data and node.ingest set to false will only act as a coordinating node, which cannot be disabled.
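
The rule quoted above can be written down as a small check. This is only a sketch: the function name is_coordinating_only is hypothetical, the settings are assumed to be a flat dict of booleans, and each of the three settings is assumed to default to true when it is not set.

```python
ROLE_SETTINGS = ("node.master", "node.data", "node.ingest")


def is_coordinating_only(settings):
    """Return True when all three role settings are explicitly false.

    A missing key is treated as true, which is the assumed default,
    so an unconfigured node is not coordinating-only.
    """
    return not any(settings.get(key, True) for key in ROLE_SETTINGS)


# A node with every role switched off only coordinates requests.
print(is_coordinating_only(
    {"node.master": False, "node.data": False, "node.ingest": False}
))
# A dedicated master still has node.master enabled, so this is False.
print(is_coordinating_only({"node.data": False, "node.ingest": False}))
```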

sale · January 18, 2022, 5:19pm

Thanks for pointing to the docs. I am aware of coordinating-only nodes. It implies that we will need more than one such node for high availability.

It seems these nodes should have a decent amount of memory to handle bulk requests and the reduce phase of searches, and for that reason will cost quite a bit to run.

IMO, our cluster is not large enough to warrant the cost of running dedicated nodes for this, so it is not a cost-effective solution for us.

leandrojmp · January 18, 2022, 6:06pm

If you don't want to use coordinating-only nodes, don't want to point to the masters, and can't list all your data nodes because they may change, another option is to put a load balancer, such as HAProxy or Nginx, in front of your data nodes. You would then configure only that endpoint in your client and maintain the list of data nodes on the load balancer.
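
As a rough sketch of keeping the load balancer's backend list in step with the cluster, the snippet below turns a nodes-info response (the shape returned by GET _nodes/http is assumed) into HAProxy server lines for the data nodes only. The function name haproxy_server_lines and the sample node names and addresses are made up for illustration, and publish_address is assumed to be a plain host:port string.

```python
def haproxy_server_lines(nodes_response):
    """Build HAProxy 'server' lines for every node that has the data role."""
    lines = []
    nodes = nodes_response.get("nodes", {})
    # Sort by node name so the generated config is stable between runs.
    for node in sorted(nodes.values(), key=lambda n: n.get("name", "")):
        name = node.get("name")
        address = node.get("http", {}).get("publish_address")
        if not name or not address:
            continue
        if "data" not in node.get("roles", []):
            continue  # skip dedicated masters and other non-data nodes
        lines.append("server {} {} check".format(name, address))
    return lines


# Hypothetical sample response with one data node and one dedicated master.
sample = {
    "nodes": {
        "a1": {
            "name": "data-01",
            "roles": ["data", "ingest"],
            "http": {"publish_address": "10.0.0.11:9200"},
        },
        "b2": {
            "name": "master-01",
            "roles": ["master"],
            "http": {"publish_address": "10.0.0.1:9200"},
        },
    }
}
print("\n".join(haproxy_server_lines(sample)))
# prints: server data-01 10.0.0.11:9200 check
```

The generated lines would go into a backend section of the HAProxy configuration. How that file is reloaded, and how often the list is refreshed, is left open here.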

sale · January 18, 2022, 6:25pm

This makes a lot of sense @leandrojmp -- thanks!!!
I hadn't thought of a reverse proxy.

