Exclude master nodes from user query operations - how?

We have an 11-node Elasticsearch cluster (3 master-eligible nodes + 8 data nodes). We perform all operations (ingestion, search, etc.) using REST calls. The master nodes are lightweight compared to the data nodes.

For the sake of configuration simplicity, we specify only the master nodes as the connection string in our client code, like so:
master-01:9200,master-02:9200,master-03:9200

We also set sniff_on_start=True in our clients, so the clients discover the entire cluster topology.
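Because the clients already sniff the cluster on start, one option is to filter which sniffed nodes end up in the connection pool. Below is a minimal sketch that assumes elasticsearch-py's `Transport` accepts a `host_info_callback` (which elasticsearch-py-async builds on) and that the sniffed node info uses the 5.x format with a `roles` list. As we understand it, the library's default callback only skips nodes whose roles are exactly `master`, so a master that keeps the default ingest role would still receive traffic. The function name is our own.

```python
# Sketch, assuming elasticsearch-py's Transport accepts a host_info_callback
# and that sniffed node info uses the 5.x format with a "roles" list.

def skip_dedicated_masters(node_info, host):
    """Decide whether a sniffed node joins the client's connection pool.

    Return None to skip the node, or `host` to keep it. A node counts as a
    dedicated master if it is master-eligible but holds no data; it may
    still carry the ingest role, which is enabled by default in 5.x.
    """
    roles = set(node_info.get("roles", []))
    if "master" in roles and "data" not in roles:
        return None  # keep user traffic away from dedicated masters
    return host  # data nodes and coordinating-only nodes stay in the pool
```

You would pass it when building the client, e.g. `AsyncElasticsearch(["master-01:9200", "master-02:9200", "master-03:9200"], sniff_on_start=True, host_info_callback=skip_dedicated_masters)`. The masters should then act mostly as seeds for sniff requests, and data nodes added later get picked up on the next sniff without an app config change.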

I realized that the master nodes will still act as coordinating nodes: they handle search requests and execute the reduce phase of each search. Is there a way to configure these nodes to be ONLY master nodes and not participate in searches?
My concern is that, when executing large queries, the results from the data nodes could overwhelm the master nodes during the reduce phase.
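To make the reduce-phase concern concrete, here is a simplified model (not Elasticsearch's actual code) of what a coordinating node does with hits in query-then-fetch: every shard returns its own top `from + size` hits, and the coordinating node merges them into one global page. The `(score, doc_id)` shard response format is a hypothetical stand-in.

```python
import heapq
from itertools import islice
from typing import List, Tuple

# Simplified model, not Elasticsearch's actual code: each shard returns its
# own top (from_ + size) hits as (score, doc_id) pairs, sorted by score desc.
Hit = Tuple[float, str]

def reduce_top_hits(shard_hits: List[List[Hit]], from_: int, size: int) -> List[Hit]:
    """Merge per-shard results into the global page [from_, from_ + size)."""
    # k-way merge of the already-sorted shard lists, highest score first
    merged = heapq.merge(*shard_hits, key=lambda hit: hit[0], reverse=True)
    # Only the first from_ + size merged hits are needed for the page
    return list(islice(merged, from_ + size))[from_:]

def entries_held(num_shards: int, from_: int, size: int) -> int:
    """Upper bound on hits the coordinating node receives in the query phase."""
    return num_shards * (from_ + size)
```

With, say, 40 shards and `size=10000`, `entries_held` gives up to 400,000 entries before the fetch phase starts, and aggregation results are reduced on the same node. That memory lands on whichever node coordinates the request.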

Any suggestion/input will be helpful.

Not pointing your clients at them is a good starting point.

What client are you using?

Thanks for the response.
If not to the masters, where should we point the clients?
The reason I ask is that we certainly don't want to include all 8 data nodes in the initial connection string. So it seems we would have to pick a few of the 8 data nodes and point to those. Is there no better way than this?
We may add or remove data nodes in the future and don't want to keep adjusting the app configuration because of that.

Since we are still on 5.5, we are using elastic/elasticsearch-py-async (on GitHub), a backend for elasticsearch-py based on Python's asyncio module.

What you want is a Coordinating Only Node, as described in this part of the documentation.

Every node is implicitly a coordinating node. This means that a node that has all three of node.master, node.data and node.ingest set to false will only act as a coordinating node, which cannot be disabled.
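Going by the quoted 5.x docs, a node is coordinating-only when all three role flags are false in its `elasticsearch.yml`, and each flag defaults to true when omitted. The small sketch below encodes that rule and renders the matching config lines. The helper names are hypothetical, and the flag names only apply to 5.x; later versions configure roles differently.

```python
# The three 5.x role flags from elasticsearch.yml; each defaults to true.
ROLE_FLAGS = ("node.master", "node.data", "node.ingest")

def is_coordinating_only(settings: dict) -> bool:
    """True only when every role flag is explicitly set to false."""
    # A missing flag keeps its default of true, so the node keeps that role
    return all(settings.get(flag, True) is False for flag in ROLE_FLAGS)

def coordinating_only_yml() -> str:
    """Render the elasticsearch.yml lines for a 5.x coordinating-only node."""
    return "".join(f"{flag}: false\n" for flag in ROLE_FLAGS)
```

Forgetting `node.ingest: false` is an easy mistake. The node would then still run ingest pipelines, so it would not be coordinating-only.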

Thanks for pointing me to the docs. I am aware of coordinating-only nodes. Using them means we would need more than one such node for high availability.

It seems these nodes would need a decent amount of memory to handle bulk requests and the reduce phase of searches, so they would cost quite a bit to run.

IMO, our cluster is not large enough to justify the cost of running dedicated nodes for this. It would not be a cost-effective solution.

If you don't want coordinating-only nodes, don't want to point to the masters, and can't list all your data nodes because they may change, another option is to put a load balancer such as HAProxy or Nginx in front of your data nodes. You would configure only that endpoint in your client and maintain the list of data nodes on the load balancer.
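For the load-balancer route, the data-node list moves out of the app and into the proxy config. Below is a sketch that renders a basic HAProxy frontend/backend from a list of data nodes. The section names, host names and health-check path are assumptions for illustration, not a tuned production config.

```python
def haproxy_config(data_nodes, listen_port=9200):
    """Render an HAProxy frontend/backend pair for the given data nodes.

    `data_nodes` is a list of "host:port" strings; the names used here
    (es_front, es_data, data-01, ...) are hypothetical.
    """
    lines = [
        "frontend es_front",
        f"    bind *:{listen_port}",
        "    mode http",
        "    default_backend es_data",
        "",
        "backend es_data",
        "    mode http",
        "    balance roundrobin",
        "    option httpchk GET /",  # mark a node down if GET / fails
    ]
    for node in data_nodes:
        name = node.split(":")[0]  # reuse the hostname as the server name
        lines.append(f"    server {name} {node} check")
    return "\n".join(lines) + "\n"
```

The client would then point only at the proxy (e.g. a hypothetical `es-lb:9200`) with sniffing turned off. A sniffing client would discover the data nodes and connect to them directly, bypassing the proxy. Adding or removing a data node then means updating this file and reloading HAProxy. The proxy itself becomes a single point of failure unless it is run redundantly.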


This makes a lot of sense, @leandrojmp, thanks!
I didn't think of a reverse proxy.
