Any reason not to use ALL data nodes as coordinator nodes for queries/indexing?

Mike_Snare · September 28, 2023, 3:03pm

We have 37 data nodes but our clients only use 6 of them to send requests to.

Is there any reason why the entire set of nodes couldn't be used by the clients? I know that some clusters use dedicated coordinator nodes, but I don't see a justification for that in our system given the sheer number of nodes we have since each can serve as a coordinator AND our generally low CPU usage. Is there any reason we couldn't just say that any one of the 37 nodes is a valid coordinator node to use for indexing/querying?

By using all data nodes as coordinator nodes, I see it as reducing the coordination load by 83% on each of the 6 nodes that do it now, while adding just a marginal overhead to each of the remaining nodes (if these 6 nodes can do it and still handle their data responsibilities, then the others should be able to handle 1/6 of what the current coordinators are doing). It also seems like it's more resilient. Right now if one of the 6 goes down we've lost 16.6% of our coordination ability, vs 2.7% if one of the 37 goes down.

Seems like a no-brainer, which means I must be missing something.
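The arithmetic in the post above can be checked with a short sketch. It assumes coordination work is spread perfectly evenly across whichever nodes the clients address, which real traffic will only approximate; the function name `per_node_share` is made up for illustration.

```python
def per_node_share(coordinators: int) -> float:
    """Fraction of total coordination work per node, assuming an even spread."""
    return 1.0 / coordinators


current = per_node_share(6)     # clients address 6 nodes today
proposed = per_node_share(37)   # clients address all 37 data nodes

# Relative drop in coordination work on each of the original 6 nodes: 1 - 6/37
reduction = 1 - proposed / current

print(f"share per node today:    {current:.1%}")    # 16.7%
print(f"share per node proposed: {proposed:.1%}")   # 2.7%
print(f"reduction on the 6:      {reduction:.1%}")  # 83.8%
```

The per-node share is also the fraction of coordination capacity lost when a single coordinating node goes down, which is where the 16.6% vs 2.7% comparison comes from.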

stephenb · September 28, 2023, 4:00pm

Hi @Mike_Snare

The devil could be in the details ...

But no, from the top level, I don't think you are missing anything, assuming some homogeneity.

Many customers put a load balancer in front of the data nodes to distribute the load across the node pool, just as you describe.

We see what you describe when a user starts with "6" nodes and then grows the cluster but never revisits the client configuration.

I do not see an obvious flaw with your thinking at this point.

Christian_Dahlqvist · September 28, 2023, 4:03pm

If you have dedicated master nodes, you generally want to leave these out and not send requests to them. If you have a tiered architecture, e.g. hot-warm-cold, it may also make sense to direct requests only to certain nodes. If, however, the cluster is homogeneous, it is generally better to distribute the load as evenly as possible.
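As a rough illustration of that advice, here is a minimal sketch that builds a client host list from node roles. It skips nodes with no data role (such as dedicated masters) and can optionally keep only one tier. The `SAMPLE_NODES` dictionary is invented sample data loosely shaped like a `GET _nodes/http` response, and `client_hosts` is a hypothetical helper, not part of any Elasticsearch client. Real `publish_address` values may take other forms, so treat the address handling as an assumption.

```python
# Invented sample data, loosely shaped like a GET _nodes/http response.
SAMPLE_NODES = {
    "nodes": {
        "id-a": {"name": "master-1", "roles": ["master"],
                 "http": {"publish_address": "10.0.0.1:9200"}},
        "id-b": {"name": "hot-1", "roles": ["data_hot", "data_content", "ingest"],
                 "http": {"publish_address": "10.0.0.11:9200"}},
        "id-c": {"name": "hot-2", "roles": ["data_hot", "data_content", "ingest"],
                 "http": {"publish_address": "10.0.0.12:9200"}},
        "id-d": {"name": "warm-1", "roles": ["data_warm"],
                 "http": {"publish_address": "10.0.0.21:9200"}},
    }
}


def client_hosts(nodes_response, required_role=None, scheme="http"):
    """Return sorted URLs of nodes that hold data, optionally limited to one role."""
    hosts = []
    for node in nodes_response["nodes"].values():
        roles = node.get("roles", [])
        # A node counts as a data node if it has "data" or any "data_*" tier role.
        holds_data = any(r == "data" or r.startswith("data_") for r in roles)
        if not holds_data:
            continue  # e.g. a dedicated master node
        if required_role is not None and required_role not in roles:
            continue  # e.g. only target the hot tier
        hosts.append(f"{scheme}://{node['http']['publish_address']}")
    return sorted(hosts)


all_data_hosts = client_hosts(SAMPLE_NODES)
hot_only_hosts = client_hosts(SAMPLE_NODES, required_role="data_hot")
print(all_data_hosts)
print(hot_only_hosts)
```

In a homogeneous cluster the first list is what clients (or a load balancer) would be pointed at; the second shows how a tiered cluster might restrict requests to certain nodes.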

Dhineshkumar_R · September 28, 2023, 4:11pm

@Mike_Snare
Do you mind sharing the way you found out that only 6 data nodes are used as coordinator nodes?

Mike_Snare · September 28, 2023, 4:40pm

The cluster is homogeneous for now, so any data node can coordinate as well as any other.

Mike_Snare · September 28, 2023, 4:40pm

Just a matter of looking at our clients and how they specify the nodes to connect to.

Dhineshkumar_R · October 1, 2023, 12:35am

I see. Do your clients know the nodes where their data resides for a search request? Do they send search requests with node IDs set in the preference field?

system · October 29, 2023, 12:36am

This topic was automatically closed 28 days after the last reply. New replies are no longer allowed.
