I have read this recommendation: "Coordinating only nodes can benefit large clusters by offloading the coordinating node role from data and master-eligible nodes."
What is considered a "large cluster"? Do you have a rule of thumb (based on the number/size of nodes or the data they store) for this scenario?
Welcome to our community!
It depends. A read-heavy cluster may have a small amount of data on a small number of nodes, but the queries may be complex enough to benefit from a coordinating node. On the flip side, the queries may not be complex, in which case you don't need one.
The better answer is: what is your use case? How large is your cluster? What problems are you seeing that you think these nodes would help solve?
Thanks @warkolm, I have started exploring Elasticsearch and was curious about this statement. Right now I don't have a specific use case, so a general rule of thumb would help me move forward. I was also wondering how the Cloud SaaS offering (Elasticsearch Service) handles this. They don't allow customers to configure coordinator nodes at all. How would they decide whether to configure coordinating-only nodes for a customer or not?
That is not accurate. You can configure dedicated coordinating / ingest nodes in Elasticsearch Service / Elastic Cloud.
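If you want to see whether a cluster already has dedicated coordinating nodes, here is a minimal sketch rather than an official tool. It assumes you have already fetched the JSON body of `GET _nodes` and parsed it into a dict, and it relies on a coordinating-only node reporting an empty `roles` list (what you get from `node.roles: []` in elasticsearch.yml). The function name and the sample node names are made up for illustration.

```python
from typing import Any, Dict, List


def coordinating_only_nodes(nodes_info: Dict[str, Any]) -> List[str]:
    """Return the names of nodes that report an empty roles list."""
    names = []
    for node in nodes_info.get("nodes", {}).values():
        # A missing "roles" key is not treated as an empty list,
        # so only nodes that explicitly report no roles are counted.
        if "roles" in node and not node["roles"]:
            names.append(node.get("name", "unknown"))
    return sorted(names)


# Sample shaped like a trimmed node info response (values invented).
sample_info = {
    "nodes": {
        "a1": {"name": "data-1", "roles": ["data", "ingest"]},
        "b2": {"name": "master-1", "roles": ["master"]},
        "c3": {"name": "coord-1", "roles": []},
    }
}

print(coordinating_only_nodes(sample_info))
```

With the sample above, only `coord-1` is reported, since it is the only node with no roles assigned.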
I would agree with @warkolm. There is no direct rule of thumb.
I can tell you that I have probably set up, directly or by helping, 20 to 30 clusters over the last year and a half, and I only used dedicated coordinators / ingest on two of them.
The only time I would start thinking about coordinators and ingest is when the data nodes on a sizeable cluster become CPU bound.
You could be CPU bound from complex queries and / or ingest pipelines or other factors.
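To make the CPU-bound check a bit more concrete, here is a rough sketch under stated assumptions. It assumes you have already fetched the JSON body of `GET _nodes/stats/os` and parsed it into a dict, and that each node entry carries `roles` and `os.cpu.percent`. The 80% threshold and the function name are my own placeholders, and a single reading is only a snapshot, so you would want to watch the value over time before drawing conclusions.

```python
from typing import Any, Dict, List

# Hypothetical threshold; choose one that matches your own monitoring.
CPU_THRESHOLD_PERCENT = 80


def hot_data_nodes(stats: Dict[str, Any],
                   threshold: int = CPU_THRESHOLD_PERCENT) -> List[str]:
    """Return names of data nodes whose reported CPU is at or above threshold."""
    hot = []
    for node in stats.get("nodes", {}).values():
        roles = node.get("roles", [])
        # "data" and tiered roles such as "data_hot" both mark a data node.
        is_data = any(r == "data" or r.startswith("data_") for r in roles)
        cpu = node.get("os", {}).get("cpu", {}).get("percent")
        if is_data and cpu is not None and cpu >= threshold:
            hot.append(node.get("name", "unknown"))
    return sorted(hot)


# Sample shaped like a trimmed node stats response (values invented).
sample_stats = {
    "nodes": {
        "a1": {"name": "data-1", "roles": ["data", "ingest"],
               "os": {"cpu": {"percent": 93}}},
        "b2": {"name": "data-2", "roles": ["data_hot"],
               "os": {"cpu": {"percent": 41}}},
        "c3": {"name": "master-1", "roles": ["master"],
               "os": {"cpu": {"percent": 95}}},
    }
}

print(hot_data_nodes(sample_stats))
```

In the sample, `master-1` is busy but is not a data node, so only `data-1` is flagged.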
And now, with the new Graviton instances in AWS, on one of those use cases we got rid of the coordinators because there's more CPU available.
So it will really come down to your use case and unless it's high volume in ingest or query or both, I doubt you would need dedicated coordinators / ingest.
Also, picking the right hardware profile is important, probably more important than adding coordinators / ingest nodes.
Something else to consider is that all traffic will flow through those coordinator nodes. Not just reads: both reads and writes will flow through them, so you have to size them properly.
That's very high level, but it's about the best I can do for a rule of thumb.
The nice thing is you can always reconfigure in Elastic Cloud.
Thanks a ton @stephenb, this is really helpful!
When you create a cluster of 20-30 nodes, do you put all the data nodes behind a load balancer and share the LB's IP address with the client application, or do you share all the data nodes' IPs with the client app?
Can there be a performance issue with the LB if we put so many backends behind it? It will have to health check all of them.
There is a proxy / LB in front of the entire cluster in Elastic Cloud so you have a single endpoint.
You should spin up a trial and give it a look / test.
Got it @stephenb! Appreciate the prompt response!