Can a dedicated coordinator replace a load balancer such as HAProxy?
Hi @anon43570709 Welcome to the community.
Great question.
A coordinating-only node (assuming you direct all your queries to it) will coordinate all queries/searches, i.e. act as the scatter/gather coordinator and reach out to any/all of the data nodes required to fulfill the query.
So in that sense it acts a bit like a load balancer, but it is certainly not the equivalent of HAProxy.
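For reference, on recent versions (7.9+, where node.roles replaced the old node.* flags) a coordinating-only node is just a node with an empty roles list, something like:

```yaml
# elasticsearch.yml on the coordinating-only node:
# an empty roles list = no master, data, or ingest role, so the node
# only routes requests and merges the scatter/gather results
node.roles: [ ]
```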
Now, taking that one step further: if you are in production, that single coordinator would become a single point of failure, so we would recommend running at least 2 coordinators... and then you could run a load balancer in front of those coordinators as well. And/or, since some of the client libraries and Kibana can be given multiple endpoints, you may not need one at all.
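For example, Kibana can be pointed at both coordinators directly (the hostnames below are just placeholders):

```yaml
# kibana.yml - with multiple hosts listed, Kibana spreads its requests
# across the coordinators and fails over if one goes down, so no separate
# load balancer is needed in front of them
elasticsearch.hosts:
  - "http://coord-node-1:9200"
  - "http://coord-node-2:9200"
```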
So in short, there are a couple of options.
BTW, the ingest role is kind of the mirror image: indexing (writing data) passes through the ingest node, any ingest pipelines run there, and then it coordinates the "writing" of the documents into the data nodes.
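A dedicated ingest node is configured the same way, just with the ingest role (again a 7.9+ sketch):

```yaml
# elasticsearch.yml on a dedicated ingest node:
# runs ingest pipelines and coordinates the writes,
# but holds no data and is not master-eligible
node.roles: [ ingest ]
```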
Hope this helps.
That does help. Thank you! Right now I have 15 nodes behind my load balancer. I was thinking about configuring 2 of them to be dedicated coordinators to see if there was any performance difference.
They may... coordinators can be good for performance when you have complex queries or lots of queries. This isolates the scatter/gather phase and summary aggregations from the data nodes themselves.
If you are doing heavy ingest pipelines, I would recommend ingest nodes to take that load off the data nodes.
That makes sense. I'll add that to the list of scenarios to test out. We're doing roughly 15 million searches against our cluster each day. Right now we have 12 physical data/ingest nodes behind HAProxy and 3 dedicated masters as VMs in our testing environment. We have a total of 28 blades split between 2 chassis in the same rack for our cluster. One chassis' blades have 192 GB RAM and 48 cores, while the other chassis' blades have 128 GB RAM and 28 cores. We just finished our first index over the weekend, so we can start testing through some of these scenarios.
Hello,
For the same kind of performance gain on the data nodes, does routing indexing (in addition to queries) through dedicated coordinating nodes offload the data nodes, even without using ingest pipelines?
As mentioned above:
Coordinators for queries.
Ingest nodes for indexing.
Many combine those 2 roles, and ingest nodes can help depending on the use case, especially if you have ingest pipelines.
Thank you,
Is there any documentation to help us size the dedicated coordinating nodes for queries in terms of machine resources? (I couldn't find anything about it.)
What are the resource requirements of the machines that host these nodes?
I imagine a lot of CPU and RAM but no disk?
We recommend general-purpose compute, like an m5 if you're familiar with AWS instance profiles, with a 4:1 RAM-to-CPU ratio.
Example: 8 GB of RAM and 2 CPUs is a good place to start. How many nodes, and their actual size, depends heavily on your use case.
Choice of disk is not really important for coordinators and ingest nodes.
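One related rule of thumb (general Elasticsearch guidance, not specific to coordinators): give the JVM roughly half the machine's RAM, with min and max heap set to the same value. On the 8 GB starting point above, that could look like the sketch below (the jvm.options.d directory exists on 7.7+):

```
# config/jvm.options.d/heap.options on an 8 GB coordinating node:
# ~50% of RAM to the heap; min and max equal to avoid resize pauses
-Xms4g
-Xmx4g
```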