Different purposes of MASTER/DATA/COORDINATING nodes

Hi all,
currently I have a cluster configured like this:

SCENARIO 1:

  • 3 MASTER
  • 6 DATA

My Filebeat clients send messages directly to the DATA nodes. Is that correct?
Is it correct to think that MASTER nodes should be excluded from PUT operations?

SCENARIO 2:

  • 3 MASTER
  • 6 DATA
  • 2 COORDINATING ONLY ROLE

To relieve the DATA nodes of high resource usage and saturation, would it be OK to use the coordinating-only role? As I see it, the Filebeat clients would point directly at these nodes, which would then forward the messages to the DATA nodes. Is that correct?

Thanks

Yes, I would say that is considered best practice. Dedicated master nodes should ideally not serve reads or writes.

If they can handle the traffic you can certainly do that. Another option is to index directly to the data nodes and query through the coordinating-only nodes.


Can you explain in more detail what you mean by "if they can handle the traffic"? Do you mean something like "if they have enough resources to do that"?


If the queries are made through the API (e.g. curl GET or something similar), do you mean I should send the PUT operations to the DATA nodes and the GET operations to the COORDINATING nodes? If so, my application has to distinguish between the two operations, right? Something like:

if I'm doing a PUT, I use the hosts --> hosts_DATA
if I'm doing a GET, I use the hosts --> hosts_COORDINATING
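A minimal sketch of that split on the application side. The host names, the `hosts_for` helper and the list of read-only endpoints are hypothetical; note that searches are often sent as POST with a JSON body, so the method alone is not enough to tell a read from a write.

```python
# Hypothetical host lists; replace with your own node addresses.
HOSTS_DATA = ["http://data-1:9200", "http://data-2:9200", "http://data-3:9200"]
HOSTS_COORDINATING = ["http://coord-1:9200", "http://coord-2:9200"]

WRITE_METHODS = {"PUT", "POST", "DELETE"}

# Endpoints that only read, even when called with POST (assumed list, not exhaustive).
READ_ENDPOINTS = {"_search", "_count", "_msearch"}


def hosts_for(method: str, path: str) -> list[str]:
    """Writes go to the data nodes, reads go to the coordinating-only nodes."""
    last_segment = path.rstrip("/").split("/")[-1]
    if method.upper() in WRITE_METHODS and last_segment not in READ_ENDPOINTS:
        return HOSTS_DATA
    return HOSTS_COORDINATING


print(hosts_for("PUT", "/logs/_doc/1"))    # data nodes
print(hosts_for("POST", "/logs/_search"))  # coordinating-only nodes
```

In practice this usually means keeping two client instances, one configured with each host list, rather than choosing per request.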

Yes.

Sometimes you have different processes reading and writing, e.g. Logstash and Kibana, which makes it easy to separate. If not, everything can go through the coordinating-only nodes.

Should COORDINATING-only nodes have the same disk space as the DATA nodes? Do they use disks at all?
Which parameter or metric should I use to size them correctly?

They use very little disk I/O so can have a quite different specification. They need good network performance and enough CPU and RAM to handle the load.


So, when a Filebeat client sends a message to a coordinating-only node, that node just holds the message in RAM and then forwards it to a DATA node? The message is not stored on disk?

Yes, I guess you can say that. It will act as a load balancer coordinating the request and assembling the response from the data nodes.
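A toy model of that "coordinate and assemble" behaviour for a search, assuming made-up shard contents: the coordinating node fans the query out to every data node (scatter), each returns its local top hits, and the coordinating node merges them into the global result (gather). Real Elasticsearch splits this into a query phase and a fetch phase; the sketch collapses both into one step.

```python
import heapq
from concurrent.futures import ThreadPoolExecutor

# Toy "shards" on three data nodes: (score, doc_id) pairs. Purely illustrative data.
SHARDS = {
    "data-1": [(0.9, "a"), (0.4, "b"), (0.7, "c")],
    "data-2": [(0.8, "d"), (0.95, "e")],
    "data-3": [(0.3, "f"), (0.85, "g")],
}


def query_shard(node: str, size: int) -> list[tuple[float, str]]:
    """Simulate a data node returning its local top-`size` hits."""
    return heapq.nlargest(size, SHARDS[node])


def coordinate_search(size: int) -> list[tuple[float, str]]:
    """Scatter the query to every data node, then gather and merge the partial results."""
    with ThreadPoolExecutor() as pool:
        partials = list(pool.map(lambda node: query_shard(node, size), SHARDS))
    # The global top-`size` is the top-`size` over all per-node top lists.
    merged = [hit for part in partials for hit in part]
    return heapq.nlargest(size, merged)


print(coordinate_search(3))  # → [(0.95, 'e'), (0.9, 'a'), (0.85, 'g')]
```

The merge step is where the coordinating node spends CPU and RAM, which is why those resources, not disk, drive its sizing.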

Currently, in this cluster, I also have 2 nodes used as coordinating-only nodes for Kibana.
Could I use them for this as well, i.e. as coordinating-only nodes for the messages received from the Filebeat clients?

I would recommend testing it to see which option works best for you and your use case.

Thank you!

Just one more doubt: the coordinating-only node (COR) will join the cluster with all the role settings in elasticsearch.yml set to "false".
When my Filebeat clients contact the COR, how will it know which DATA nodes to contact?
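A minimal sketch of the mechanism, assuming the simplified routing formula given in the Elasticsearch `_routing` field documentation, `shard_num = hash(_routing) % num_primary_shards`, where the routing value defaults to the document `_id`. Every node, coordinating-only ones included, receives the cluster state (with the shard routing table) from the elected master, so it can work out which data node holds the target primary shard. The index name, node names and routing table below are hypothetical, and `zlib.crc32` stands in for the Murmur3 hash Elasticsearch actually uses.

```python
import zlib

# Hypothetical routing table, standing in for the one held in the cluster state.
# Maps primary shard number -> data node currently holding that primary.
ROUTING_TABLE = {
    "filebeat-index": {0: "data-1", 1: "data-2", 2: "data-3"},
}


def shard_for(routing: str, num_primary_shards: int) -> int:
    """shard_num = hash(_routing) % num_primary_shards (crc32 used as a stand-in hash)."""
    return zlib.crc32(routing.encode("utf-8")) % num_primary_shards


def node_for_document(index: str, doc_id: str) -> str:
    """Return the data node holding the primary shard for this document."""
    shards = ROUTING_TABLE[index]
    # By default the routing value is the document _id.
    return shards[shard_for(doc_id, len(shards))]


print(node_for_document("filebeat-index", "abc123"))
```

The write is forwarded to the node holding the primary shard, which then replicates it to the replica copies; the coordinating node only routes the request and relays the response.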
