hi all,
currently I have a cluster configured in this way:
SCENARIO 1:
3 MASTER
6 DATA
My Filebeat clients send messages directly to the DATA nodes. Is that correct or not?
Is it correct to think that MASTER nodes should be excluded from PUT operations?
SCENARIO 2:
3 MASTER
6 DATA
2 COORDINATING ONLY ROLE
To relieve the DATA nodes of high usage and resource saturation, would it be OK to use coordinating-only nodes? In my mind, the FB clients should point directly at them, and they would then forward the messages to the DATA nodes. Is that correct?
Yes, I would say that is considered best practice. Dedicated master nodes should ideally not serve reads or writes.
If they can handle the traffic you can certainly do that. Another option is to index directly to the data nodes and query through the coordinating-only nodes.
If they can handle the traffic you can certainly do that.
Could you explain in more detail what you mean by "if they can handle the traffic"? Do you mean something like "if they have enough resources to do that"?
Another option is to index directly to the data nodes and query through the coordinating-only nodes.
If the query is made through the API (e.g. curl GET or something similar), do you mean sending the PUT operations to the DATA nodes and the GET operations to the COORD nodes? If yes, my application should distinguish between the two operations, right? I mean, something like:
if I'm doing a PUT, I have to use hosts --> hosts_DATA
if I'm doing a GET, I have to use hosts --> hosts_COORDINATING
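A minimal sketch of that split in Python, just to make the idea concrete. The host names and the hosts_for helper are hypothetical (they are not from any Elasticsearch client library), and it only covers the two cases mentioned above:

```python
from typing import List

# Hypothetical host lists; replace with the real node addresses.
HOSTS_DATA: List[str] = ["data-1:9200", "data-2:9200"]
HOSTS_COORDINATING: List[str] = ["coord-1:9200", "coord-2:9200"]

# Only the two cases discussed above: PUT -> data, GET -> coordinating.
HOSTS_BY_METHOD = {
    "PUT": HOSTS_DATA,
    "GET": HOSTS_COORDINATING,
}


def hosts_for(method: str) -> List[str]:
    """Return the host list to use for the given HTTP method."""
    try:
        return HOSTS_BY_METHOD[method.upper()]
    except KeyError:
        # Anything else is left undecided on purpose in this sketch.
        raise ValueError(f"no host list defined for method {method!r}")


print(hosts_for("PUT"))  # the data hosts
print(hosts_for("GET"))  # the coordinating hosts
```

Note that searches can also be sent as POST requests, so a real application would probably decide by type of operation (indexing vs. querying) rather than by HTTP method alone.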
Sometimes you have different processes reading and writing, e.g. Logstash and Kibana, which makes it easy to separate. If not, everything can go through the coordinating-only nodes.
Should COORDINATING ROLE nodes have the same disk space as DATA nodes? Do they use disks or not?
Which parameter or metric should I use to size them correctly?
They use very little disk I/O so can have a quite different specification. They need good network performance and enough CPU and RAM to handle the load.
So, when the FB client sends a message to a coordinating-only node, that node will just hold the message in RAM and then forward it to a DATA node? The message will not be stored on disk?
Currently, in the cluster described above, I also have 2 coordinating-only nodes that are used for Kibana.
Could I try to use them for this as well? I mean, as C.O.R. nodes for the messages received from the FB clients?
Just one more doubt: the COR will join the cluster with all the role settings in elasticsearch.yml set to "false".
When my FB clients contact the COR, how will it know which DATA nodes should be contacted?
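For reference, a small Python sketch that prints the elasticsearch.yml lines usually meant by "all role settings set to false". The helper name coordinating_only_settings is hypothetical. The three boolean settings are the ones used by older releases; as far as I know, newer releases express the same thing with an empty node.roles list, so check the documentation for your version before copying either form:

```python
def coordinating_only_settings(legacy: bool = True) -> str:
    """Render elasticsearch.yml lines for a coordinating-only node."""
    if legacy:
        # Older style: one boolean per role, all switched off.
        settings = {
            "node.master": "false",
            "node.data": "false",
            "node.ingest": "false",
        }
    else:
        # Newer style (assumption: your release supports node.roles):
        # an empty role list leaves only the coordinating function.
        settings = {"node.roles": "[ ]"}
    return "\n".join(f"{key}: {value}" for key, value in settings.items())


print(coordinating_only_settings())
```

Depending on the release and installed features there may be further role settings to disable, so treat this as a starting point only.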