Hi everybody,
Currently I have, in theory, 2 DATA nodes, 1 active MASTER, 1 passive MASTER (I mean a node that could become master, but is not currently elected) and 1 COORDINATOR (with Kibana installed on it).
I would like to know: if I want to try a manual insert operation (PUT), should I send it to the COORDINATOR node or to the MASTER (and in that case, how can I know which IP address the elected master has)?
And what if I try a GET?
Having 2 master-eligible nodes in a cluster is a bad idea. You should always aim to have exactly 3, so that two nodes can elect a new master should one fail or leave. I would therefore recommend that you set up 3 equally sized nodes that hold data and are master-eligible. In addition to this you can use a coordinating-only node with Kibana.
With this setup, you would direct indexing operations at the master/data nodes.
If having 3 data nodes is not an option, I would recommend setting up the cluster with 2 master/data nodes and one smaller dedicated master node. In this configuration you would send all indexing requests to one of the master/data nodes. In addition to this you can use a coordinating-only node with Kibana.
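As a rough illustration, a manual insert (PUT) and fetch (GET) against one of the master/data nodes could look like the Python sketch below; the hostname es-node1, the index name my-index and the document itself are just placeholders:

```python
# Minimal sketch, assuming a master/data node is reachable at es-node1:9200.
# Hostname, index name and document are hypothetical.
import requests

NODE = "http://es-node1:9200"

# Manual insert (PUT) of a document with an explicit id.
resp = requests.put(
    f"{NODE}/my-index/_doc/1",
    json={"message": "hello", "level": "info"},
    timeout=10,
)
print(resp.status_code, resp.json())

# Manual fetch (GET) of the same document.
resp = requests.get(f"{NODE}/my-index/_doc/1", timeout=10)
print(resp.status_code, resp.json())
```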
At the moment I have just 2 Master nodes, for a test. Anyway, I can try this configuration: 3 Master nodes that are also Data nodes, plus 1 Coordinating-only node (with Kibana installed on it).
Should I send my PUT and GET requests to a Master node or to my Coordinating node?
I would send indexing requests directly to the data nodes in such a setup. Kibana would go via the coordinating node. Other queries can hit either - it all depends on volumes and how powerful your coordinating node is.
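For example, a search sent through the coordinating-only node could look like this sketch (the hostname es-coord and the index name are hypothetical):

```python
# Minimal sketch of a query going through the coordinating-only node,
# which fans it out to the data nodes holding the relevant shards.
import requests

COORDINATING_NODE = "http://es-coord:9200"  # hypothetical hostname

query = {"query": {"match": {"message": "hello"}}}
resp = requests.post(f"{COORDINATING_NODE}/my-index/_search", json=query, timeout=10)
print(resp.json()["hits"]["total"])
```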
So, technically, I can send my requests (PUT or GET, it doesn't matter) to both Master and Data nodes?
If I can use a Data node for this purpose, why should I target a Master?
I know that Filebeat and LS can use the hosts keyword to send messages to an array of servers (if the first one is not alive, the other ones will be contacted), but if I have a third application, written by me, where should I send my PUT and GET requests? Should I use a virtual appliance for load balancing?
If you have dedicated master nodes, you should not send requests to them. If you however have master/data nodes (both roles), these can serve requests.
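To illustrate the hosts-array style failover mentioned in the question, a client could simply walk a list of master/data nodes and use the first one that answers. This is only a sketch with hypothetical hostnames, not a replacement for a proper client library or load balancer:

```python
import requests

# Hypothetical master/data nodes; dedicated master nodes are deliberately
# left out of this list, since they should not serve client requests.
HOSTS = ["http://es-node1:9200", "http://es-node2:9200", "http://es-node3:9200"]

def send_request(method, path, **kwargs):
    """Try each host in order and return the first response obtained."""
    last_error = None
    for host in HOSTS:
        try:
            return requests.request(method, f"{host}{path}", timeout=10, **kwargs)
        except requests.exceptions.RequestException as exc:
            last_error = exc  # node unreachable, try the next one
    raise last_error

# Example usage: index a document, then read it back.
send_request("PUT", "/my-index/_doc/1", json={"message": "hello"})
print(send_request("GET", "/my-index/_doc/1").json())
```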
OK, understood: I should use the Master because it has this specific purpose.
And if I'm implementing my application to send requests, and it can target all the different Master nodes of my infrastructure, should I also plan for a mechanism to probe which server is the currently elected Master?
I mean, what will happen if I try to send a PUT/GET request to a Master node that is currently not elected? Will my request fail?
If you have master+data nodes, send to them without caring which one is currently the master (as this can change), but avoid sending data to any dedicated master nodes (if you have any). The node that is currently elected master manages the state of the cluster - it is not necessarily involved in normal requests (at least not because it happens to be the master).
You can, but generally you should not. The whole point of having dedicated master nodes is to ensure they are not overloaded so that they can concentrate on managing the cluster. This is why you should not put load on them by sending requests to them.
OK, so I should send messages to my DATA nodes.
But if my cluster stops because it is not possible to have the minimum number of Master nodes, will the Data nodes also automatically refuse new messages?
If the cluster is not able to elect a master, write requests will be rejected. This is the correct and expected behaviour, as accepting writes without an elected master could cause data loss. This is why you ideally have 3 master-eligible nodes, so that you can handle one node failing and still be able to elect a master.
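As a sketch of what that means for an application, assuming the node answers such writes with an error status (for example a 5xx) while no master is elected, the client could retry with a delay instead of losing the message; the hostname and index name are hypothetical:

```python
import time
import requests

NODE = "http://es-node1:9200"  # hypothetical master/data node

def index_with_retry(doc, doc_id, retries=5, wait=5):
    """Try to index a document, retrying if the cluster rejects the write
    (for example while no master is elected)."""
    for attempt in range(1, retries + 1):
        try:
            resp = requests.put(f"{NODE}/my-index/_doc/{doc_id}", json=doc, timeout=10)
            if resp.status_code < 400:
                return resp.json()
            # Write rejected (e.g. a 5xx while the cluster has no elected master).
            print(f"attempt {attempt} rejected with status {resp.status_code}")
        except requests.exceptions.RequestException as exc:
            print(f"attempt {attempt} failed: {exc}")
        time.sleep(wait)
    raise RuntimeError("indexing failed after all retries")
```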
No, just to start I should have:
2 DATA nodes where I will send my messages
3 MASTER nodes that will manage the cluster (and anything else?)
1 Coordinating-only node (with Kibana installed, pointing at localhost:9200)
That will work fine, but if you only have 2 data nodes, having 3 dedicated master nodes may be overkill. I would recommend starting with 3 nodes that hold data and are all master eligible plus a coordinating only node for Kibana.
Last question: when I use Kibana to send a query and the Coordinating node handles it, will this request also be managed by the elected Master node, or directly by the DATA nodes?
So the Coordinating node, since it knows the status of all nodes in the cluster, will directly contact the best possible Data node to get the result of the query?