I am quite new to Elasticsearch. I have read the documentation and the online book; however, some things are still not clear to me, and unfortunately I haven't found the exact answers I am looking for.
Let's assume that we have the classic cluster setup of a client (coordinating) node, a master node and a data node.
I know that the master's role is to manage the cluster state. That involves adding and removing cluster nodes and creating indexes. So, when you create an index, the request must be processed by the master. That means that if an index creation request is sent to the data node, the data node will forward it to the master to be handled. The master node might in turn "send back" the request to the data node in order to create the index, if it determines that the index must be stored on the same data node where the initial request was made.
-
Will a client node (load balancer) determine that, since this is an index creation job, the request should be forwarded straight to the current master node?
-
If the answer to the question above is yes, does this mean that a client node helps reduce the "back-and-forth" requests within the cluster?
I set up this client-master-data node cluster locally on my laptop, using three different instances of Elasticsearch. I then created indexes using curl, querying each node in turn. That is, I sent one index creation request to the master node, another to the client node and one more to the data node. All three indexes were successfully created.
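For illustration, here is a minimal sketch of that test written in Python instead of curl. The ports (9200, 9201, 9202), the mapping of roles to ports, the index names and the shard settings are all assumptions made up for the example; only the idea of sending one create-index request to each node comes from the test described above.

```python
import json
import urllib.request

# Hypothetical local addresses: one HTTP port per Elasticsearch instance.
NODES = {
    "master": "http://localhost:9200",
    "client": "http://localhost:9201",
    "data": "http://localhost:9202",
}


def build_create_index_request(base_url, index_name):
    """Build (but do not send) a PUT request that creates an index."""
    # Illustrative settings only; a single shard with no replicas suits a laptop.
    body = json.dumps(
        {"settings": {"number_of_shards": 1, "number_of_replicas": 0}}
    )
    return urllib.request.Request(
        url=f"{base_url}/{index_name}",
        data=body.encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="PUT",
    )


def create_index_via_each_node(nodes=NODES):
    """Send one create-index request to every node and collect the HTTP status."""
    results = {}
    for role, base_url in nodes.items():
        request = build_create_index_request(base_url, f"test-via-{role}")
        with urllib.request.urlopen(request, timeout=10) as response:
            results[role] = response.status
    return results


if __name__ == "__main__":
    # Requires the three local instances to be running on the assumed ports.
    print(create_index_via_each_node())
```

Whichever node receives the request, the client sees the same kind of response, which is why this test alone does not show which node actually processed the index creation.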
-
As far as I understand, if you don't send ALL your requests to the client (coordinating) node, you are actually bypassing the specific cluster setup. Just having a client-master-data node setup doesn't mean it serves its purpose if you send your requests to any node you wish. For this setup to work, you MUST send all your requests to the client node. Is that correct?
-
So the client node is responsible for sending requests to the appropriate node: requests for cluster changes and index creation go to the master node, and searches go to the data node. If the client node knows where each piece of data is stored, or in other words, if it knows which nodes (shards) to ask when it receives a search term, why must this information be stored on all cluster nodes? Why must data nodes know where each piece of information is stored, given that the whole searching job is done by the client node?
-
The way I see it, apart from the actual data located on the data nodes, all other information is located on all types of nodes within the cluster. It's just that you isolate/assign specific jobs to specific nodes, despite the fact that these jobs could be performed by any node. Is that correct?
-
Is there any information located on one node type but not on another? For example, does the master node contain information that a client node does not have (and vice versa)?
-
Am I completely confused and don't know what I am talking about?
Thanks