How does a master node divide query requests between two nodes with the same dataset?

L0s · August 16, 2016, 3:16pm

Hey!

I was working with ES on 3 nodes: one master and two data nodes.

The usual way to work with this is to split the data into 2 sets distributed between node A and node B. Then, when the master node is queried, it queries node A and node B and returns their responses.

Now I was wondering what happens if you set up node A and node B with the same dataset.
When you query the master node, how will it respond?

- Will it query A and B and return the result as before? (If so, does it return the same results twice, or are the results not duplicated? I think a single answer would be more coherent.)

- Can it act as a load balancer between the two nodes? If two queries arrive at the same time, will it query node A for the first and node B for the second?

- Can it split the query in two and then aggregate the answers? (Not likely, but it's worth asking.)

Thanks for your time and your answer!

eperry · August 16, 2016, 6:31pm

The master node does not divide the query; it issues the request to both nodes in the cluster, and each node responds back to the master.

I think you have the wrong impression of an Elasticsearch cluster. Data is split between all data nodes in a cluster and sharded, with replication for redundancy.
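To make "split between all data nodes" concrete: you normally do not divide the dataset by hand. Each index is divided into primary shards, and the Elasticsearch guide describes the target shard of a document as hash(_routing) % number_of_primary_shards, where _routing defaults to the document _id. The minimal sketch below assumes that rule; zlib.crc32 is only a stand-in for the hash Elasticsearch actually uses, and the helper names are hypothetical.

```python
import zlib
from typing import Dict, Iterable, List


def pick_shard(routing: str, num_primary_shards: int) -> int:
    # Stand-in hash: Elasticsearch uses its own hash function, crc32 just keeps
    # this sketch deterministic and dependency-free.
    return zlib.crc32(routing.encode("utf-8")) % num_primary_shards


def distribute(doc_ids: Iterable[str], num_primary_shards: int) -> Dict[int, List[str]]:
    # Hypothetical helper: group document ids by the primary shard they route to.
    shards: Dict[int, List[str]] = {i: [] for i in range(num_primary_shards)}
    for doc_id in doc_ids:
        shards[pick_shard(doc_id, num_primary_shards)].append(doc_id)
    return shards
```

Because the shard is derived from the routing value, the same document id always lands on the same shard, which is also why the number of primary shards cannot simply be changed on an existing index.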

So if you have 3 nodes in the cluster, 1 master and 2 data nodes, then your data exists on both servers, and the master issues the requests, receives the results and merges them (and caches them).
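The "issue, receive and merge" step can be pictured as a scatter/gather: the node that receives the search (the coordinating node, which does not have to be the master) asks one copy of each shard for its local best hits and merges them into a global top-k. Since only one copy of each shard is asked, identical replicas do not produce duplicate results. The sketch below is a simplified illustration under that assumption; the data layout and function names are hypothetical, and for simplicity it always picks the first copy of each shard.

```python
import heapq
from typing import Dict, List, Tuple

Hit = Tuple[float, str]  # (score, doc_id)


def local_top_k(shard_docs: Dict[str, float], k: int) -> List[Hit]:
    # Each shard copy returns only its own best k hits.
    return heapq.nlargest(k, ((score, doc_id) for doc_id, score in shard_docs.items()))


def scatter_gather(shard_copies: Dict[int, List[Dict[str, float]]], k: int) -> List[Hit]:
    partials: List[Hit] = []
    for copies in shard_copies.values():
        # Query exactly ONE copy per shard (here simply the first), so a replica
        # holding the same documents never contributes duplicate hits.
        partials.extend(local_top_k(copies[0], k))
    # Merge the per-shard results into the global top k.
    return heapq.nlargest(k, partials)
```

For example, with shard 0 stored twice as {"a": 3.0, "b": 1.0} and shard 1 as {"c": 2.0}, asking for the top 2 yields "a" and "c", each exactly once.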

warkolm · August 16, 2016, 8:41pm

You should also read https://www.elastic.co/guide/en/elasticsearch/guide/2.x/important-configuration-changes.html#_minimum_master_nodes

L0s · August 17, 2016, 8:37am

Hey, thanks for your answer!

Well, that is what I described under "will it", so I understand it works like this.

In a case where I want to be sure that the data is available:
You have 2 data nodes, which contain the data. If one of your data nodes becomes unavailable (network problem, whatever...),
then when your master queries the data, will you be missing part of the dataset?
In this case, shouldn't you have a replica with the same data in case a node goes down?
This is essentially why I'm asking this question.

If I want to be sure my data is always available, must I have another node with the same data in case one goes down?
I'm not sure my explanation is clear; don't hesitate to rephrase or to ask me to explain again!

(Until now I was working with 3 nodes: one running ELK with Elasticsearch as the master, and 2 data nodes with different data. Now, when one data node goes down for one or two days, I want to still be able to reach my data, and I don't see another way than setting up another data node with the same dataset, or creating a new node to replace it when it goes down.)

Thanks again for your answer!

Christian_Dahlqvist · August 17, 2016, 8:42am

In order to ensure high availability, don't run with a single master-eligible node. Make sure your 2 data nodes are also master-eligible (so that you have a total of 3 master-eligible nodes in the cluster) and set minimum_master_nodes to 2 in order to avoid split-brain scenarios.
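The value 2 comes from the quorum rule in the guide linked above: discovery.zen.minimum_master_nodes should be (number of master-eligible nodes / 2) + 1. The sketch below just encodes that arithmetic and checks which side of a network partition could still elect a master; the function names are hypothetical.

```python
from typing import List


def minimum_master_nodes(master_eligible: int) -> int:
    # Quorum rule: a strict majority of the master-eligible nodes.
    return master_eligible // 2 + 1


def can_elect_master(reachable_master_eligible: int, minimum: int) -> bool:
    # A group of nodes may elect a master only if it can see at least `minimum`
    # master-eligible nodes (including itself).
    return reachable_master_eligible >= minimum


def sides_that_can_elect(partition_sizes: List[int], minimum: int) -> int:
    # Count how many sides of a partition could elect their own master.
    # With a correct quorum this is never more than 1, which prevents split brain.
    return sum(1 for size in partition_sizes if can_elect_master(size, minimum))
```

With 3 master-eligible nodes the quorum is 2, so if one node is cut off, only the side holding the other two can elect a master. With a single master-eligible node, losing that node leaves the cluster without a master at all.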

If you want your data to be highly available, you need to set replicas to 1 on all indices, which means that both data nodes will hold the full dataset. If you then lose one of the data nodes, you still have access to all data and 2 master-eligible nodes that can form a quorum.
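Why replicas set to 1 means "both data nodes hold the full dataset" follows from the rule that a replica is never allocated on the same node as its primary. The sketch below is a deliberately simplified allocator (hypothetical, not Elasticsearch's real allocation logic) that places at most one copy of each shard per node, then checks whether every shard survives the loss of one node.

```python
from typing import Dict, List, Set


def allocate(num_primaries: int, replicas: int, nodes: List[str]) -> Dict[str, Set[int]]:
    # Simplified placement: each copy of a shard goes to a different node.
    # Copies that would need a second slot on the same node are left out
    # (in a real cluster they would stay unassigned).
    placement: Dict[str, Set[int]] = {n: set() for n in nodes}
    for shard in range(num_primaries):
        for copy in range(min(replicas + 1, len(nodes))):
            placement[nodes[(shard + copy) % len(nodes)]].add(shard)
    return placement


def all_data_available(placement: Dict[str, Set[int]], lost_node: str, num_primaries: int) -> bool:
    # Union of the shards held by the nodes that are still up.
    surviving: Set[int] = set()
    for node, shards in placement.items():
        if node != lost_node:
            surviving |= shards
    return surviving == set(range(num_primaries))
```

With 5 primaries and 1 replica on two nodes, each node ends up with all 5 shards, so losing either node loses no data. With 0 replicas the shards are split across the nodes, which is the situation described in the question above.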

If you do have a dedicated master node in the cluster, do not send all requests through it. Make sure to configure your clients so that requests can be sent directly to both data nodes.
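One way to picture "requests can be sent directly to both data nodes" is a client that rotates through a list of nodes and skips any it cannot reach. The sketch below is hypothetical client-side logic, not the API of any particular Elasticsearch client; the transport function is injected so the example stays self-contained.

```python
import itertools
from typing import Callable, List, Optional


class RoundRobinClient:
    """Hypothetical client that spreads requests over several nodes."""

    def __init__(self, nodes: List[str]):
        self.nodes = list(nodes)
        self._cycle = itertools.cycle(self.nodes)

    def send(self, request: str, transport: Callable[[str, str], str]) -> str:
        # Try each node at most once, starting from the next one in the rotation.
        last_error: Optional[Exception] = None
        for _ in range(len(self.nodes)):
            node = next(self._cycle)
            try:
                return transport(node, request)
            except ConnectionError as exc:
                last_error = exc  # node unreachable, fall through to the next one
        raise ConnectionError("no node reachable") from last_error
```

With both nodes up, consecutive requests alternate between them; if one data node is down, every request ends up on the surviving node instead of failing, as long as that node holds a copy of every shard.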

L0s · August 17, 2016, 9:00am

Thank you!
You just answered all my questions and doubts!

In this case, are we talking about an ES node configured as a "client node"?
With master and data disabled.

Christian_Dahlqvist · August 17, 2016, 9:11am

No. Make sure your application and other clients, e.g. Logstash, are able to connect to all nodes. I have not assumed the use of any client nodes here.

L0s · August 17, 2016, 9:18am

Thanks for the clarification!