Where is data stored in ES

amnon_d · August 21, 2018, 7:28am

Hi there,
Apologies if this question sounds stupid...
I am testing some high availability concepts and I am puzzled about one thing.
For my test I am running an ES cluster with one master, two data nodes and one client.
I send data to data node A, and I can view it with Grafana running on any of the 4 nodes - no problem.
I am still able to view the data even when I stop ES on Data node A.
Well, I thought that maybe the Master decided to store the data on Data Node B...
So, I restarted ES on Node A and stopped it on Node B.
I was still able to view the data (to my satisfaction, I must say, because I want high availability for my data).
My question is: does the master direct all data nodes to store all data, or am I missing something here?
Thanks.

warkolm · August 21, 2018, 7:43am

By default, Elasticsearch creates one replica of each primary shard in an index. Because you have multiple nodes, it can spread the shard copies across those nodes, providing redundancy.

The master node handles where all the shards are allocated.
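To see where the master has actually placed each shard copy, the `_cat/shards` API can be queried with `format=json`. Below is a minimal sketch that groups such a response by node. The sample rows, the index name `metrics`, and the node names `data-a` and `data-b` are made up for illustration; only the `index`, `shard`, `prirep`, `state`, and `node` columns are assumed from the API.

```python
from collections import defaultdict

# Hypothetical sample of what GET _cat/shards?format=json might return for
# a two-shard index with one replica; index and node names are made up.
SAMPLE_ROWS = [
    {"index": "metrics", "shard": "0", "prirep": "p", "state": "STARTED", "node": "data-a"},
    {"index": "metrics", "shard": "0", "prirep": "r", "state": "STARTED", "node": "data-b"},
    {"index": "metrics", "shard": "1", "prirep": "p", "state": "STARTED", "node": "data-b"},
    {"index": "metrics", "shard": "1", "prirep": "r", "state": "STARTED", "node": "data-a"},
]


def shards_by_node(rows):
    """Group shard copies by the node they are assigned to."""
    grouped = defaultdict(list)
    for row in rows:
        # "p" marks a primary shard, "r" marks a replica.
        kind = "primary" if row["prirep"] == "p" else "replica"
        grouped[row["node"]].append((row["index"], int(row["shard"]), kind))
    return {node: sorted(copies) for node, copies in grouped.items()}


layout = shards_by_node(SAMPLE_ROWS)
```

In this made-up layout each data node holds one copy of every shard, which is why either node alone can still answer queries.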

Bernt_Rostad · August 21, 2018, 7:55am

When you use the default index settings, new indices are created with 1 replica for each primary shard. Since a replica shard will always live on a different node than its primary shard, you automatically get redundancy: each of your two data nodes will then have a full copy of the data. The master knew this, so when you queried the cluster after taking down one of the data nodes, it still found the relevant data on the other node.

However, if you had deleted the replica shards or changed the default index settings to not create replicas, neither data node would have held a full copy of the data and you would have had no redundancy. So, keeping the 1 replica setting is a good choice in most situations.
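The effect described above can be illustrated with a toy model. This is only a sketch: the round-robin placement is a simplification and not Elasticsearch's real allocator, and the function names, the node names `A` and `B`, and the choice of five primaries are assumptions made for the example. The one rule it does keep is that two copies of the same shard never share a node.

```python
def allocate(num_primaries, num_replicas, nodes):
    """Toy round-robin placement: copies of one shard never share a node."""
    placement = {node: set() for node in nodes}
    for shard in range(num_primaries):
        copies = 1 + num_replicas
        # Copies beyond the number of nodes cannot be placed and stay unassigned.
        for offset in range(min(copies, len(nodes))):
            node = nodes[(shard + offset) % len(nodes)]
            placement[node].add(shard)
    return placement


def all_shards_available(placement, stopped_node, num_primaries):
    """True if every shard still has a copy on some node other than stopped_node."""
    remaining = set()
    for node, shards in placement.items():
        if node != stopped_node:
            remaining |= shards
    return remaining == set(range(num_primaries))


with_replica = allocate(num_primaries=5, num_replicas=1, nodes=["A", "B"])
without_replica = allocate(num_primaries=5, num_replicas=0, nodes=["A", "B"])
```

With one replica, `all_shards_available(with_replica, "A", 5)` and `all_shards_available(with_replica, "B", 5)` both hold, matching what was observed in the test. With zero replicas, stopping either node leaves some shards without any copy.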

amnon_d · August 21, 2018, 7:58am

Thank you. I was hoping this is the case but I wasn't sure.

Christian_Dahlqvist · August 21, 2018, 8:00am

If you are looking for high availability, you should ensure you have at least 3 master-eligible nodes and set minimum_master_nodes correctly.
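As a rough guide to what "correctly" means here: the usual recommendation for `discovery.zen.minimum_master_nodes` is a quorum of the master-eligible nodes, that is (master-eligible nodes / 2) + 1 with integer division. A small sketch of that arithmetic follows; the function name is made up for the example, and the setting only applies to Elasticsearch versions before 7.0.

```python
def minimum_master_nodes(master_eligible: int) -> int:
    """Quorum of master-eligible nodes: (N / 2) + 1 using integer division."""
    if master_eligible < 1:
        raise ValueError("a cluster needs at least one master-eligible node")
    return master_eligible // 2 + 1


# Quorum for a few cluster sizes, keyed by number of master-eligible nodes.
quorums = {n: minimum_master_nodes(n) for n in (1, 2, 3, 5)}
```

With 3 master-eligible nodes the quorum is 2, so the cluster can lose one of them and still elect a master. With only 1 or 2 master-eligible nodes, losing a single one can leave the cluster without a master.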

amnon_d · August 21, 2018, 8:04am

Thank you. So by default I get 1 replica for each primary shard. Can I have more than 1 replica, provided I have enough data nodes?

Bernt_Rostad · August 21, 2018, 8:11am

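Yes, there is no fixed limit on the number of replica shards you can have for an index, although a replica can only be assigned to a node that does not already hold a copy of the same shard. And it's a dynamic setting, so you can change it for existing indices when the need arises. See the Update Indices Settings API.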
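For example, changing the replica count of an existing index is a `PUT` to that index's `_settings` endpoint with `index.number_of_replicas` in the body. Here is a minimal sketch using only the Python standard library. The address `http://localhost:9200`, the index name `my-index`, and the helper name are assumptions for illustration, and the request is only built here, not sent.

```python
import json
import urllib.request


def build_replica_update(base_url, index, replicas):
    """Build (but do not send) a PUT <index>/_settings request."""
    body = json.dumps({"index": {"number_of_replicas": replicas}}).encode("utf-8")
    return urllib.request.Request(
        url=f"{base_url}/{index}/_settings",
        data=body,
        method="PUT",
        headers={"Content-Type": "application/json"},
    )


# Hypothetical cluster address and index name.
request = build_replica_update("http://localhost:9200", "my-index", 2)

# To apply the change against a running cluster, the request could be sent with:
# urllib.request.urlopen(request)
```

On a cluster with only two data nodes, asking for 2 replicas would leave one replica per shard unassigned until a third data node joins.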

Adding more replicas is a nice way to scale a system for more searches since more copies of the data means you can add more data nodes that can be queried in parallel (since they all have a copy of the same data).

amnon_d · August 21, 2018, 8:13am

Thank you!

