Where is data stored in ES

Hi there,
Apologies if this question sounds stupid.
I am testing some high availability concepts and I am puzzled about one thing.
For my test I am running an ES cluster with one master, two data nodes and one client.
I send data to data node A, and I can view it with Grafana running on any of the 4 nodes - no problem.
I am still able to view the data even when I stop ES on Data node A.
Well, I thought that maybe the Master decided to store the data on Data Node B...
So, I restarted ES on Node A and stopped it on Node B.
I was still able to view the data (much to my satisfaction, because I wanted high availability for my data).
My question is: does the Master direct all data nodes to store all the data, or am I missing something here?
Thanks.

By default, Elasticsearch creates one replica of each primary shard in an index. Because you have multiple data nodes, it can spread the shard copies across those nodes, providing redundancy.

The master node handles where all the shards are allocated.

When you use the default index settings, new indices are created with 1 replica for each primary shard. Since a replica shard will always live on a different node than the primary shard, you automatically get redundancy since each of your two nodes will then have a full copy of the data. The master knew this so when you queried the cluster, after taking down one of the data nodes, it still found the relevant data on the other node.
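
To see this placement for yourself, you can ask the _cat/shards API which node holds each shard copy. The snippet below is a minimal sketch, not a definitive tool: the base URL, the index name "metrics", and the node names "data-a" and "data-b" are made-up placeholders, and the sample rows only imitate the shape of a `format=json` response.

```python
import json
import urllib.request
from collections import defaultdict


def fetch_shard_rows(base_url, index):
    """Fetch shard placement for one index from _cat/shards as JSON rows."""
    url = f"{base_url}/_cat/shards/{index}?format=json&h=index,shard,prirep,state,node"
    with urllib.request.urlopen(url) as resp:
        return json.load(resp)


def copies_per_node(rows):
    """Group started shard copies by the node that holds them."""
    placement = defaultdict(list)
    for row in rows:
        # Unassigned copies have no node, so only count started ones.
        if row["state"] == "STARTED":
            placement[row["node"]].append((row["shard"], row["prirep"]))
    return dict(placement)


# Hypothetical sample: one primary (p) and its replica (r) on different nodes.
sample = [
    {"index": "metrics", "shard": "0", "prirep": "p", "state": "STARTED", "node": "data-a"},
    {"index": "metrics", "shard": "0", "prirep": "r", "state": "STARTED", "node": "data-b"},
]
print(copies_per_node(sample))

# Against a live cluster (assumed address):
# rows = fetch_shard_rows("http://localhost:9200", "metrics")
```

If every shard number shows up under both data nodes, either node can be stopped and the data stays searchable, which matches what you observed.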

However, if you had deleted the replica shards or changed the default index settings to not create replicas, there would have been only one copy of each shard and thus no redundancy. So, keeping the 1 replica setting is a good choice in most situations.

Thank you. I was hoping this is the case but I wasn't sure.

If you are looking for high availability, you should ensure you have at least 3 master-eligible nodes and set minimum_master_nodes correctly.
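
As a rough guide for versions that have the minimum_master_nodes setting, the usual recommendation is a quorum of the master-eligible nodes, that is (N / 2) + 1 rounded down. A small sketch of that arithmetic (the function name is just for illustration):

```python
def minimum_master_nodes(master_eligible):
    """Return the quorum size for a given number of master-eligible nodes."""
    if master_eligible < 1:
        raise ValueError("need at least one master-eligible node")
    # Integer division rounds down, giving (N / 2) + 1.
    return master_eligible // 2 + 1


for n in (1, 2, 3, 5):
    print(n, "master-eligible ->", minimum_master_nodes(n))
```

With 3 master-eligible nodes the quorum is 2, so the cluster can lose one of them and still elect a master. With only 1 or 2, losing a single master-eligible node leaves no quorum.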

Thank you. So by default I get 1 replica for each primary shard. Can I have more than 1 replica, provided I have enough Data Nodes?

Yes, there is no limitation on the number of replica shards you can have for an index. And it's a dynamic setting, so you can change it for existing indices when the need arises. See the Update Indices Settings documentation.

Adding more replicas is a nice way to scale a system for more searches since more copies of the data means you can add more data nodes that can be queried in parallel (since they all have a copy of the same data).
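
For illustration, changing the replica count on an existing index is a PUT to that index's _settings endpoint with a new number_of_replicas value. This is only a sketch: the cluster address and the index name "metrics" are assumptions, and the request is built but not sent.

```python
import json
import urllib.request


def build_replica_update(base_url, index, replicas):
    """Build (but do not send) a request that sets number_of_replicas."""
    body = json.dumps({"index": {"number_of_replicas": replicas}}).encode("utf-8")
    return urllib.request.Request(
        url=f"{base_url}/{index}/_settings",
        data=body,
        method="PUT",
        headers={"Content-Type": "application/json"},
    )


# Assumed address and index name; adjust for your cluster.
req = build_replica_update("http://localhost:9200", "metrics", 2)
print(req.get_method(), req.full_url, req.data.decode("utf-8"))

# To actually apply the change:
# urllib.request.urlopen(req)
```

Keep in mind that two copies of the same shard are never placed on the same node, so with two data nodes a second replica would stay unassigned until a third data node joins.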

Thank you!
