Understanding synchronisation after downtime

ash85 · May 30, 2016, 1:21pm

hello,

I'm wondering how Elasticsearch synchronizes data after a node has been down.

for example:

create a new index:
curl -XPOST node1:9200/testindex1

now the data is in sync on both nodes:

curl -XGET node1:9200/_cat/indices?pretty
green open testindex1 5 1 0 0 970b 575b

curl -XGET node2:9200/_cat/indices?pretty
green open testindex1 5 1 0 0 970b 575b

now I shut down node2...

create a second index:
curl -XPOST node1:9200/testindex2

delete the first index:
curl -XDELETE node1:9200/testindex1

now everything is as expected: one index on node1:

curl -XGET node1:9200/_cat/indices
yellow open testindex2 5 1 0 0 575b 575b

starting node2 again...

now after starting node2 the deleted index is back:
curl -XGET node1:9200/_cat/indices
green open testindex1 5 1 0 0 898b 503b
green open testindex2 5 1 0 0 970b 575b

how can I change the behavior?

s1monw · May 30, 2016, 2:34pm

This is fixed in 5.0. In previous versions we import dangling indices to prevent data loss, which can lead to situations like this one. I think 2.x no longer has a setting for this and always imports them automatically.

bleskes · May 30, 2016, 2:40pm

Hi Martin,

Elasticsearch automatically imports indices that it finds on disk but that are not part of the cluster state. This was introduced to protect users who bring their cluster down, spin up new master nodes (thinking that their data is safe on their data nodes) and end up with an empty cluster state. Another scenario is a new master node being added to a cluster that is not properly configured: it gets elected master and uses its own cluster state for the whole cluster, again resulting in an empty cluster state. Since index deletion is implemented by removing the index from the cluster state, the data nodes would interpret an empty cluster state as a "delete all" operation and all data would be gone. It is of course poor practice to throw away the data folder of any node (master or not), but people did, and the results were disastrous. Instead, the data nodes notify the master that they have data and the master re-imports it.
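The import rule described above can be sketched as a toy model. This is a hypothetical illustration, not Elasticsearch code: the function name `rejoin_pre_5_0` is made up, and indices are modelled as plain names passed in space-separated strings. It only shows why `testindex1` reappears in the scenario from the question, and why the same rule saves the data when a new master starts with an empty cluster state.

```shell
#!/usr/bin/env bash
# Hypothetical toy model, NOT Elasticsearch source code: it only mimics the
# pre-5.0 rule that indices found on a data node's disk but missing from the
# master's cluster state are treated as "dangling" and imported back.

# $1: space-separated index names in the master's cluster state
# $2: space-separated index names still on the rejoining node's disk
# Prints the index names in the cluster state after the node rejoins.
rejoin_pre_5_0() {
  local state=" $1 "
  local idx
  for idx in $2; do
    # Missing from the cluster state -> dangling -> re-imported.
    if [[ "$state" != *" $idx "* ]]; then
      state="$state$idx "
    fi
  done
  echo $state   # unquoted on purpose: collapses the padding spaces
}

# The question's scenario: testindex1 was deleted while node2 was down,
# but node2 still has it on disk.
rejoin_pre_5_0 "testindex2" "testindex1 testindex2"   # prints: testindex2 testindex1
```

Calling `rejoin_pre_5_0 "" "testindex1 testindex2"` models the empty-cluster-state case: both indices come back, which is exactly the data loss the automatic import was meant to prevent.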

This behavior has the downside that you discovered: if an index is deleted while a node is offline, it will be re-imported when the node comes back. The good news is that we recently introduced the notion of an index tombstone to deal with exactly this. See https://github.com/elastic/elasticsearch/pull/17265
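A minimal sketch of how a tombstone changes that decision, again as a hypothetical toy model rather than the code from the linked PR: the function `rejoin_with_tombstones` and its string-based inputs are invented for illustration. The idea is that the cluster state remembers deleted indices, so a dangling copy that matches a tombstone is not re-imported. The real implementation identifies indices by UUID rather than by name and keeps only a bounded number of tombstones; this sketch ignores both details.

```shell
#!/usr/bin/env bash
# Hypothetical toy model, NOT Elasticsearch source code: it sketches the idea
# behind index tombstones. Deleted indices are remembered, and a dangling
# index matching a tombstone is ignored instead of being re-imported.

# $1: space-separated live index names in the cluster state
# $2: space-separated tombstones (names of deleted indices)
# $3: space-separated index names still on the rejoining node's disk
# Prints the live index names after the node rejoins.
rejoin_with_tombstones() {
  local state=" $1 " tombstones=" $2 "
  local idx
  for idx in $3; do
    if [[ "$state" == *" $idx "* ]]; then
      continue                                  # already in the cluster state
    elif [[ "$tombstones" == *" $idx "* ]]; then
      echo "ignoring $idx: matches a tombstone" >&2   # deleted while node was away
    else
      state="$state$idx "                       # unknown index -> still imported
    fi
  done
  echo $state   # unquoted on purpose: collapses the padding spaces
}

# Same scenario as the question, but testindex1 now has a tombstone.
rejoin_with_tombstones "testindex2" "testindex1" "testindex1 testindex2"   # stdout: testindex2
```

With the question's inputs only `testindex2` survives, while an empty cluster state with no tombstones (`rejoin_with_tombstones "" "" "testindex1 testindex2"`) still recovers both indices, so the protection described above is kept.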

Cheers,
Boaz

ash85 · May 30, 2016, 2:52pm

thanks guys for the explanation.

looking forward to 5.0!
