Understanding synchronisation after downtime

hello,

i'm wondering how elasticsearch synchronizes the data after downtime.

for example:

create new index:
curl -XPOST node1:9200/testindex1

now the data is in sync on both nodes:

curl -XGET node1:9200/_cat/indices?pretty
green open testindex1 5 1 0 0 970b 575b

curl -XGET node2:9200/_cat/indices?pretty
green open testindex1 5 1 0 0 970b 575b

now i shut down node2...

create a second index:
curl -XPOST node1:9200/testindex2

delete the first index:
curl -XDELETE node1:9200/testindex1

now everything is as expected. one index on node1:

curl -XGET node1:9200/_cat/indices
yellow open testindex2 5 1 0 0 575b 575b

starting node2 again...

now after starting node2 the deleted index is back:
curl -XGET node1:9200/_cat/indices
green open testindex1 5 1 0 0 898b 503b
green open testindex2 5 1 0 0 970b 575b

how can I change the behavior?

this is fixed in 5.0 - in previous versions we import dangling indices to prevent data loss. This can lead to those situations. I think 2.x doesn't have a setting for this anymore and imports automatically.

Hi martin,

Elasticsearch automatically imports indices that it finds on disk and that are not part of the cluster state (so-called dangling indices). This was introduced to protect users who bring their cluster down, spin up new master nodes (thinking that their data is safe on their data nodes) and now have an empty cluster state. Another scenario is that a new master node is added to the cluster and, since the cluster is not properly configured, it gets elected as master and uses its own cluster state for the cluster, resulting in an empty cluster state again. Since index deletion is implemented by removing the index from the cluster state, the nodes would interpret the empty cluster state as a "delete all" operation and all data would be gone. It is of course poor practice to throw away the data folder of any node (master or not), but people did and the results were disastrous. Instead, the data nodes notify the master that they have data and the master reimports it.
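To make the rule above concrete, here is a toy sketch in plain shell. It does not talk to Elasticsearch and is not how the import is actually implemented; it only models "on disk but not in the cluster state" as a set difference. The function name dangling_indices and the two input files (one index name per line) are made up for illustration.

```shell
#!/bin/sh
# Toy model only: no Elasticsearch calls, just set arithmetic on index names.
# dangling_indices STATE_FILE DISK_FILE   (hypothetical helper)
#   STATE_FILE: index names the master's cluster state knows about
#   DISK_FILE:  index names a node finds in its data folder
# Prints the names that are on disk but missing from the cluster state.
dangling_indices() {
    state_sorted=$(mktemp)
    disk_sorted=$(mktemp)
    sort -u "$1" > "$state_sorted"
    sort -u "$2" > "$disk_sorted"
    # comm -13 keeps only the lines unique to the second file
    comm -13 "$state_sorted" "$disk_sorted"
    rm -f "$state_sorted" "$disk_sorted"
}

# The situation from the question: node2 comes back with testindex1 on disk,
# while the cluster state only contains testindex2.
state_file=$(mktemp)
disk_file=$(mktemp)
printf 'testindex2\n' > "$state_file"
printf 'testindex1\n' > "$disk_file"
dangling_indices "$state_file" "$disk_file"
rm -f "$state_file" "$disk_file"
```

With these inputs the sketch prints testindex1, which is the index that reappears in the question.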

This behavior has the downside that you discovered, namely that if an index is deleted while a node is offline, it will be reimported when the node comes back. The good news here is that we recently introduced the notion of an index tombstone to deal with exactly this. See https://github.com/elastic/elasticsearch/pull/17265
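As a rough illustration of the tombstone idea, the toy shell sketch below filters import candidates against a list of deleted index names. This is a simplification under my own assumptions: it compares bare names only (the real feature tracks more than a name), it does not call Elasticsearch, and the function name importable_indices, the file inputs and the index oldindex are hypothetical.

```shell
#!/bin/sh
# Toy model only: no Elasticsearch calls, just set arithmetic on index names.
# importable_indices STATE_FILE DISK_FILE TOMBSTONE_FILE   (hypothetical helper)
#   STATE_FILE:     index names in the cluster state
#   DISK_FILE:      index names found in a node's data folder
#   TOMBSTONE_FILE: names of indices that were explicitly deleted
# Prints the names that would still be imported once tombstones are honored.
importable_indices() {
    work_dir=$(mktemp -d)
    sort -u "$1" > "$work_dir/state"
    sort -u "$2" > "$work_dir/disk"
    sort -u "$3" > "$work_dir/tombstones"
    # Step 1: on disk but not in the cluster state (import candidates)
    comm -13 "$work_dir/state" "$work_dir/disk" > "$work_dir/candidates"
    # Step 2: drop every candidate that has a tombstone
    comm -23 "$work_dir/candidates" "$work_dir/tombstones"
    rm -rf "$work_dir"
}

# node2 returns with testindex1 (deleted while it was down) and oldindex
# (never deleted, but unknown to the current cluster state) on disk.
example_dir=$(mktemp -d)
printf 'testindex2\n' > "$example_dir/state"
printf 'testindex1\noldindex\n' > "$example_dir/disk"
printf 'testindex1\n' > "$example_dir/tombstones"
importable_indices "$example_dir/state" "$example_dir/disk" "$example_dir/tombstones"
rm -rf "$example_dir"
```

Here only oldindex is printed: the explicitly deleted testindex1 stays deleted, while data that was never deleted is still protected by the import.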

Cheers,
Boaz

thanks guys for the explanation.

looking forward to 5.0!