Lost index metadata and overwriting pre-existing index files

Danny_Berger · February 26, 2014, 1:44am

Hi - I recently experienced some surprising elasticsearch behavior and I'd
appreciate some verification on the "whys" behind what we saw. During a
cluster restart we lost some index metadata, so those indices were not
recognized and loaded from the data nodes (the raw index files still
existed on disk). Before we noticed and had a chance to recover them, new
incoming data caused the cluster to create new indices under the same
names, completely overwriting the original raw index data on disk and
losing a lot of data. If that's unclear, or for further details, I've
posted the scenario and straightforward steps to reproduce:
https://github.com/dpb587/elasticsearch-lost-index
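
For anyone wanting to catch this state before new data arrives, here is a minimal sketch of a pre-flight check. It rests on assumptions not stated above: that each index lives in its own directory under something like <path.data>/<cluster name>/nodes/0/indices/ (the layout I believe 1.x uses), and that you have separately obtained the set of index names the cluster currently knows about. The function name find_orphaned_indices and both parameters are hypothetical, not part of elasticsearch.

```python
from pathlib import Path


def find_orphaned_indices(indices_dir, known_indices):
    """Return index directories on disk that the cluster does not know about.

    indices_dir   -- assumed path to a node's "indices" directory
    known_indices -- index names reported by the cluster (obtained elsewhere)
    """
    root = Path(indices_dir)
    if not root.is_dir():
        # Nothing on disk to compare against.
        return []
    # Each subdirectory is assumed to correspond to one index.
    on_disk = {entry.name for entry in root.iterdir() if entry.is_dir()}
    return sorted(on_disk - set(known_indices))
```

If the returned list is non-empty after a restart, it may be worth pausing indexing before anything is auto-created under those names.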

These are my core questions...

1. Is it true that index metadata (sharding size, mapping, etc) will only
   ever be stored on master-capable nodes? Previously, my understanding of
   the master was that it was primarily responsible for managing cluster
   state and coordinating cluster balancing, not persisting index metadata.
   (I'm not arguing it doesn't necessarily make sense, just that I didn't
   realize "cluster state" included the index metadata.)

2. Is there documentation on elasticsearch.org which more precisely defines
   the responsibilities of master and data nodes? The only vague references
   I've come across are
   http://www.elasticsearch.org/guide/en/elasticsearch/reference/1.x/modules-node.html,
   the elasticsearch default configuration file, and various
   non-authoritative blog posts and Stack Overflow answers, none of which
   prompted me to realize data nodes would not hold their own metadata.

3. Is it true that elasticsearch (Lucene?) will overwrite existing data
   files without error or warning if the cluster is not aware of the index?
   If so, is there a way to disable that behavior to avoid accidental data
   loss due to misconfiguration (aside from the broad
   action.auto_create_index setting)? If not, is there anything else which
   would explain the behavior we saw?

Thank you for your time!

Danny

rawill · July 10, 2015, 9:19pm

Just experienced something similar. We had a VM instance fail along with an issue with our storage. On restart, Elasticsearch created new indexes but would not load the old ones. At this point I'm not sure how to recover, since there doesn't appear to be a way to reload the previously indexed data.

Harlin_ES · July 10, 2015, 10:15pm

The solution to this is to have backup master nodes. Running a cluster with only one master-eligible node is asking for trouble.
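
As a rough illustration of why a single master-eligible node is fragile, here is a small sketch of the usual majority rule for sizing discovery.zen.minimum_master_nodes, i.e. (master-eligible nodes / 2) + 1. The function names are hypothetical helpers for the arithmetic only; they do not call elasticsearch.

```python
def minimum_master_nodes(master_eligible):
    """Majority quorum for a given number of master-eligible nodes."""
    if master_eligible < 1:
        raise ValueError("need at least one master-eligible node")
    return master_eligible // 2 + 1


def tolerated_failures(master_eligible):
    """How many master-eligible nodes can be lost while a quorum remains."""
    return master_eligible - minimum_master_nodes(master_eligible)


for n in (1, 2, 3, 5):
    print(n, minimum_master_nodes(n), tolerated_failures(n))
```

With one or two master-eligible nodes no failure is tolerated; three is the smallest count that survives losing one.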
