Hi - I recently experienced some surprising elasticsearch behavior and I'd
appreciate some verification of the "whys" behind what we saw. Basically,
during a cluster restart we lost some index metadata, so those indices were
not recognized and loaded from the data nodes (the raw index files still
existed on disk). Before we noticed and had a chance to recover them, new
incoming data caused the cluster to create new indices under the same
names, completely overwriting the original raw index data on disk and
losing a lot of data. If that's unclear, or if you'd like further details,
I've posted the scenario and straightforward steps to reproduce:
https://github.com/dpb587/elasticsearch-lost-index.
These are my core questions...

- Is it true that index metadata (sharding size, mapping, etc) will only
ever be stored on master-capable nodes? Previously, my understanding of the
master was that it was primarily responsible for managing cluster state and
coordinating cluster balancing, not persisting index metadata. (I'm not
arguing that it doesn't make sense, just that I didn't realize "cluster
state" included the index metadata.)
- Is there documentation on elasticsearch.org which more precisely defines
the responsibilities of master and data nodes? The only vague references
I've come across are
http://www.elasticsearch.org/guide/en/elasticsearch/reference/1.x/modules-node.html,
the elasticsearch default configuration file, and various non-authoritative
blog posts and Stack Overflow answers, none of which prompted me to realize
data nodes would not hold their own metadata.
- Is it true that elasticsearch (Lucene?) will overwrite existing data
files without error or warning if the cluster is not aware of the index? If
so, is there a way to disable that behavior to avoid accidental data loss
due to misconfiguration (aside from the broad action.auto_create_index
setting)? If not, is there anything else which would explain the behavior
we saw?
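
To make the last question concrete, here is a rough sketch of the sort of
safety check I have in mind: compare the index directories that exist on a
data node's disk with the index names the cluster currently knows about, and
flag any that are on disk but unknown. This is only an illustration, not
anything elasticsearch provides. The function names, the example data path,
and the assumption that each index has its own directory under an "indices"
folder are all hypothetical here.

```python
import os


def list_index_dirs(indices_path):
    """Return the names of the directories found directly under indices_path.

    Assumption: each index keeps its files in a directory named after it.
    Returns an empty list if the path does not exist.
    """
    if not os.path.isdir(indices_path):
        return []
    return sorted(
        name
        for name in os.listdir(indices_path)
        if os.path.isdir(os.path.join(indices_path, name))
    )


def find_unregistered(on_disk, known_to_cluster):
    """Return index names present on disk but missing from the cluster's view."""
    known = set(known_to_cluster)
    return sorted(name for name in on_disk if name not in known)


if __name__ == "__main__":
    # Hypothetical path; adjust to the node's actual data directory.
    on_disk = list_index_dirs("/var/lib/elasticsearch/mycluster/nodes/0/indices")
    # Hypothetical: the index names the cluster reports, gathered by any means.
    known = ["logs-2014.01.02"]
    for name in find_unregistered(on_disk, known):
        print("on disk but unknown to cluster:", name)
```

Running something like this before allowing writes after a restart would at
least surface the mismatch before new data arrives under the same names.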
Thank you for your time!
Danny