We have an AWS Managed ES cluster with 4 indices in it. Our data was present in all the indices until 31st December, 2019. However, when we returned on 2nd Jan, 2020, we saw that all the indices had been deleted! This is really terrifying for us. I quickly ran _cat/indices with the creation time column and noticed that the kibana_1 index had a creation time of 31st December, 8:40 AM GMT, and another index had a creation time of 31st December, 1 PM GMT. Some other indices were created after that time due to ingestion triggers from our product (a sketch of the check is below).
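
This is roughly the check we ran, shown here as a small Python sketch; the domain endpoint is a placeholder and request signing/authentication is assumed to be handled separately:

```python
import requests

ES_ENDPOINT = "https://our-domain.us-east-1.es.amazonaws.com"  # hypothetical endpoint

# _cat/indices with the creation date column, which is how we spotted the
# 31st December creation times on indices that should have been much older.
resp = requests.get(
    f"{ES_ENDPOINT}/_cat/indices",
    params={"v": "true", "h": "index,creation.date.string,docs.count"},
)
print(resp.text)
```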
But that shouldn't be the case, since the cluster was set up 1-1.5 months earlier and we never stopped the AWS ES cluster from the point it came up. We tried to check the uptime of the cluster, assuming a restart might have affected the indices in some way, but couldn't find a way to see the cluster's uptime (what we were hoping to query is sketched below).
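
For reference, this is the kind of query we were hoping to run, again as a sketch against the same placeholder endpoint; we are not sure how meaningful the per-node uptime column is on a managed domain where nodes can be replaced behind the scenes:

```python
import requests

ES_ENDPOINT = "https://our-domain.us-east-1.es.amazonaws.com"  # hypothetical endpoint

# _cat/nodes exposes an "uptime" column, which would at least hint at whether
# the data nodes had been restarted or replaced around 31st December.
resp = requests.get(
    f"{ES_ENDPOINT}/_cat/nodes",
    params={"v": "true", "h": "name,node.role,uptime"},
)
print(resp.text)
```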
We did revert to a snapshot taken on 31st December, but we lost the two days of data in between. I would really like to know under what conditions something like this can happen. We are going to production with the whole product in a few weeks, and the thought that this could happen again is scaring me.
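
The restore itself was roughly the following; repository and snapshot names here are placeholders for the ones on our domain:

```python
import requests

ES_ENDPOINT = "https://our-domain.us-east-1.es.amazonaws.com"  # hypothetical endpoint

# Restore all indices from the 31st December snapshot. Repository and snapshot
# names below are illustrative, not the actual names on our domain.
resp = requests.post(
    f"{ES_ENDPOINT}/_snapshot/our-snapshot-repo/snapshot-2019-12-31/_restore",
    json={"indices": "*", "include_global_state": False},
)
print(resp.status_code, resp.text)
```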
Any thoughts or ideas on how this happened and how to prevent it from happening in the future?