Running ElasticSearch 0.16.4 in a 2-node setup (shards=3,
replication=1), last night one of the nodes displayed an "Out of
memory" error in its log.
This morning it became apparent that something was terribly wrong.
Shortly after the Out of memory error, ElasticSearch wiped out about
2/3 of our index.
We tried a rolling restart, complete cluster restart, flush, refresh,
but nothing helped. The cluster comes back up with everything showing
up healthy, but 2/3 of the data is GONE.
Not sure if this is a known bug?
To recover we rsync'ed an old index backup and manually pushed all
updates from the last 24 hours.
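For anyone curious, the restore boiled down to the following. This is a self-contained simulation with throwaway temp dirs; on the real cluster the source was our nightly rsync backup and the target was the node's ElasticSearch data directory (the index name and shard layout below are just placeholders):

```shell
#!/bin/sh
set -e

BACKUP=$(mktemp -d)    # stands in for the nightly index backup
DATADIR=$(mktemp -d)   # stands in for the node's ES data directory

# Fake a backed-up shard segment file so the demo has something to restore.
mkdir -p "$BACKUP/nodes/0/indices/myindex/0/index"
echo "segment data" > "$BACKUP/nodes/0/indices/myindex/0/index/_0.cfs"

# 1. Stop ElasticSearch on the node (not shown here).
# 2. Copy the backup over the damaged data directory.
#    In production we used `rsync -a`; `cp -a` keeps this demo dependency-free.
cp -a "$BACKUP/nodes" "$DATADIR/"

# 3. Start ElasticSearch again and replay the last 24h of updates
#    from the application's own change log (not shown here).
ls "$DATADIR/nodes/0/indices/myindex/0/index"
```

The painful part is step 3: replaying a day of updates only works because our application keeps its own record of writes.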
Wondering if there is a more natural way of recovering? Right now we
are using the local gateway. Would using the s3 gateway have helped us
in such a case?
In our experience ElasticSearch's built-in recovery does the trick in
most cases, but we have hit a handful of cases like this recent one
where something becomes corrupted beyond ElasticSearch's ability to
recover automatically.
We're looking to establish a best-practices procedure on how to
recover in these cases.
Would appreciate any feedback from the community as well as thoughts
about whether gateway-s3 helps in providing better recovery.
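For concreteness, our understanding is that switching to the s3 gateway would be roughly the following in elasticsearch.yml. The bucket name and credentials are placeholders, and the setting names are from our reading of the gateway docs for this release line, so please correct us if this is off:

```yaml
gateway:
    type: s3
    s3:
        bucket: my-es-gateway-bucket   # placeholder bucket name

cloud:
    aws:
        access_key: AWS_ACCESS_KEY     # placeholder credentials
        secret_key: AWS_SECRET_KEY
```

What we can't tell from the docs is whether the s3 gateway would have survived this kind of corruption, or whether it would have faithfully persisted the already-damaged index.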
Our index is 100 GB and it changes frequently.