Recently our Elasticsearch cluster (version 0.19) stopped working properly for
both searching and indexing.
The cause was an OOM on one of the Elasticsearch nodes, after which all the
other nodes also stopped working properly following a long period of GC.
Unfortunately, replication recovery did not work after we restarted all
nodes. In the end we had to rebuild all of the indexes.
If this happens again, we would like to bring Elasticsearch back to a working
state quickly, without rebuilding all of the indexes, because re-indexing
takes about one day.
We have upgraded to 0.90.1 and hope this avoids the problem described above.
We are also looking for additional recovery methods to be safe.
After studying the backup and recovery solutions of other users, we tried
backing up a snapshot to use when the Elasticsearch nodes can no longer serve
clients.
The approach is simply to disable flushing and back up all the indexes on all
nodes.
However, the data files are too large, and backing up all nodes takes a long
time. We cannot run the copy-and-delete cycle frequently, so the backup data
is not up to date.
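To make the backup steps above concrete, here is a minimal sketch in Python of what we run per node. It assumes flushing is toggled through the `index.translog.disable_flush` setting via `PUT /_settings`, which is my understanding of 0.90. The URL and the paths in the usage note are placeholders, and `backup_node` is just a name I made up for illustration.

```python
import json
import shutil
import urllib.request


def flush_setting_body(disable):
    """Build the JSON body that toggles index.translog.disable_flush (assumed setting name)."""
    return json.dumps({"index": {"translog.disable_flush": bool(disable)}})


def put_settings(base_url, body):
    """Send the settings update to <base_url>/_settings with an HTTP PUT."""
    req = urllib.request.Request(
        base_url.rstrip("/") + "/_settings",
        data=body.encode("utf-8"),
        method="PUT",
    )
    with urllib.request.urlopen(req) as resp:
        return resp.read()


def backup_node(base_url, data_dir, backup_dir, send=put_settings, copy=shutil.copytree):
    """Disable flush, copy the data directory, then re-enable flush even if the copy fails."""
    send(base_url, flush_setting_body(True))
    try:
        copy(data_dir, backup_dir)
    finally:
        send(base_url, flush_setting_body(False))
```

A call would look something like `backup_node("http://localhost:9200", "/path/to/es/data", "/path/to/backup")`, run once per node. The `finally` is there so that flushing is turned back on even when the copy breaks halfway.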
I have an idea for urgent recovery in our case.
It is to create a hidden cluster with the same configuration as the public
cluster. We apply the same index updates to both the public cluster and the
hidden cluster.
When urgent recovery is needed, we copy all of the data from the hidden
cluster to the public cluster and restart the public cluster.
The hidden cluster does not serve clients and only receives index updates,
so it should be more stable.
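As a rough sketch of the "same index updates to both clusters" part, something like the following Python is what I have in mind. The names (`apply_update`, the `clusters` mapping) are hypothetical, and each value is assumed to be a function that sends one update to one cluster. This is only an illustration of the idea, not something we run today.

```python
def apply_update(update, clusters):
    """Apply one index update to every cluster.

    `clusters` maps a cluster name to a callable that indexes the update.
    Returns the names of the clusters where the update failed, so the
    caller can replay it there later.
    """
    failed = []
    for name, index_fn in clusters.items():
        try:
            index_fn(update)
        except Exception:
            # Keep going: a failure on one cluster must not block the other.
            failed.append(name)
    return failed


# Usage with in-memory stand-ins for the two clusters.
public_docs, hidden_docs = [], []
failed = apply_update(
    {"id": 1, "title": "example"},
    {"public": public_docs.append, "hidden": hidden_docs.append},
)
```

One thing I am not sure about is how to handle the failed list. If an update reaches only one cluster and is never replayed, the hidden copy would silently diverge from the public one.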
Does this approach make sense? Alternatively, could we set up a hidden node
(holding all primary shards) for the same purpose?