I'm trying to determine whether index replication is worth it when running the ELK stack with a 10-node Elasticsearch cluster on virtual machines. The virtual machines are protected by VMware HA, and the data is protected by the storage arrays on the SAN. Our ELK deployment is not revenue-impacting, so an outage of up to 10 minutes is acceptable.
Here is the sequence of events:
1) A hypervisor fails and all the VMs running on it die.
2) Elasticsearch goes red because primary shards are down.
3) No read or write operations (from Logstash) are possible on indices that are red.
4) All failed VMs are restarted on other hypervisors in the cluster.
5) The VMs come back online in under 5 minutes and rejoin the ES cluster.
6) ES flushes the translogs on the failed VMs.
7) Once all the primaries are online, the ES cluster goes green.
8) All read/write operations go back to normal.
Is the above sequence correct?
Is my understanding correct that a primary shard would become green after its VM has been restarted?
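In case it helps anyone reproduce this, here is a minimal sketch of how the status transitions could be observed during a failover test by polling the cluster health API. It assumes a node is reachable at `http://localhost:9200` (a placeholder address, not our real one); `summarize_health`, `fetch_health`, and `watch` are hypothetical helper names of my own, and only the standard `GET /_cluster/health` endpoint and its `status`, `number_of_nodes`, and `unassigned_shards` fields are relied on.

```python
import json
import time
import urllib.request

# Placeholder address of any reachable node; adjust for the real cluster.
ES_URL = "http://localhost:9200"


def summarize_health(health: dict) -> str:
    """Turn a _cluster/health response body into a one-line summary."""
    return (
        f"status={health['status']} "
        f"nodes={health['number_of_nodes']} "
        f"unassigned_shards={health['unassigned_shards']}"
    )


def fetch_health(base_url: str = ES_URL, timeout: float = 5.0) -> dict:
    """GET /_cluster/health and return the parsed JSON body."""
    url = f"{base_url}/_cluster/health"
    with urllib.request.urlopen(url, timeout=timeout) as resp:
        return json.load(resp)


def watch(interval: float = 10.0, base_url: str = ES_URL) -> None:
    """Print a timestamped health summary until the cluster reports green."""
    while True:
        try:
            health = fetch_health(base_url)
        except OSError as exc:  # covers connection errors and timeouts
            print(time.strftime("%H:%M:%S"), "unreachable:", exc)
        else:
            print(time.strftime("%H:%M:%S"), summarize_health(health))
            if health["status"] == "green":
                return
        time.sleep(interval)
```

Calling `watch()` before powering off a test hypervisor should log how long the cluster stays red and when the restarted nodes rejoin, which would show whether the recovery fits inside the 10-minute window.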