Earlier this week we discovered that our three-node Elasticsearch
cluster needed to be expanded, as it was getting dangerously close to
maximum capacity. I was nervous about this and read up as best I
could on best practices for doing it. The only advice I could find was
to make sure the new nodes cannot be elected as master when they join,
to avoid a split-brain scenario. Fair enough.
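
In case the details matter: "cannot be elected as master" for me just
meant node.master: false in elasticsearch.yml on the two new nodes, and
this is roughly the sanity check I ran afterwards to confirm they really
joined as non-master-eligible data nodes (localhost:9200 is a placeholder
for any node in the cluster, and it assumes the Python requests library):

    # Rough check that the new nodes are not master-eligible.
    # "localhost:9200" is a placeholder for any node in the cluster.
    import requests

    resp = requests.get("http://localhost:9200/_nodes/settings")
    resp.raise_for_status()

    for node_id, info in resp.json()["nodes"].items():
        settings = info.get("settings", {})
        # node.* settings may come back nested or flattened depending on
        # version; if they are missing entirely, the node is running with
        # the defaults (master-eligible, data-holding).
        node_block = settings.get("node", {})
        master = node_block.get("master", settings.get("node.master", "default (true)"))
        data = node_block.get("data", settings.get("node.data", "default (true)"))
        print(info["name"], "master-eligible:", master, "data:", data)
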
I launched two new EC2 instances to join the cluster and watched. Some
shards began relocating, no big deal. Six hours later I checked in:
some shards were still relocating and one shard was recovering. Weird,
but whatever... cluster health was still green and searches were working
fine. Then I got an alert at 2:30am that the cluster state was yellow,
and found that we had 3 shards marked as recovering and 2 shards
unassigned. The cluster still technically works, but 24 hours after the
new nodes were added I feel like my only choice for getting back to a
green cluster is to simply launch 5 fresh nodes and replay all the data
from backups. Ugggggh.
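
For what it's worth, this is the little script I have been using to
watch the recovery (again, localhost is a placeholder for a real node;
it only reads the cluster health and cat shards APIs):

    # Watch recovery progress: overall health plus any shard not STARTED.
    # "localhost:9200" is a placeholder for one of the cluster nodes.
    import requests

    BASE = "http://localhost:9200"

    health = requests.get(BASE + "/_cluster/health").json()
    print("status:", health["status"])
    print("relocating:", health["relocating_shards"])
    print("initializing:", health["initializing_shards"])
    print("unassigned:", health["unassigned_shards"])

    # _cat/shards returns one plain-text line per shard; show the ones
    # still RELOCATING, INITIALIZING, or UNASSIGNED.
    for line in requests.get(BASE + "/_cat/shards").text.splitlines():
        if "STARTED" not in line:
            print(line)
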
SERIOUSLY! What can I do to prevent this? I feel like I am missing
something, because I always heard that one of Elasticsearch's strengths
is the ease of scaling out, but it feels like every time I try it, the
whole thing falls over.
Thanks!
James