What is the proper way of performing a rolling restart of a cluster? I
currently have my stop script check for the cluster health to be green
before stopping itself. Unfortunately this doesn't appear to be working.
My setup:
- Elasticsearch 1.0.0
- 3-node cluster with 1 replica
When I perform the rolling restart, I see the cluster still reporting a
green state while a node is down. In theory it should be yellow, since
some replica shards will be unassigned. My script output during a rolling
restart:
1396388310 21:38:30 dev_cluster green 3 3 1202 601 2 0 0
1396388310 21:38:30 dev_cluster green 3 3 1202 601 2 0 0
1396388310 21:38:30 dev_cluster green 3 3 1202 601 2 0 0
1396388312 21:38:32 dev_cluster green 3 3 1202 601 2 0 0
1396388312 21:38:32 dev_cluster green 3 3 1202 601 2 0 0
1396388312 21:38:32 dev_cluster green 3 3 1202 601 2 0 0
curl: (52) Empty reply from server
1396388313 21:38:33 dev_cluster green 3 3 1202 601 2 0 0
1396388313 21:38:33 dev_cluster green 3 3 1202 601 2 0 0
curl: (52) Empty reply from server
1396388314 21:38:34 dev_cluster green 3 3 1202 601 2 0 0
1396388314 21:38:34 dev_cluster green 3 3 1202 601 2 0 0
... continues as green for many more seconds...
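The rows above appear to be `_cat/health` output (an assumption, since the script itself isn't shown). A rough Python sketch that splits one row into named fields, assuming the 1.x column order `epoch timestamp cluster status node.total node.data shards pri relo init unassign` (run `_cat/health?v` to confirm the header on your version). `CatHealth` and `parse_cat_health` are hypothetical names for illustration; lines that are not health rows, such as the curl errors, raise `ValueError`.

```python
from typing import NamedTuple


class CatHealth(NamedTuple):
    # Column order assumed from the 1.x `_cat/health?v` header.
    epoch: int
    timestamp: str
    cluster: str
    status: str
    node_total: int
    node_data: int
    shards: int    # active shard copies
    pri: int       # active primaries
    relo: int      # relocating
    init: int      # initializing
    unassign: int  # unassigned


def parse_cat_health(line: str) -> CatHealth:
    """Split one `_cat/health` row into named fields."""
    parts = line.split()
    if len(parts) != 11:
        raise ValueError(f"not a _cat/health row: {line!r}")
    epoch, timestamp, cluster, status, *counts = parts
    return CatHealth(int(epoch), timestamp, cluster, status, *(int(c) for c in counts))
```

Parsing the first row gives `status='green'`, `node_total=3` and `relo=2`, i.e. all 1202 shard copies active with two relocating.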
Since the cluster still reports green, the second node's stop script concludes
it is safe to stop, which ends up putting the cluster into a broken red state:
curl: (52) Empty reply from server
curl: (52) Empty reply from server
1396388339 21:38:59 dev_cluster green 2 2 1202 601 2 0 0
curl: (52) Empty reply from server
curl: (52) Empty reply from server
1396388341 21:39:01 dev_cluster yellow 2 2 664 601 2 8 530
curl: (52) Empty reply from server
curl: (52) Empty reply from server
1396388342 21:39:02 dev_cluster yellow 2 2 664 601 2 8 530
curl: (52) Empty reply from server
curl: (52) Empty reply from server
1396388343 21:39:03 dev_cluster yellow 2 2 664 601 2 8 530
curl: (52) Empty reply from server
curl: (52) Empty reply from server
1396388345 21:39:05 dev_cluster yellow 1 1 664 601 2 8 530
curl: (52) Empty reply from server
curl: (52) Empty reply from server
1396388346 21:39:06 dev_cluster yellow 1 1 664 601 2 8 530
curl: (52) Empty reply from server
curl: (52) Empty reply from server
1396388347 21:39:07 dev_cluster red 1 1 156 156 0 0 1046
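The shard counts add up as expected: with 601 primaries and 1 replica there are 601 × 2 = 1202 shard copies, and each row's active + initializing + unassigned still totals 1202 (664 + 8 + 530 in the yellow rows, 156 + 0 + 1046 in the red row). In the last row only 156 of 601 primaries are active, so 445 primaries are unavailable (and 445 + 601 missing replicas = 1046 unassigned), which is why it turns red. A small sketch reproducing the colour from those counts, assuming the usual rule (red if any primary is not active, yellow if only replicas are missing); `derive_status` is a hypothetical helper, not an Elasticsearch API.

```python
def derive_status(active, active_pri, init, unassigned, primaries=601, replicas=1):
    """Reproduce the health colour from shard counts.

    Defaults assume this cluster: 601 primaries, 1 replica each.
    """
    expected = primaries * (1 + replicas)  # 601 * 2 = 1202 shard copies
    if active + init + unassigned != expected:
        raise ValueError("counts do not add up to the expected shard copies")
    if active_pri < primaries:             # some primary is not active
        return "red"
    if active < expected:                  # all primaries up, some replicas missing
        return "yellow"
    return "green"
```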
My stop script issues a call
to http://localhost:9200/_cluster/nodes/_local/_shutdown to kill the node.
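A minimal Python sketch of such a stop script, assuming the node answers on localhost:9200, that `/_cluster/health` returns JSON with `status` and `number_of_nodes`, and that the shutdown endpoint accepts a POST. The function names and the `EXPECTED_NODES = 3` constant are hypothetical. Besides waiting for green, it also requires the node count to equal the full cluster size, which would have blocked the 21:38:59 row (green with only 2 nodes). It would not catch the earlier window where the cluster still counted the stopped node, so it is a partial guard, not a fix.

```python
import json
import time
import urllib.request

BASE_URL = "http://localhost:9200"  # assumption: the local node's HTTP address
EXPECTED_NODES = 3                  # assumption: full cluster size (3-node setup above)


def fetch_health(base_url=BASE_URL):
    """Return the parsed JSON body of GET /_cluster/health."""
    with urllib.request.urlopen(f"{base_url}/_cluster/health", timeout=5) as resp:
        return json.load(resp)


def safe_to_stop(health, expected_nodes=EXPECTED_NODES):
    """Green alone is not enough: the log shows "green" with only 2 nodes,
    so also require that every node is currently in the cluster."""
    return (health.get("status") == "green"
            and health.get("number_of_nodes") == expected_nodes)


def wait_until_safe(fetch=fetch_health, expected_nodes=EXPECTED_NODES,
                    poll_seconds=1.0, max_polls=600):
    """Poll until safe_to_stop() holds; give up after max_polls attempts."""
    for _ in range(max_polls):
        try:
            if safe_to_stop(fetch(), expected_nodes):
                return True
        except (OSError, ValueError):
            pass  # node unreachable or reply not JSON (cf. curl's "Empty reply from server")
        time.sleep(poll_seconds)
    return False


def shutdown_local(base_url=BASE_URL):
    """POST to the 1.x node shutdown endpoint used by the original script."""
    req = urllib.request.Request(
        f"{base_url}/_cluster/nodes/_local/_shutdown", data=b"", method="POST")
    with urllib.request.urlopen(req, timeout=5) as resp:
        return resp.status
```

A script would call `wait_until_safe()` and only call `shutdown_local()` if it returns `True`.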
Is it possible that the other nodes are waiting for the downed node to time
out before moving into the yellow state? I would have assumed the shutdown
API call informs the other nodes that the node is going down.
I'd appreciate any help on how to do this properly.
--
You received this message because you are subscribed to the Google Groups "elasticsearch" group.
To view this discussion on the web visit https://groups.google.com/d/msgid/elasticsearch/baba0a96-a991-42e3-a827-43881240e889%40googlegroups.com.