Rolling restart of a cluster?


(Mike Deeks) #1

What is the proper way of performing a rolling restart of a cluster? I
currently have my stop script check for the cluster health to be green
before stopping itself. Unfortunately this doesn't appear to be working.

My setup:
ES 1.0.0
3 node cluster w/ 1 replica.

When I perform the rolling restart I see the cluster still reporting a
green state when a node is down. In theory that should be a yellow state
since some shards will be unallocated. My script output during a rolling
restart:
1396388310 21:38:30 dev_cluster green 3 3 1202 601 2 0 0
1396388310 21:38:30 dev_cluster green 3 3 1202 601 2 0 0
1396388310 21:38:30 dev_cluster green 3 3 1202 601 2 0 0

1396388312 21:38:32 dev_cluster green 3 3 1202 601 2 0 0
1396388312 21:38:32 dev_cluster green 3 3 1202 601 2 0 0
1396388312 21:38:32 dev_cluster green 3 3 1202 601 2 0 0

curl: (52) Empty reply from server
1396388313 21:38:33 dev_cluster green 3 3 1202 601 2 0 0
1396388313 21:38:33 dev_cluster green 3 3 1202 601 2 0 0

curl: (52) Empty reply from server
1396388314 21:38:34 dev_cluster green 3 3 1202 601 2 0 0
1396388314 21:38:34 dev_cluster green 3 3 1202 601 2 0 0
... continues as green for many more seconds...

Since it is reporting as green, the second node thinks it can stop and ends
up putting the cluster into a broken red state:
curl: (52) Empty reply from server
curl: (52) Empty reply from server
1396388339 21:38:59 dev_cluster green 2 2 1202 601 2 0 0

curl: (52) Empty reply from server
curl: (52) Empty reply from server
1396388341 21:39:01 dev_cluster yellow 2 2 664 601 2 8 530

curl: (52) Empty reply from server
curl: (52) Empty reply from server
1396388342 21:39:02 dev_cluster yellow 2 2 664 601 2 8 530

curl: (52) Empty reply from server
curl: (52) Empty reply from server
1396388343 21:39:03 dev_cluster yellow 2 2 664 601 2 8 530

curl: (52) Empty reply from server
curl: (52) Empty reply from server
1396388345 21:39:05 dev_cluster yellow 1 1 664 601 2 8 530

curl: (52) Empty reply from server
curl: (52) Empty reply from server
1396388346 21:39:06 dev_cluster yellow 1 1 664 601 2 8 530

curl: (52) Empty reply from server
curl: (52) Empty reply from server
1396388347 21:39:07 dev_cluster red 1 1 156 156 0 0 1046
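For readers decoding the log above: the lines look like the output of the `_cat/health` API, whose columns in ES 1.x are, as far as I know, `epoch timestamp cluster status node.total node.data shards pri relo init unassign` (adding `?v` prints that header). The helper below is a hypothetical sketch, not part of Mike's script, that labels the interesting columns of one such line.

```shell
#!/usr/bin/env bash
# Hypothetical helper: label the columns of one ES 1.x `_cat/health` line.
# Assumed column order (the header `_cat/health?v` prints in 1.x):
#   epoch timestamp cluster status node.total node.data shards pri relo init unassign
describe_health_line() {
  local epoch ts cluster status nodes data shards pri relo init unassign
  read -r epoch ts cluster status nodes data shards pri relo init unassign <<< "$1"
  printf 'time=%s status=%s nodes=%s active_shards=%s initializing=%s unassigned=%s\n' \
    "$ts" "$status" "$nodes" "$shards" "$init" "$unassign"
}
```

For example, the 21:39:01 line decodes to status=yellow, nodes=2, active_shards=664, initializing=8, unassigned=530; those three shard counts add up to the 1202 shards the cluster reported while it was still green.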

My stop script issues a call
to http://localhost:9200/_cluster/nodes/_local/_shutdown to kill the node.
Is it possible the other nodes are waiting to timeout the down node before
moving into the yellow state? I would assume the shutdown API call would
inform the other nodes that it is going down.

Appreciate any help on how to do this properly.

--
You received this message because you are subscribed to the Google Groups "elasticsearch" group.
To unsubscribe from this group and stop receiving emails from it, send an email to elasticsearch+unsubscribe@googlegroups.com.
To view this discussion on the web visit https://groups.google.com/d/msgid/elasticsearch/baba0a96-a991-42e3-a827-43881240e889%40googlegroups.com.
For more options, visit https://groups.google.com/d/optout.


(Petter Abrahamsson) #2

Mike,

Your script needs to check the status of the cluster before shutting
down a node, i.e. if the state is yellow, wait until it becomes green again
before shutting down the next node. You'll probably want to disable
allocation of shards while each node is being restarted (and enable it again
when the node comes back) in order to minimize the amount of data that needs
to be rebalanced.
Also make sure to have 'discovery.zen.minimum_master_nodes' correctly set
in your elasticsearch.yml file.
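What "correctly set" means isn't spelled out here. The rule usually given in the Elasticsearch docs is a quorum of the master-eligible nodes, (N / 2) + 1 with integer division; for Mike's three nodes (assuming all three are master-eligible) that is 2. A trivial helper with a hypothetical name, with the resulting elasticsearch.yml line in a comment:

```shell
#!/usr/bin/env bash
# Quorum of master-eligible nodes: floor(N / 2) + 1.
# For 3 master-eligible nodes the elasticsearch.yml line would be:
#   discovery.zen.minimum_master_nodes: 2
min_master_nodes() {  # min_master_nodes N (hypothetical helper name)
  local master_eligible="$1"
  echo $(( master_eligible / 2 + 1 ))
}
```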

Meta code

for node in $cluster_nodes; do
    wait_for_cluster_status_green    # wait here rather than skipping the node
    cluster_disable_allocation
    shutdown_node "$node"
    wait_for_node_to_rejoin "$node"
    cluster_enable_allocation
    wait_for_cluster_status_green
done
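One way to flesh out the meta code, as a bash sketch rather than Petter's actual script. Assumptions: ES 1.x with HTTP on port 9200 on every node, passwordless ssh and sudo, and an init script at /etc/init.d/elasticsearch as in Nik's script; every function name is hypothetical. It leans on the cluster health API's blocking parameters (`wait_for_status`, `wait_for_nodes`, `timeout`), and, to sidestep the delay Mike describes, it waits for the node count to drop after stopping a node instead of trusting an immediate "green".

```shell
#!/usr/bin/env bash
# Sketch of the rolling-restart meta code for Elasticsearch 1.x. Assumptions
# (hypothetical, adjust to your setup): HTTP on port 9200 on every node,
# passwordless ssh + sudo, and an init script at /etc/init.d/elasticsearch.
ES_PORT="${ES_PORT:-9200}"

# es HOST PATH [extra curl args...]: one HTTP request to one specific node.
es() {
  local host="$1" path="$2"
  shift 2
  curl -s "http://$host:$ES_PORT$path" "$@"
}

# set_allocation HOST all|primaries|none: transient shard-allocation setting.
set_allocation() {
  es "$1" /_cluster/settings -XPUT \
    -d "{\"transient\":{\"cluster.routing.allocation.enable\":\"$2\"}}" >/dev/null
}

# node_count HOST: number_of_nodes from HOST's cluster health response.
node_count() {
  es "$1" /_cluster/health | sed -n 's/.*"number_of_nodes":\([0-9]*\).*/\1/p'
}

# wait_for_health HOST CONDITION: repeat a blocking health call until it no
# longer times out, e.g. CONDITION='wait_for_status=green'.
wait_for_health() {
  until es "$1" "/_cluster/health?$2&timeout=30s" | grep -q '"timed_out":false'; do
    sleep 1
  done
}

# rolling_restart NODE...: restart the given nodes one at a time.
rolling_restart() {
  local nodes=("$@") node other total
  for node in "${nodes[@]}"; do
    # pick another node to watch the cluster from while $node is down
    for other in "${nodes[@]}"; do [ "$other" != "$node" ] && break; done
    wait_for_health "$other" 'wait_for_status=green'
    total="$(node_count "$other")"
    set_allocation "$other" primaries        # only primaries may be allocated
    ssh "$node" sudo /etc/init.d/elasticsearch stop
    # don't trust "green" until the cluster has actually dropped the node
    wait_for_health "$other" "wait_for_nodes=lt($total)"
    ssh "$node" sudo /etc/init.d/elasticsearch start
    # ask the restarted node itself; a node that is down cannot answer
    wait_for_health "$node" "wait_for_nodes=$total"
    set_allocation "$node" all
    wait_for_health "$node" 'wait_for_status=green'
  done
}
```

Usage would be something like `rolling_restart es1 es2 es3` (hypothetical hostnames). The `lt(N)` form of `wait_for_nodes` is what I understand the 1.x health API to accept, but check it against your version. Each blocking call is retried forever, so in practice you would want an overall cap.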

http://www.elasticsearch.org/guide/en/elasticsearch/reference/current/modules-cluster.html

/petter

On Tue, Apr 1, 2014 at 6:19 PM, Mike Deeks mike02@gmail.com wrote:



(Nik Everett) #3

I just used this to upgrade our labs environment a couple of days ago:

#!/bin/bash

export prefix=deployment-elastic0
export suffix=.eqiad.wmflabs
rm -f servers
for i in {1..4}; do
    echo "$prefix$i$suffix" >> servers
done

cat << commands > /tmp/commands
wget https://download.elasticsearch.org/elasticsearch/elasticsearch/elasticsearch-1.1.0.deb
sudo dpkg -i --force-confdef --force-confold elasticsearch-1.1.0.deb
curl -s -XPUT 'localhost:9200/_cluster/settings?pretty' -d '{
    "transient" : {
        "cluster.routing.allocation.enable": "primaries"
    }
}'
sudo /etc/init.d/elasticsearch restart
until curl -s 'localhost:9200/_cluster/health?pretty'; do
    sleep 1
done
curl -s -XPUT 'localhost:9200/_cluster/settings?pretty' -d '{
    "transient" : {
        "cluster.routing.allocation.enable": "all"
    }
}'
until curl -s 'localhost:9200/_cluster/health?pretty' | tee /tmp/health |
        grep green; do
    cat /tmp/health
    sleep 1
done
commands

for server in $(cat servers); do
    scp /tmp/commands "$server:/tmp/commands"
    ssh "$server" bash /tmp/commands
done

In production we swap wget and dpkg for apt-get update and apt-get install
elasticsearch, but you get the idea.

It isn't foolproof. If it dies, it doesn't know how to resume where it left
off, and you might have to kill it if the cluster doesn't come back the way
you'd expect. It really only covers the "everything worked out as
expected" scenario. But it is nice when that happens.
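The "can't resume" limitation can be softened with a small state file. This is a hypothetical sketch, not part of Nik's script: each server that finishes is appended to a done-file, and a rerun skips those servers. Function names and file layout are assumptions.

```shell
#!/usr/bin/env bash
# Hypothetical resume support for a "for server in $(cat servers)" loop:
# each server that finishes is appended to a state file and skipped next run.

# pending_servers SERVERS_FILE DONE_FILE: servers not yet listed in DONE_FILE.
pending_servers() {
  local server
  while read -r server || [ -n "$server" ]; do
    [ -n "$server" ] || continue
    grep -qxF "$server" "$2" 2>/dev/null || echo "$server"
  done < "$1"
}

# upgrade_all SERVERS_FILE DONE_FILE COMMANDS_FILE: run COMMANDS_FILE on each
# pending server in order, stopping at the first scp/ssh failure.
upgrade_all() {
  local server
  for server in $(pending_servers "$1" "$2"); do
    scp "$3" "$server:/tmp/commands" || return 1
    ssh "$server" bash /tmp/commands || return 1
    echo "$server" >> "$2"   # mark done only after the remote script returned 0
  done
}
```

ssh returns the remote script's exit status, and the commands file above always ends with a loop that exits 0, so as written only scp/ssh failures stop the run; making the remote script fail on errors is a separate change.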

Nik

On Wed, Apr 2, 2014 at 7:23 AM, Petter Abrahamsson petter@jebus.nu wrote:



(Mike Deeks) #4

That is exactly what I'm doing. For some reason the cluster reports as
green even though an entire node is down. The cluster doesn't seem to
notice the node is gone and change to yellow until many seconds later. By
then my rolling restart script has already gotten to the second node and
killed it because the cluster was still green for some reason.

On Wednesday, April 2, 2014 4:23:32 AM UTC-7, Petter Abrahamsson wrote:



(Nik Everett) #5

I'm not sure what is up, but my advice is to make sure you read the cluster
state from the node you are restarting. That way you know the node is up in
the first place, and you get that node's view of the cluster.
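A sketch of that advice (hypothetical function name, port 9200 assumed): ask the restarting node directly and treat no answer, such as the `curl: (52) Empty reply from server` lines in Mike's log, as "down" rather than as a reason to move on.

```shell
#!/usr/bin/env bash
# Ask one specific node for cluster health. Assumes HTTP on port 9200.
# Prints green/yellow/red, or "down" when the node does not answer, so a node
# that is still shutting down or starting up is never mistaken for "green".
node_status() {  # node_status HOST (hypothetical helper name)
  local body
  body="$(curl -s --max-time 5 "http://$1:${ES_PORT:-9200}/_cluster/health")" \
    || { echo down; return 1; }
  body="$(printf '%s' "$body" | sed -n 's/.*"status":"\([a-z]*\)".*/\1/p')"
  echo "${body:-down}"
}
```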

Nik

On Wed, Apr 2, 2014 at 2:08 PM, Mike Deeks mike02@gmail.com wrote:



(Ivan Brusic) #6

My scripts wait for yellow before waiting for green because, as you
noticed, the cluster does not enter a yellow state immediately following
a cluster event (a shutdown, a replica change).
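Ivan's approach as a polling sketch (hypothetical function names, port 9200 assumed). One pitfall: the health API's own `wait_for_status=yellow` returns as soon as the status is yellow or better, so it would return immediately on a stale green. The sketch therefore polls until the status actually leaves green, with a cap in case it never does.

```shell
#!/usr/bin/env bash
# "Wait for yellow, then wait for green", polling one surviving node.
# Assumes HTTP on port 9200; function names are hypothetical.
cluster_status() {  # cluster_status HOST -> green|yellow|red (empty if no answer)
  curl -s "http://$1:${ES_PORT:-9200}/_cluster/health" \
    | sed -n 's/.*"status":"\([a-z]*\)".*/\1/p'
}

wait_until_not_green() {  # wait_until_not_green HOST [MAX_SECONDS]
  local i
  for (( i = 0; i < ${2:-60}; i++ )); do
    case "$(cluster_status "$1")" in
      yellow|red) return 0 ;;
    esac
    sleep 1
  done
  return 1  # never left green (e.g. the node held no shards): check by hand
}

wait_until_green() {  # wait_until_green HOST
  until [ "$(cluster_status "$1")" = green ]; do sleep 1; done
}
```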

--
Ivan

On Wed, Apr 2, 2014 at 11:08 AM, Mike Deeks mike02@gmail.com wrote:


