Index recovery - speed difference in red -> yellow vs yellow -> green

Hey,
When I've restarted my cluster, I've observed that I very quickly (a
couple of minutes) get into the yellow state, while it takes much
longer (a couple of hours) to get into the green state.

I am using the local gateway and know that each node will pull its
local data in order to get into the yellow state.

After that, do nodes use their own data to fulfill all replicas and
verify it against the master w/ checksums or is all the data synced
over from the master shard and the local data is disregarded?

Based on the performance I have observed, I believe it is the latter,
but wanted to confirm.

Thanks,
Paul
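For reference, the three health colors map roughly onto shard allocation as follows. This is a simplified sketch of the idea, not Elasticsearch's actual implementation:

```python
def cluster_status(unassigned_primaries: int, unassigned_replicas: int) -> str:
    """Simplified view of cluster health: red means some primary shards are
    not allocated, yellow means all primaries are up but some replicas are
    not, and green means every shard copy is allocated."""
    if unassigned_primaries > 0:
        return "red"
    if unassigned_replicas > 0:
        return "yellow"
    return "green"

# A full-cluster restart typically passes through these states in order:
# primaries recover from local data first (red -> yellow), then replicas
# sync from the primaries (yellow -> green).
```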

When a replica is allocated, it needs to sync its index files with the
primary shard. While indexing, the state of the index files can diverge
(though not the content!), and they might require a resync. If you had
restarted the cluster right away after the other restart, it would have
recovered much faster. Thanks to the way Lucene works when it comes to
merging, for large segments this is a "one time" cost.

On Mon, Sep 12, 2011 at 11:30 PM, ppearcy ppearcy@gmail.com wrote:


Ah, that's interesting, I didn't realize that the underlying index files
don't stay in sync, but it makes sense. I did a restart of a smaller
cluster which took ~20 mins to get to green and then another restart
which took ~10 seconds to get to green.

I wonder, is it possible to inform the cluster on a full restart that
no new content has arrived, which would allow the file sync to be
skipped? One problem would be that any new content arriving while still
in the yellow state would get out of sync. We would probably want to
disable indexing new content until green.

Thanks,
Paul

On Sep 12, 4:54 pm, Shay Banon kim...@gmail.com wrote:


New content that arrives while the cluster is in the yellow state does not
mean it will get out of sync; it will be properly applied to the replica
shards that are currently recovering.

On Wed, Sep 28, 2011 at 8:27 AM, ppearcy ppearcy@gmail.com wrote:


Ok, what happens in this scenario:

  1. Cluster starts up and reaches yellow state
  2. Sync starts happening from primary shards to replicas
  3. Index operations come in
  4. Node with a primary shard goes down before sync is complete

In this case, will one of the replicas be promoted to primary with an
uncertain state? I'm just trying to understand what the risks are of
operating under a yellow state instead of waiting for green.

thanks,

Curtis

On Wed, Sep 28, 2011 at 4:58 AM, Shay Banon kimchy@gmail.com wrote:


In this case, if the replica shard is in a state where it can be promoted to
a primary, it will be; otherwise, the shard won't be available until you
bring back the node that held the primary. (This is all local gateway logic;
the shared gateway is different.)

That applies if you have 1 replica with the default write consistency. If
you increase the replica count to 2, then an index operation won't happen unless
it has also been replicated to a quorum of shards (with 2 replicas, that means
it has been replicated at least once). You can also set the write
consistency to a different value.

That said, I am working (mainly experimenting now) on reducing the recovery
time of a replica, trying to build a bigger feature than just this; mainly
ideas currently bouncing around...
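The quorum arithmetic described here can be sketched as follows. This is a simplified model of quorum write consistency, not Elasticsearch's code, and the single-replica special case may differ between versions:

```python
def write_quorum(number_of_replicas: int) -> int:
    """Number of active shard copies (primary + replicas) required for a
    write under quorum consistency. With 1 replica the quorum check
    effectively requires only the primary, which is why writes still
    succeed while the cluster is yellow."""
    total_copies = number_of_replicas + 1
    if number_of_replicas <= 1:
        return 1  # quorum only kicks in with more than one replica
    return total_copies // 2 + 1

# With 2 replicas there are 3 copies, so a write needs 2 of them active:
# the primary plus at least one replica ("replicated at least once").
```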

On Wed, Sep 28, 2011 at 3:24 PM, Curtis Caravone caravone@gmail.com wrote:


Cool, thanks for the info.

Curtis

On Wed, Sep 28, 2011 at 5:34 AM, Shay Banon kimchy@gmail.com wrote:


Hi,

Firstly, thanks to everyone who built and helps with elasticsearch - it is an
amazing piece of technology!

I have a question regarding cluster restarts:
I have been working on some river plugins on a cluster with the following
setup:

  • 4 nodes
  • 5 indexes with 10 shards each
  • replication set to 2

The river plugins are the only thing modifying data on the cluster, so when
they aren't running, the data is static.
Here is the process I'm following when I need to redeploy a new version of
the river plugin:
For each node:

  1. Delete the _river index from the cluster, in order to stop any currently
    running rivers
  2. Install the new version of the river plugin
  3. Bounce the node and wait for the cluster state to go green
  4. Repeat for each node.

However, waiting for the cluster to go green in each of these steps takes
about 30 minutes per node, so the whole process is quite slow.
I am wondering about a couple of options that might speed this up:
Option 1 - close the indexes before bouncing the nodes, then open them once
all nodes are bounced (as nothing will be modifying the data during the
redeploy, I figure this might make the restarts a lot faster).

Option 2 - bounce 2 nodes at a time - as replication is set to 2, I figure
we can safely have 2 nodes down and still recover fully.

BTW - I'm assuming that I need to wait for the cluster state to be green
before continuing to bounce the other cluster nodes, but if this is not
correct, maybe I can save some time in that step as well.

Any suggestions on this would be appreciated.
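The wait-for-green step in the redeploy loop above can be automated against the cluster health API. A hypothetical helper: after bouncing a node, fetch `GET http://<node>:9200/_cluster/health` (the host and port are assumptions), parse the JSON, and only move on once this returns True:

```python
def ready_to_bounce_next(health: dict, required: str = "green") -> bool:
    """Given a parsed _cluster/health response, decide whether it is safe
    to restart the next node: the required status has been reached and no
    shards are still relocating or initializing."""
    return (health.get("status") == required
            and health.get("relocating_shards", 0) == 0
            and health.get("initializing_shards", 0) == 0)
```

The health API also supports `?wait_for_status=green&timeout=30s`, which blocks server-side and avoids tight polling.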

On Tuesday, September 13, 2011 6:30:09 AM UTC+10, ppearcy wrote:


--
You received this message because you are subscribed to the Google Groups "elasticsearch" group.
To unsubscribe from this group and stop receiving emails from it, send an email to elasticsearch+unsubscribe@googlegroups.com.
For more options, visit https://groups.google.com/groups/opt_out.

Hi,

First of all, you should not hijack an existing thread to ask a
question; just start your own.

You can disable allocation while performing maintenance. If you have enough
replicas (which it appears you do), then all data should remain available if
you bring down one node.

http://www.elasticsearch.org/guide/reference/api/admin-cluster-update-settings/
The setting to use is cluster.routing.allocation.disable_allocation

I find that ElasticSearch still likes to move data around after re-enabling
allocation. You might want to do a flush before disabling allocation to
clear the translog.
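The settings call could look like the following sketch. The endpoint and setting name are the ones from the guide page linked above (0.90-era Elasticsearch; later versions replaced it with `cluster.routing.allocation.enable`), and the host/port in the comments are assumptions:

```python
import json

def allocation_settings_body(disabled: bool) -> str:
    """Build the JSON body to PUT to http://localhost:9200/_cluster/settings
    to toggle shard allocation."""
    return json.dumps({
        "transient": {
            "cluster.routing.allocation.disable_allocation": disabled
        }
    })

# Before maintenance: PUT the body with disabled=True, then flush
# (POST /_flush) to empty the translog. Once the node is back and has
# rejoined, PUT with disabled=False to re-enable allocation.
```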

Cheers,

Ivan

On Tue, Apr 16, 2013 at 11:10 PM, rockbobsta bob@figjamit.com.au wrote:

