Master refusing to "see" indexes -- any way to coax it to acknowledge the index?


(Jeff Vier) #1

My cluster is 2 master nodes, 10 data nodes (all in EC2)

We had a data node crash, which triggered a full cluster restart.
Everything recovered to green within a few minutes.

However, something apparently went awry (not sure what, exactly) and the
master node is...unaware of existing indices on the data nodes.

It very much looks like legitimate index files at a glance.
Random example:
/opt/elasticsearch/data/elasticsearch/nodes/0/indices/logstash-2014.01.13/0/index/
on an arbitrary data node has 693M of data in 171 files. The master node
doesn't have a
/opt/elasticsearch/data/elasticsearch/nodes/0/indices/logstash-2014.01.13
directory at all.

I tried shutting down everything, moving /opt/elasticsearch/data out of the
way on the masters (then mkdir'ing an empty one) and restarting the
cluster. It seems to "recognize" some random subset of the indices, but
stops well short of complete.
I also hoped that perhaps an upgrade from 0.90.10 to 0.90.11 would have
smarter recovery, but it does not.

Any ideas?

-jv

--
You received this message because you are subscribed to the Google Groups "elasticsearch" group.
To unsubscribe from this group and stop receiving emails from it, send an email to elasticsearch+unsubscribe@googlegroups.com.
To view this discussion on the web visit https://groups.google.com/d/msgid/elasticsearch/bcaee7f8-f3f3-4048-9c43-87d8ef04e91b%40googlegroups.com.
For more options, visit https://groups.google.com/groups/opt_out.


(Jeff Vier) #2

Aaaand a downgrade to 0.90.10 fixed it.

This was after multiple restarts on 0.90.11, so it wasn't some temporary
wedging.

I am baffled.

So, I still have a couple questions:

  1. Regardless, how do I coax the masters to "see" an index I know is there?
  2. Should I do anything different when I upgrade? (or, what I'm actually
    thinking, can I upgrade?)

On Wed, Feb 5, 2014 at 12:02 AM, Jeff Vier jeff@jeffvier.com wrote:



(Zachary Tong) #3

I'm unsure what happened to your cluster, but I don't believe it is related
to the version upgrade. There should be no difference between 0.90.10 and
0.90.11 when it comes to recovery. A few thoughts:

  • You should have three dedicated masters, so that a quorum is
    required for a node to become master. With only two dedicated masters, it
    is still possible to get a split-brain where each master thinks they are
    the "ruler" of the cluster. It's possible the inconsistency you were
    seeing in your indices is due to a split-brain - the master you were
    talking to was unaware of the index because you were in a split-brain
    arrangement.

  • Masters don't store data, so you won't see indexed data in the data
    path.

  • gateway.recover_after_data_nodes is a useful setting to have
    configured when doing full-cluster restarts. It prevents allocation while
    your cluster is coming back online, which helps speed up the restart. Not
    really related to your problem, but I wanted to mention it.

Did you see anything unusual in the logs? Perhaps references to dangling
indices? Normally if you have indices in the data path that aren't
represented by indices in the cluster state (e.g. the data is there, but
the master does not think the index exists in the cluster state), you'll
see warnings about "dangling indices". Those warnings will quickly go away
as the "dangling" indices are re-added to the cluster state...you should
see notices in the log stating something like "no longer dangling".
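
For what it's worth, the "on disk but not in the cluster state" condition can be checked by hand. The following is only a rough Python sketch, not an Elasticsearch tool: the helper names are made up for illustration, and the list of index names the master knows about is assumed to have been collected separately (for example from the cluster state output).

```python
import os


def indices_on_disk(indices_dir):
    """Return the names of the index directories found under indices_dir.

    indices_dir is a node's indices directory, e.g. the
    .../nodes/0/indices path mentioned earlier in this thread.
    """
    if not os.path.isdir(indices_dir):
        return set()
    return {
        name
        for name in os.listdir(indices_dir)
        if os.path.isdir(os.path.join(indices_dir, name))
    }


def find_dangling(on_disk, in_cluster_state):
    """Index names present on disk but missing from the cluster state."""
    return sorted(set(on_disk) - set(in_cluster_state))
```

Calling `find_dangling(indices_on_disk(path), known_indices)` on a data node lists the candidates that the master would be expected to report as dangling.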

My gut says you were in a split-brain and your cluster state was weird.
Make sure you have three dedicated masters, and set minimum_master_nodes
to 2.

-Zach

On Wednesday, February 5, 2014 3:28:29 AM UTC-5, Jeff Vier wrote:



(Jeff Vier) #4

On Wed, Feb 5, 2014 at 1:17 PM, Zachary Tong zacharyjtong@gmail.com wrote:

> I'm unsure what happened to your cluster, but I don't believe it is
> related to the version upgrade. There should be no difference between
> 0.90.10 and 0.90.11 when it comes to recovery. A few thoughts:

We assumed the same, but the observed behavior suggests otherwise, no?

> • You should have three dedicated masters, so that a quorum is
>   required for a node to become master. With only two dedicated masters, it
>   is still possible to get a split-brain where each master thinks they are
>   the "ruler" of the cluster. It's possible the inconsistency you were
>   seeing in your indices is due to a split-brain - the master you were
>   talking to was unaware of the index because you were in a split-brain
>   arrangement.

I have 'discovery.zen.minimum_master_nodes: 2' set now. I assume, as
well, that this will help prevent such an occurrence. Unfortunately it
wasn't set before.

> • Masters don't store data, so you won't see indexed data in the data
>   path.

I know they don't store the data, but I should still see the index
directories there with their state files (and, post-downgrade, they are,
indeed, there).

> • gateway.recover_after_data_nodes is a useful setting to have
>   configured when doing full-cluster restarts. It prevents allocation while
>   your cluster is coming back online, which helps speed up the restart. Not
>   really related to your problem, but I wanted to mention it.

We have had that set to half our cluster node size, rounded up (using
templates in puppet, so it auto-adjusts as we add/remove nodes), since
doing the master/data node split (several months, at this point).
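
To spell out the arithmetic: the real thing lives in a puppet template that isn't shown here, so this is just a Python stand-in for the "half, rounded up" rule. The function names are illustrative only, and which nodes get counted is whatever the template is fed.

```python
def recover_after_data_nodes(node_count):
    """Half of node_count, rounded up, using integer arithmetic."""
    if node_count < 1:
        raise ValueError("need at least one node")
    return (node_count + 1) // 2


def render_setting(node_count):
    """Render the line as it would appear in elasticsearch.yml."""
    return "gateway.recover_after_data_nodes: %d" % recover_after_data_nodes(node_count)


if __name__ == "__main__":
    # With the 10 data nodes in this cluster the threshold comes out at 5.
    print(render_setting(10))
```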

> Did you see anything unusual in the logs? Perhaps references to dangling
> indices? Normally if you have indices in the data path that aren't
> represented by indices in the cluster state (e.g. the data is there, but
> the master does not think the index exists in the cluster state), you'll
> see warnings about "dangling indices". Those warnings will quickly go away
> as the "dangling" indices are re-added to the cluster state...you should
> see notices in the log stating something like "no longer dangling".

Nothing like that, and it wasn't a matter of impatience -- it went to green
and sat there for a half hour without showing even close to the appropriate
number of active_shards in the _cluster/health.

When I tried the "master data purge" to force it to regen, just the normal
"creating index" and "update_mapping" lines showed up in the logs, but
again, stopped far short of the active_shards that were there before.
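
The "green but short on shards" state is easy to test for once the health response is in hand. A minimal Python sketch follows, assuming the _cluster/health JSON has already been fetched and parsed. The helper names and all of the numbers in the sample are made up for illustration and are not values from this cluster.

```python
import json


def shard_shortfall(health, expected_active_shards):
    """How many active shards are missing compared with what was expected."""
    return max(0, expected_active_shards - health["active_shards"])


def green_but_incomplete(health, expected_active_shards):
    """True when status is green yet fewer shards are active than expected."""
    return (
        health["status"] == "green"
        and shard_shortfall(health, expected_active_shards) > 0
    )


if __name__ == "__main__":
    # Hypothetical health response; only the two fields used here are shown.
    sample = json.loads('{"status": "green", "active_shards": 120}')
    print(green_but_incomplete(sample, expected_active_shards=400))
```

Green only means that every shard of the indices the master knows about is allocated, so an index missing from the cluster state does not turn the status yellow or red.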

> My gut says you were in a split-brain and your cluster state was weird.
> Make sure you have three dedicated masters, and set minimum_master_nodes
> to 2.

That's kind of my impression, too, but why the inability to recover even
after full stop/starts (some with master data dir purges, some not) then
immediate recovery after downgrade?

Also...why have 3/use 2? Is the "unused" 3rd still tracking things, or is
it just sitting there dormant? (I rather assumed it was closer to the
latter, and thus planned to just spin up another EC2 node if I had an
unrecoverable death of one of the masters)

-jv



(Zachary Tong) #5

Yeah, I agree with your assessment of the situation...something strange
occurred. I'm unsure why the multiple cluster restarts didn't fix the
issue. Are JVM versions identical across all the nodes? If you try to
upgrade again, I would set the log level to DEBUG...maybe we can catch the
problem that way?
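
A quick way to answer the JVM question is to group nodes by reported JVM version. The sketch below is Python and works on an already-parsed nodes info response. It assumes the response is shaped like {"nodes": {node_id: {"name": ..., "jvm": {"version": ...}}}}, which should be verified against the output of your own version. The function names are hypothetical.

```python
from collections import defaultdict


def group_nodes_by_jvm(nodes_info):
    """Map each JVM version string to the sorted names of nodes running it."""
    groups = defaultdict(list)
    for node_id, node in nodes_info.get("nodes", {}).items():
        version = node.get("jvm", {}).get("version", "unknown")
        groups[version].append(node.get("name", node_id))
    return {version: sorted(names) for version, names in groups.items()}


def jvm_versions_match(nodes_info):
    """True when every node reports the same JVM version."""
    return len(group_nodes_by_jvm(nodes_info)) <= 1
```

If `jvm_versions_match` returns False, the grouping shows which nodes are the odd ones out.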

I'll keep thinking about it and let you know if I come across anything.

Regarding master nodes: all the master nodes will "track" the cluster
state, even if they are not the currently active master. Technically they
are dormant in the sense that they are not actively performing master
responsibilities, but they are getting updates from the master and
following along with changes to the cluster.

The reason to have 3 dedicated masters + min_master == 2 is to keep a
quorum at run-time. If you have two masters and min_master == 2, your
cluster becomes inoperable if one of the masters goes down (which is
identical to a single dedicated master...your cluster becomes inoperable).
By having three dedicated masters, you can survive a single master going
down and still maintain quorum, which means you also prevent split-brains.
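
The arithmetic behind that can be sketched in a few lines of Python. This is purely illustrative: the function names are made up, and the model ignores everything except node counts.

```python
def quorum(master_eligible):
    """Smallest majority of the master-eligible nodes."""
    return master_eligible // 2 + 1


def survivable_master_failures(master_eligible, minimum_master_nodes):
    """Masters that can be lost while minimum_master_nodes is still met."""
    return max(0, master_eligible - minimum_master_nodes)


def split_brain_possible(master_eligible, minimum_master_nodes):
    """True if two disjoint groups could each reach minimum_master_nodes."""
    return 2 * minimum_master_nodes <= master_eligible


if __name__ == "__main__":
    for masters, minimum in [(2, 1), (2, 2), (3, 2)]:
        print(
            masters,
            minimum,
            survivable_master_failures(masters, minimum),
            split_brain_possible(masters, minimum),
        )
```

With 2 masters and a minimum of 1, a split-brain is possible. With 2 masters and a minimum of 2, it is not, but no master can be lost. With 3 masters and a minimum of 2, one master can be lost and a split-brain is still ruled out.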

On Wednesday, February 5, 2014 4:38:16 PM UTC-5, Jeff Vier wrote:



(Jeff Vier) #6

Thanks. Spinning up a 3rd master as we speak.

On Wed, Feb 5, 2014 at 1:55 PM, Zachary Tong zacharyjtong@gmail.com wrote:


