Perma-Unallocated primary shards after a node has left the cluster


#1

Hi guys, I would really appreciate some help understanding what's going
down with shard allocation in this case:

Elasticsearch version: 1.4.4

We had 3 nodes with 1 shard and 1 replica per index (so net 2 copies of
everything). 1 node went down and the cluster went red. It started to
reallocate shards as expected, and there were originally ~50 unallocated
shards, 15 of them primaries and the rest replicas.

It's been a few hours now and there are still 15 outstanding shards that
are all primary that don't seem to be getting re-allocated. I thought this
would be a pretty standard scenario so I was really hoping I wouldn't need
to manually walk through and re-allocate the primary shards, but I'm not
sure what else to try at this point to get back to green. Any pointers
would be really appreciated. Here are some of the relevant-seeming bits
folks asked about on IRC:

In the ES logs, for the unallocated index names, there are lines along the
lines of:
[2015-04-29 22:08:22,803][DEBUG][action.admin.indices.stats] [Agent Axis] [webaccesslogs-2015.04.24][0], node[-r2iQnH4R-mcUy4NicCB5g], [P], s[STARTED]: failed to execute [org.elasticsearch.action.admin.indices.stats.IndicesStatsRequest@6a564a91]
org.elasticsearch.transport.SendRequestTransportException: [Jean-Paul Beaubier][inet[/10.155.165.126:9300]][indices:monitor/stats[s]]

"Jean-Paul Beaubier" is the node that went down.

_cat/recovery
shards disk.used disk.avail disk.total disk.percent host              ip             node
   420    21.2gb       77gb     98.3gb           21 ip-10-234-164-148 10.234.164.148 Agent Axis
   420      41gb     57.2gb     98.3gb           41 ip-10-218-145-237 10.218.145.237 Ebon Seeker
    15                                                                               UNASSIGNED
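
The allocation summary only gives a count of unassigned shards. To see which indices they belong to, the plain-text output of _cat/shards can be filtered for unassigned primaries. What follows is a minimal sketch, assuming the first four columns are index, shard number, p/r flag and state (the default layout as far as I know). The function name and the sample rows are made up for illustration and are not output from this cluster.

```python
def unassigned_primaries(cat_shards_text):
    """Return (index, shard) pairs for primary shards in state UNASSIGNED.

    Assumes each row starts with: index, shard number, p/r flag, state.
    Rows whose second column is not a number (such as the header row
    printed with ?v) are skipped.
    """
    stuck = []
    for line in cat_shards_text.splitlines():
        fields = line.split()
        if len(fields) < 4:
            continue
        index, shard, prirep, state = fields[:4]
        if not shard.isdigit():
            continue  # header row or malformed line
        if prirep == "p" and state == "UNASSIGNED":
            stuck.append((index, int(shard)))
    return stuck


# Hypothetical sample rows, NOT real output from the cluster in this thread.
SAMPLE = """\
index shard prirep state docs store ip node
webaccesslogs-2015.04.24 0 p UNASSIGNED
webaccesslogs-2015.04.24 0 r UNASSIGNED
webaccesslogs-2015.04.25 0 p STARTED 1200 3.1mb 10.234.164.148 Agent Axis
"""

print(unassigned_primaries(SAMPLE))
```

The text passed in would come from something like GET _cat/shards on one of the surviving nodes.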

I'm trying to understand why it's stuck in this state given there is no
other info in the logs as far as I can tell about why the shards can't be
allocated. Shouldn't the replicas just be promoted in place to new
primaries and then new replicas created on the other node?
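
For reference, the manual route mentioned above would look roughly like the sketch below. It only builds the JSON body for a POST to _cluster/reroute and does not send anything. It assumes the 1.x reroute API accepts an "allocate" command with "index", "shard", "node" and "allow_primary" fields, which is my understanding of the docs. The helper name is hypothetical. As I understand it, allow_primary tells the cluster to bring up a primary even if no copy of the data is found, so any data that lived only on the lost node would be gone.

```python
import json


def build_reroute_body(shards, node, allow_primary=False):
    """Build a _cluster/reroute request body with one allocate command per shard.

    shards: iterable of (index_name, shard_number) pairs.
    node: name or id of the node that should receive the shards.
    allow_primary: assumed 1.x flag that forces primary allocation and
        can start an empty shard if no data copy exists (risk of data loss).
    """
    commands = [
        {
            "allocate": {
                "index": index,
                "shard": shard,
                "node": node,
                "allow_primary": allow_primary,
            }
        }
        for index, shard in shards
    ]
    return {"commands": commands}


# Example with the index and node names quoted in the log line above.
body = build_reroute_body(
    [("webaccesslogs-2015.04.24", 0)], "Agent Axis", allow_primary=True
)
print(json.dumps(body, indent=2))
```

Leaving allow_primary at its default of False in the helper is deliberate, so that forcing a primary is always an explicit choice.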

Thanks and regards -- Alex

--
You received this message because you are subscribed to the Google Groups "elasticsearch" group.
To unsubscribe from this group and stop receiving emails from it, send an email to elasticsearch+unsubscribe@googlegroups.com.
To view this discussion on the web visit https://groups.google.com/d/msgid/elasticsearch/9adda07d-88b0-4fa2-805b-37d4739d6f1a%40googlegroups.com.
For more options, visit https://groups.google.com/d/optout.


#2

Probably super evident, but the output above was actually from
_cat/allocation?v, not _cat/recovery. Sorry about that.

On Wednesday, April 29, 2015 at 5:19:08 PM UTC-7, Alex Schokking wrote:



(system) #3