Unassigned replica shards, and an unused node


(James Bardin) #1

Hi, I have a production cluster (0.19.8) with a number of replica shards
that are unassigned. All primary shards are accounted for, but some no
longer have redundancy (cluster health is yellow). There is also a node
with no shards whatsoever that seems to be perfectly fine otherwise;
removing it and then rejoining it does nothing other than show up as
removed and added in the master's log. I also tried the "reroute" API, but
that seems to be a no-op on my version, as it just returns 200 and nothing
happens.

Without replicas, I'm reluctant to simply restart other nodes just to see
what happens.

Is using the shutdown API (is it in 0.19.8?) supposed to force the shards to
other nodes, or does it simply stop the JVM?

Any other tips on how to proceed?

Thanks.

--


(Radu Gheorghe) #2

Hello James,

On Wed, Oct 31, 2012 at 3:53 PM, James Bardin j.bardin@gmail.com wrote:

Hi, I have a production cluster (0.19.8) with a number of replica shards
that are unassigned. All primary shards are accounted for, but some no
longer have redundancy (cluster health is yellow). There is also a node with
no shards whatsoever, but seems to be perfectly fine otherwise, and removing
that then rejoining it does nothing other than show as removed and added in
the master's log. I also tried the "reroute" api, but that seems to be a
noop on my version, as it just returns 200 and nothing happens.

Without replicas, I'm reluctant to simply restart other nodes just to see
what happens.

Is using the shutdown api (is it in .19.8?) supposed to force the shards to
other nodes, or does it simply stop the jvm?

The Shutdown API (available in 0.19.8) doesn't relocate the shards
before stopping the JVM. But Elasticsearch should automatically
redistribute replicas to your other nodes so that everything should be
OK eventually. Take a look here:
http://elasticsearch-users.115913.n3.nabble.com/how-do-I-relocate-shards-from-a-node-prior-to-shutting-it-down-td4024570.html
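
For readers who cannot follow the link, here is a minimal sketch of draining a node before shutdown. It assumes the transient cluster setting `cluster.routing.allocation.exclude._ip` is the one the linked thread describes and that a node answers on localhost:9200. The helper names `exclude_body` and `drain_node`, the `ES_HOST` variable, and the IP address are hypothetical.

```shell
#!/bin/sh
# Build the JSON body that asks the cluster to move shards off one node.
# Assumption: the node is selected by IP address through the exclude._ip filter.
exclude_body() {
    printf '{"transient": {"cluster.routing.allocation.exclude._ip": "%s"}}' "$1"
}

# Send the body to the cluster settings endpoint.
# ES_HOST is a hypothetical variable; it defaults to localhost:9200.
drain_node() {
    curl -XPUT "${ES_HOST:-localhost:9200}/_cluster/settings" -d "$(exclude_body "$1")"
}

# Usage: drain_node 10.0.0.5
```

After the request, wait for relocation to finish before stopping the node. Being transient, the setting should not survive a full cluster restart.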

Any other tips on how to proceed?

Any clues in the logs of the empty node? If not, I would turn on
debug logging and see if it brings out any new info.

Also, what is the state of your unallocated shards? Are they
initializing or simply "not allocated"?
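
One rough way to check this is to look for UNASSIGNED entries in the cluster state. The sketch below assumes the response of `GET /_cluster/state` lists each shard copy with a "state" field. The helper name `has_unassigned` and the host are hypothetical.

```shell
#!/bin/sh
# Read cluster-state JSON on stdin and succeed (exit 0) if any shard copy
# is reported as UNASSIGNED. The pattern tolerates pretty-printed output.
has_unassigned() {
    grep -q '"state" *: *"UNASSIGNED"'
}

# Usage (host is an assumption):
#   curl -s 'localhost:9200/_cluster/state' | has_unassigned && echo "unassigned shards present"
```

This only tells you whether such shards exist; reading the routing table in the same response shows which index and shard number each one belongs to.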

Do you have any shard allocation settings defined?
http://www.elasticsearch.org/guide/reference/index-modules/allocation.html

I'd also try disabling replicas and enabling them again using the
Indices Update Settings API:
http://www.elasticsearch.org/guide/reference/api/admin-indices-update-settings.html
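
A minimal sketch of that off-and-on cycle, assuming the index setting is `number_of_replicas` under `index` and that a node answers on localhost:9200. The index name `my_index`, the `ES_HOST` variable, and the helper names are hypothetical.

```shell
#!/bin/sh
# Build the update-settings body for a given replica count.
replicas_body() {
    printf '{"index": {"number_of_replicas": %d}}' "$1"
}

# Apply the replica count ($2) to one index ($1).
# ES_HOST is a hypothetical variable; it defaults to localhost:9200.
set_replicas() {
    curl -XPUT "${ES_HOST:-localhost:9200}/$1/_settings" -d "$(replicas_body "$2")"
}

# Usage: drop the replicas, then ask for them again.
#   set_replicas my_index 0
#   set_replicas my_index 1
```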

I know, it sounds a lot like "did you try turning it off and on
again?", but who knows :)

Best regards,
Radu

http://sematext.com/ -- ElasticSearch -- Solr -- Lucene

--


(James Bardin) #3

On Thu, Nov 1, 2012 at 3:45 AM, Radu Gheorghe
radu.gheorghe@sematext.com wrote:

The Shutdown API (available in 0.19.8) doesn't relocate the shards
before stopping the JVM. But Elasticsearch should automatically
redistribute replicas to your other nodes so that everything should be
OK eventually. Take a look here:
http://elasticsearch-users.115913.n3.nabble.com/how-do-I-relocate-shards-from-a-node-prior-to-shutting-it-down-td4024570.html

Ah, I hadn't seen that with the transient setting before. Wish I had.

Any clues in the logs of the empty node? If not, I would turn on
debug logging and see if it brings out any new info.

Nothing in the logs that I can see. That node did pick up some shards
after restarting nodes which had redundant data though, so there's not
much else I can investigate there.

Also, what is the state of your unallocated shards? Are they
initializing or simply "not allocated"?

Yeah, I have one index now where all replicas are simply "UNASSIGNED".

Do you have any shard allocation settings defined?
http://www.elasticsearch.org/guide/reference/index-modules/allocation.html

Yes, there are some routing.allocation settings on the index that is
missing its replicas, left over from when we had to push some indexes
around a while back. My hunch is that this is related, as it's now the
only difference between indexes in the cluster. The index has both
include.name and include.tag settings. There are no more node tags, so
include.tag can't match anything now, and I wonder if it's overriding
include.name. I really want to just remove these settings, but I haven't
found any way to do so without building a new index.

I'd also try disabling replicas and enabling them again using the
Indices Update Settings API:
http://www.elasticsearch.org/guide/reference/api/admin-indices-update-settings.html

Tried that, but new replicas just show up immediately as UNASSIGNED,
and nothing happens.

I know, it sounds a lot like "did you try turning it off and on
again?", but who knows :)

May have to resort to that. I could get in a point-release update too.

Thanks,
-james

--


(James Bardin) #4

The Shutdown API (available in 0.19.8) doesn't relocate the shards
before stopping the JVM. But Elasticsearch should automatically
redistribute replicas to your other nodes so that everything should be
OK eventually. Take a look here:
http://elasticsearch-users.115913.n3.nabble.com/how-do-I-relocate-shards-from-a-node-prior-to-shutting-it-down-td4024570.html

Excluding all allocations didn't work for the index in question. All
other shards moved accordingly, though. Nothing in the logs.

I know, it sounds a lot like "did you try turning it off and on
again?", but who knows :)

May have to resort to that. I could get in a point-release update too.

A rolling restart of the service did nothing for this index. It
totally disappeared during recovery, and then only the primaries came
back online. I'll plan on the 0.19.11 update ASAP, as I see there are
some code changes around index allocation.

--


(Igor Motov) #5

Try setting include.name and include.tag to "*" for this index
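
A sketch of what that request might look like, assuming the full keys are `index.routing.allocation.include.name` and `index.routing.allocation.include.tag` and that the index update settings API accepts them in flat form. The index name `index_name`, the `ES_HOST` variable, and the helper names are hypothetical.

```shell
#!/bin/sh
# Build a settings body that applies the same value ($1) to both include filters.
include_body() {
    printf '{"index.routing.allocation.include.name": "%s", "index.routing.allocation.include.tag": "%s"}' "$1" "$1"
}

# Apply the filters to one index ($1) with the given value ($2).
# ES_HOST is a hypothetical variable; it defaults to localhost:9200.
set_include_filters() {
    curl -XPUT "${ES_HOST:-localhost:9200}/$1/_settings" -d "$(include_body "$2")"
}

# Usage: set_include_filters index_name '*'
```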


--


(James Bardin) #6

On Thu, Nov 1, 2012 at 4:58 PM, Igor Motov imotov@gmail.com wrote:

Try setting include.name and include.tag to "*" for this index

Tried that too with no results. I wonder if include.tag takes precedence.

I noticed that @kimchy recently changed the routing allocation code to
treat an empty string as not set; I'm hoping that alleviates the
problem. It would seem there has to be some way to remove the
index.routing.allocation settings on this index (maybe the Java API?
I haven't checked it out at all yet).

Thanks,
-james

--


(Shairon Toledo) #7

This has happened to us a few times. The only way I could force shard
allocation was by node _id (IDs separated by commas), like this:

curl -XPUT localhost:9200/_cluster/settings -d '{
    "transient" : {
        "cluster.routing.allocation.include._id" : "YxZ92dZCTE2QTUHMrf9s5Q,eBSlwQ2QRNu4hs1mcjWGSQ,XelXrOfRTqmZiP1H_9X6uw",
        "cluster.routing.allocation.cluster_concurrent_rebalance" : 10
    }
}'

Or per index:

curl -XPUT localhost:9200/index_name/_settings -d '{
    "index.routing.allocation.include._id" : "eBSlwQ2QRNu4hs1mcjWGSQ"
}'

The problem is that if any node restarts, I need to perform the PUT again.

--

Shairon Toledo
http://hashcode.me

--


(James Bardin) #8

On Thu, Nov 1, 2012 at 5:18 PM, Shairon Toledo shairon.toledo@gmail.com wrote:

curl -XPUT localhost:9200/index_name/_settings -d '{
    "index.routing.allocation.include._id" : "eBSlwQ2QRNu4hs1mcjWGSQ"
}'

OK, using _id at the cluster level did nothing, but at the index
level, shards started relocating, and the replicas all started!

Not a permanent fix, but it seems there is a bug in the routing
allocation. I really want to get those settings out of this index.

--


(James Bardin) #9

So 0.19.11 has mitigated the problem for us. I'm guessing issue #2229
[1] is what really helped, in that empty strings are now ignored. New
replicas are allocated as expected.

  1. https://github.com/elasticsearch/elasticsearch/issues/2229

--


(Ivan Brusic) #10

I missed this thread while I was debugging the same issues:
https://groups.google.com/d/msg/elasticsearch/6aGAUDtNtWw/48RZW9YRZ1QJ

No allocation settings are working for me. Cluster is unusable as-is since
shards can no longer be allocated. The whole point of my tests was to do a
rolling upgrade, but now it appears that a full cluster restart is needed.
A bit extreme.

--
Ivan

--

