Assign unassigned primary shard


(Stefan Pi-3) #1

Hi all,

We're using ES 0.19.0 with 5 nodes and around 4700 primary shards. The
index.number_of_replicas setting is set to 1, so we have two copies of each
shard (one primary and one non-primary), which adds up to around 9400 shards
altogether.

Lately we had some network problems that caused one node to leave the
cluster temporarily, which in turn caused the cluster to
re-assign/re-distribute a lot of shards across the nodes. Unfortunately,
exactly 1 primary shard was left unassigned (the cluster state is red now),
though its non-primary counterpart was assigned properly. My question is
now whether there is any way to get ES to create the primary shard (out of
the existing non-primary shard). Or, if we need the primary shard first, is
it possible to promote a non-primary shard to a primary one?

Thank you very much,
Stefan



(Igor Motov) #2

When Elasticsearch detects that the primary shard is gone, it is supposed to
immediately pick one of the replicas and promote it to primary. It's
strange to hear that you have a non-primary assigned while the primary is not.
Could you check the log file on the master node to see if there are any "failed
to execute cluster state update" exceptions there? Could you also run
_cluster/health on all nodes to see if they agree on the number of
assigned/unassigned shards?
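Once you have the _cluster/health response from each node, comparing them by hand is error-prone. Something along these lines could do the check — a rough sketch with made-up sample numbers, not output from your cluster:

```python
# Rough sketch: check whether the _cluster/health responses fetched from
# each node agree on the shard counts. Sample numbers are illustrative only.

def healths_agree(responses):
    """True if all nodes report identical shard counts."""
    keys = ("active_shards", "relocating_shards",
            "initializing_shards", "unassigned_shards")
    counts = [tuple(r.get(k) for k in keys) for r in responses]
    return all(c == counts[0] for c in counts)

# One dict per node, e.g. parsed from curl http://<node>:9200/_cluster/health
node_responses = [
    {"active_shards": 9399, "relocating_shards": 0,
     "initializing_shards": 0, "unassigned_shards": 1},
    {"active_shards": 9399, "relocating_shards": 0,
     "initializing_shards": 0, "unassigned_shards": 1},
]
print(healths_agree(node_responses))  # True when every node agrees
```

If this prints False on your cluster, the nodes disagree about the routing table, which would point at a cluster state propagation problem.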



(Stefan Pi-3) #3

_cluster/health is exactly the same on all nodes (1 unassigned non-primary
shard, all other shards are assigned). I didn't see any "failed to execute
cluster state update". What we have are some of these:

[NodeName] [IndexName][2] failed to merge
java.lang.IndexOutOfBoundsException: Index: 116, Size: 23

and some of these:
[NodeName] [IndexName][0] failed to merge
org.apache.lucene.index.CorruptIndexException: docs out of order (187 <=
187 ) (out: org.elasticsearch.index.store.Store$StoreIndexOutput@38f21bf)

Do you think these are connected to the unassigned shard? I also wonder
where these are coming from.

Anyway, is there a way to manually force the assignment of a shard?

Thank you very much for your help,
Stefan



(Igor Motov) #4

I am confused: in the first post you said that you had
one unassigned primary, and in the second post you are saying that it's one
unassigned replica. Which one is correct? I will assume that it's an
unassigned replica, since that makes more sense.

The "failed to merge" means that your Lucene index got corrupted, but I
don't see how this would cause problems with shard allocation.

Did you set any "cluster.routing.allocation.*" or
"index.routing.allocation.*" settings? If you did, we might need to check
these settings and clean them up appropriately first. If not, the safest
solution here is to just restart the cluster. If a full cluster restart is
not an option and the index with the unallocated shard has a small number of
shards (ideally 1), you can try setting the number of replicas on this
index to 0 and then back to 1. It will trigger reallocation and might get
this shard unstuck.
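To make the two updates concrete, here is a sketch of the settings-update requests involved — it only builds the request payloads for illustration rather than sending them, and "indexName" is a placeholder:

```python
import json

def replicas_update(index, count):
    """Build the index settings update used to bounce the replica count."""
    body = json.dumps({"index": {"number_of_replicas": count}})
    return ("PUT", "/%s/_settings" % index, body)

# Drop replicas to 0, then restore to 1 to trigger reallocation.
for count in (0, 1):
    method, path, body = replicas_update("indexName", count)
    print(method, path, body)
```

In practice you would send each payload with curl (or any HTTP client) to one of the nodes and wait for the cluster to settle between the two steps.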

In any case, I would recommend upgrading the cluster to a more modern
version some time soon. There have been significant improvements in cluster
resiliency to these types of issues, and you get much more control over
allocation.



(Stefan Pi-3) #5

I'm sorry for the confusion. It was a typo. There's one unassigned primary
shard; all others are assigned. Yes, it makes no sense really, but the
corresponding assigned shard clearly states "primary: false" when using the
cluster state API.

We did not set any "cluster.routing.allocation.*" or
"index.routing.allocation.*" settings. The index has 5 shards; I'm not sure
if this is small in this context.

I just checked the actual data folders of the nodes and noticed the
following: there's a shard folder for the missing shard on one node, and
this shard is not listed when asking the cluster state API. There are also
two nodes (both are master nodes) which only have a _state/state-4 file (a
binary file). Here's a table view:

In the folder ../nodes/0/indices/indexName/ there are the following folders:

node1: 0, 1, 2, 3
node2: 0, 2, 3, 4 <-- Folder/Shard 4 is the assigned non-primary shard
node3: _state/state-4 (Master)
node4: _state/state-4 (Master)
node5: 1, 4 (Master) <--- Folder/Shard 4 is not listed when asking the cluster API

Maybe it's enough to restart node5 to trigger reallocation of Shard 4?



(Igor Motov) #6

Can you grep the log files on node 5 for the name of this index to see if
there were any error messages there?

You can try restarting a node. But to be honest, I don't quite understand
what's going on. If this is really the primary shard that's not assigned,
then the problem is most likely caused by some sort of issue with cluster
state propagation. And if this is the case, the master node might not even
notice the disappearance of the node that you shut down. So just be
prepared to do a full cluster restart if that happens.



(Stefan Pi-3) #7

These are the messages in node5 related to the missing shard:

[2012-10-27 12:58:19,451][WARN ][indices.cluster ] [node5]
[indexName][4] master
[[node5][YTrhZE9KQM6BtrV4Hn_8gg][inet[/node3-IP:9300]]] marked shard as
started, but shard have not been created, mark shard as failed
[2012-10-27 12:58:19,451][WARN ][cluster.action.shard ] [node5] sending
failed shard for [indexName][4], node[4P_nMhkrRD2iZOCsm02b1Q], [R],
s[STARTED], reason [master
[node3][YTrhZE9KQM6BtrV4Hn_8gg][inet[/node5-IP:9300]] marked shard as
started, but shard have not been created, mark shard as failed]

But we got these messages for a lot of the indices/shards at that time, and
all other shards except shard 4 were assigned correctly, so I'm not sure if
this really helps.

As we don't really want to restart the whole cluster (last time it took
around 3+ hours until it was in green state again), we decided to delete
the index and re-index. I hope this will fix the issue.

Thanks a lot for your help!


