"Shard ... should exists, but doesn't" errors

Robin_Hughes · December 11, 2012, 11:52am

We have a cluster with approximately 200m documents divided between 19
indexes. These are sharded so there are approximately 1m documents per
shard, with no replicas. The data nodes in the cluster are hosted on 8 x
m2.2xlarge Amazon EC2 instances.

One of the nodes appears to have had some kind of networking issue and
temporarily left the cluster. Once it rejoined, it reported a lot of "marked
shard as started, but shard have not been created, mark shard as failed"
errors before eventually settling down, leaving 4 unassigned shards, each
with the following error:

[2012-12-06 17:42:30,070][WARN ][indices.cluster ] [Richard Rider] [docs-en-1][29] failed to start shard
org.elasticsearch.index.gateway.IndexShardGatewayRecoveryException: [docs-en-1][29] shard allocated for local recovery (post api), should exists, but doesn't
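
To get a list of every shard that hit this warning, the log can be scanned for the "failed to start shard" line. The following is only a rough sketch: the regular expression is based on the single log line quoted above, and the function name `failed_shards` is made up for illustration.

```python
import re

# Matches the "[index][shard] failed to start shard" part of the warning.
# The pattern is an assumption derived from the one log line quoted above.
FAILED_SHARD = re.compile(r"\[([^\[\]]+)\]\[(\d+)\] failed to start shard")


def failed_shards(log_lines):
    """Return the sorted, de-duplicated (index, shard) pairs that failed to start."""
    found = set()
    for line in log_lines:
        match = FAILED_SHARD.search(line)
        if match:
            found.add((match.group(1), int(match.group(2))))
    return sorted(found)
```

Passing an open log file (or any iterable of lines) to `failed_shards` should give one entry per affected shard, however many times the warning was repeated.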

Folders existed on disk for these unassigned shards on the node, but they
had no data in them (du -sh reported size 0).
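
The same check can be scripted across all shard folders of an index. This is a sketch only: it sums file sizes, which is close to but not identical to what du reports, and the directory passed in is whatever folder holds the shard directories on the node (no particular layout is assumed here). The helper names are hypothetical.

```python
import os


def dir_size(path):
    """Total size in bytes of all regular files below path."""
    total = 0
    for root, _dirs, files in os.walk(path):
        for name in files:
            full = os.path.join(root, name)
            if os.path.isfile(full):  # skip broken symlinks
                total += os.path.getsize(full)
    return total


def empty_dirs(parent):
    """Names of immediate subdirectories of parent that contain no file data."""
    return sorted(
        name
        for name in os.listdir(parent)
        if os.path.isdir(os.path.join(parent, name))
        and dir_size(os.path.join(parent, name)) == 0
    )
```

Calling `empty_dirs` on the folder that contains an index's shard directories would list the shards whose folders exist but hold no data.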

I don't think split-brain explains this, as we have 3 m1.small dataless
nodes configured to be masters, with discovery.zen.minimum_master_nodes: 2,
and there is a "not enough master nodes after master left" message in the
logs suggesting this setting is working correctly.
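
As a sanity check on that setting, the usual rule is a strict majority of the master-eligible nodes. A minimal sketch of the arithmetic (the function name is just for illustration):

```python
def minimum_master_nodes(master_eligible):
    """Smallest strict majority of the master-eligible nodes."""
    if master_eligible < 1:
        raise ValueError("need at least one master-eligible node")
    return master_eligible // 2 + 1
```

For 3 master-eligible nodes this gives 2, which matches the configured value, so a partition containing only one of the masters should not be able to elect itself.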

I would like to understand the cause of this issue, to prevent it happening
again. Also, if we had replicas, would Elasticsearch have recovered
correctly from this situation?

--

Igor_Motov · December 13, 2012, 9:21pm

Not sure why these shards disappeared, but adding replicas
would definitely help in this situation.
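
For reference, replicas can be added to an existing index through the update settings API. The sketch below only builds the request. The host and port are assumptions (localhost:9200 is the default HTTP address), the index name is taken from the log above, and the helper name is hypothetical.

```python
import json
import urllib.request


def replica_settings_request(base_url, index, replicas):
    """Build a PUT request that sets number_of_replicas on one index."""
    body = json.dumps({"index": {"number_of_replicas": replicas}}).encode("utf-8")
    url = "{}/{}/_settings".format(base_url.rstrip("/"), index)
    return urllib.request.Request(
        url,
        data=body,
        method="PUT",
        headers={"Content-Type": "application/json"},
    )


# Example (assumed address; only run this against a live cluster on purpose):
# req = replica_settings_request("http://localhost:9200", "docs-en-1", 1)
# urllib.request.urlopen(req)
```

With one replica per shard, each shard has a second copy on another node, so there is something to recover from if a node's copy goes missing.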

Which version of Elasticsearch are you using?

--

Robin_Hughes · December 16, 2012, 12:24pm

Hi Igor

Thanks for the reply. We're using v0.19.8.

--
