Index deletion results in associated shards being orphaned and unassigned


(Dallas Mahrt) #1

My cluster has gotten into an odd state today. I have a regular job that
deletes indices after X days. The job executed an index deletion this
morning. When it did, the cluster went into a red state, reporting 10
unassigned shards (5 primaries, each with 1 replica). After some
debugging I discovered that these shards belonged to the deleted index. I was
able to assign the shards manually to a node, which fixed the cluster state.
I was then able to reproduce the issue by re-deleting the index. I captured
a lot of data on this attempt and I may be able to repro again and get
more. Any ideas on why this may have happened and how to prevent it?

DETAILS:
ES version: 1.0.1 (recently upgraded from 0.90.11; indices were not carried
over).

NOTE: The data below was captured after the cluster had failed once and been
restored to a green state. The deleted index contained data before the initial deletion.

BEFORE
All indices status
This reported that the index (index-2014.04.03) existed, with all shards in the
STARTED state.

curl 'http://es:9200/_status?pretty=true'
"_shards" : {
  "total" : 70,
  "successful" : 70,
  "failed" : 0
},

Specific index status
This reported that the index had 10 successful shards.

curl 'http://es:9200/index-2014.04.03/_status?pretty=true'

"_shards" : {
  "total" : 10,
  "successful" : 10,
  "failed" : 0
},

Index Settings
This showed that the index exists with the expected settings.

curl 'http://es:9200/_settings?pretty=true'

"index-2014.04.03" : {
  "settings" : {
    "index" : {
      "uuid" : "odEYl4lMQAiXFu4zQfBUeA",
      "number_of_replicas" : "1",
      "number_of_shards" : "5",
      "refresh_interval" : "5s",
      "version" : {
        "created" : "1000199"
      }
    }
  }
},

Cluster State
Cluster state showed that there were no unassigned shards.

curl 'http://es:9200/_cluster/state?pretty=true'
...
"routing_nodes" : {
  "unassigned" : [ ],
  ...

Then I performed the delete:
$ curl -X DELETE 'http://es:9200/index-2014.04.03'
{"acknowledged":true}

AFTER
All indices status
This no longer reports the deleted index (index-2014.04.03), but the shard
total is still 70, with only 60 successful.

curl 'http://es:9200/_status?pretty=true'
"_shards" : {
  "total" : 70,
  "successful" : 60,
  "failed" : 0
},

Specific index status
This was the giveaway that we had an issue. It returns no data for the
index, yet it still reports 10 associated shards, none of them successful.

curl 'http://es:9200/index-2014.04.03/_status?pretty=true'
{
  "_shards" : {
    "total" : 10,
    "successful" : 0,
    "failed" : 0
  },
  "indices" : { }
}

Index Settings
This showed that the index still exists, with the same settings as before.

curl 'http://es:9200/_settings?pretty=true'
"index-2014.04.03" : {
  "settings" : {
    "index" : {
      "uuid" : "odEYl4lMQAiXFu4zQfBUeA",
      "number_of_replicas" : "1",
      "number_of_shards" : "5",
      "refresh_interval" : "5s",
      "version" : {
        "created" : "1000199"
      }
    }
  }
},

Cluster State
Cluster state still listed the index, with all of its shards unassigned.

curl 'http://es:9200/_cluster/state?pretty=true'
...
"index-2014.04.03" : {
  "shards" : {
    "2" : [ {
      "state" : "UNASSIGNED",
      "primary" : true,
      "node" : null,
      "relocating_node" : null,
      "shard" : 2,
      "index" : "index-2014.04.03"
    }, {
    ...
"routing_nodes" : {
  "unassigned" : [ {
    "state" : "UNASSIGNED",
    "primary" : true,
    "node" : null,
    "relocating_node" : null,
    "shard" : 2,
    "index" : "index-2014.04.03"
  }, {
  ...
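A more compact way to spot orphaned shards like these is the _cat/shards API (available since 1.0), filtered through a line of awk. This sketch is not from the original thread: list_unassigned is a hypothetical helper, and it assumes the default _cat/shards column order (index, shard, prirep, state, ...).

```shell
#!/usr/bin/env bash
# Hypothetical helper: print "<shard> <p|r>" for each UNASSIGNED copy of one index.
# Reads plain-text _cat/shards output on stdin and assumes the first four
# columns are: index, shard, prirep, state.
list_unassigned() {
  local index="$1"
  awk -v idx="$index" '$1 == idx && $4 == "UNASSIGNED" { print $2, $3 }'
}
```

For example, curl -s 'http://es:9200/_cat/shards/index-2014.04.03' | list_unassigned index-2014.04.03 should print one line per unassigned primary (p) or replica (r); in the situation above, that would be all 10 copies.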

To 'correct' the situation, I ran (shown here for shard 4):
curl -XPOST 'es:9200/_cluster/reroute' -d '{
  "commands": [ {
    "allocate": {
      "index": "index-2014.04.03",
      "shard": 4,
      "node": "cvBlpg_jQTajnq-HdJCfCA",
      "allow_primary": true
    }
  } ]
}'

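Because the allocate command takes a single shard, clearing every unassigned shard this way means repeating it per shard number. A reroute request accepts an array of commands, though, so one request can cover them all. The sketch below only builds such a body; reroute_body is a hypothetical helper, and the index, node id and shard numbers are supplied by the caller. Like the original command, it sets allow_primary to true, which the 1.x reroute documentation warns might result in data loss, since a primary with no surviving copy is allocated empty. That was harmless here only because the index was being deleted anyway.

```shell
#!/usr/bin/env bash
# Hypothetical helper: build a _cluster/reroute body with one "allocate"
# command per shard number.
# Usage: reroute_body INDEX NODE_ID SHARD [SHARD...]
# allow_primary is true, matching the original command (may allocate EMPTY primaries).
reroute_body() {
  local index="$1" node="$2" cmds="" shard
  shift 2
  for shard in "$@"; do
    # Separate commands with a comma after the first one.
    if [ -n "$cmds" ]; then
      cmds="$cmds, "
    fi
    cmds="$cmds{\"allocate\": {\"index\": \"$index\", \"shard\": $shard, \"node\": \"$node\", \"allow_primary\": true}}"
  done
  printf '{"commands": [%s]}\n' "$cmds"
}
```

For example: curl -XPOST 'es:9200/_cluster/reroute' -d "$(reroute_body index-2014.04.03 cvBlpg_jQTajnq-HdJCfCA 0 1 2 3 4)"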
Let me know if there is anything else I can provide.

--
You received this message because you are subscribed to the Google Groups "elasticsearch" group.
To unsubscribe from this group and stop receiving emails from it, send an email to elasticsearch+unsubscribe@googlegroups.com.
To view this discussion on the web visit https://groups.google.com/d/msgid/elasticsearch/7359b39d-30d1-4ce9-9187-43e63e25827a%40googlegroups.com.
For more options, visit https://groups.google.com/d/optout.


(Dallas Mahrt) #2

I discovered the root cause. The current master was on a VM that was in a
bad state. Sadly, I could not get onto the host to debug the issue: it
was still listening on port 9200 but was not accessible via SSH. I forced a
master change by shutting down the node using the cluster admin API. Once the
master had switched, index deletion worked.
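On 1.x, one way to force such a master change is the nodes shutdown API (POST /_cluster/nodes/<node_id>/_shutdown, removed in later major versions), with the elected master's id taken from _cat/master. The thread does not say exactly which call was used, so this is only a sketch: master_shutdown_url is a hypothetical helper that reads _cat/master output on stdin and assumes the node id is the first column.

```shell
#!/usr/bin/env bash
# Hypothetical helper: turn _cat/master output (read from stdin) into the
# 1.x nodes-shutdown URL for that master. Assumes column 1 is the node id.
# Usage: master_shutdown_url HOST:PORT
master_shutdown_url() {
  local host="$1" id
  id="$(awk 'NR == 1 { print $1; exit }')"
  if [ -z "$id" ]; then
    echo "no master reported" >&2
    return 1
  fi
  printf 'http://%s/_cluster/nodes/%s/_shutdown\n' "$host" "$id"
}
```

For example: curl -XPOST "$(curl -s 'http://es:9200/_cat/master' | master_shutdown_url es:9200)". Shutting down the master makes the remaining master-eligible nodes elect a new one, so this only helps if at least one other master-eligible node is available.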

Hopefully I will be able to get some logs off that box so I can learn what
state it was in, such that it still thought it was the master yet couldn't
perform master duties. This was a dedicated master node (node.master: true,
node.data: false).

(Replying to the original post #1, written on Wednesday, April 9, 2014, 4:09:54 PM UTC-7.)

