My cluster has gotten into an odd state today. I have a regular job that
deletes indices after X days. The job executed an index deletion this
morning, and when it did, the cluster went into a red state claiming that
there were 10 unassigned shards (5 primaries, each with 1 replica). After
some debugging I discovered the shards belonged to the index that had just
been deleted. I was able to assign the shards manually to a node, which
restored the cluster to green. I was then able to reproduce the issue by
deleting the index again. I captured a lot of data on that attempt and may
be able to reproduce it once more to get more. Any ideas on why this may
have happened and how to prevent it?
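For context, the deletion job itself is nothing exotic; it is roughly equivalent to the sketch below (the index prefix and the retention value are placeholders, not the real ones):
# Hypothetical sketch of the cleanup job: delete the date-named
# index that has aged past the retention window (GNU date syntax).
RETENTION_DAYS=30
OLD_INDEX="index-$(date -d "-${RETENTION_DAYS} days" +%Y.%m.%d)"
curl -s -X DELETE "http://es:9200/${OLD_INDEX}"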
DETAILS:
ES version: 1.0.1 (recent upgrade from 0.90.11); indexes were not carried
over from the old version.
NOTE: The captures below are from after the cluster had failed and been
restored to a green state. The deleted index had data before the initial
deletion.
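For reference, the green/red transitions and the unassigned shard count mentioned above are visible via the cluster health endpoint (the status and unassigned_shards fields):
curl 'http://es:9200/_cluster/health?pretty=true'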
BEFORE
All indices status
This reported the index (index-2014.04.03) existed with all shards in the
STARTED state
curl 'http://es:9200/_status?pretty=true'
"_shards" : {
"total" : 70,
"successful" : 70,
"failed" : 0
Specific index status
This reported that the index had 10 successful shards
curl 'http://es:9200/index-2014.04.03/_status?pretty=true'
"_shards" : {
"total" : 10,
"successful" : 10,
"failed" : 0
},
Index Settings
This showed that the index exists with the expected settings
curl 'http://es:9200/_settings?pretty=true'
"index-2014.04.03" : {
"settings" : {
"index" : {
"uuid" : "odEYl4lMQAiXFu4zQfBUeA",
"number_of_replicas" : "1",
"number_of_shards" : "5",
"refresh_interval" : "5s",
"version" : {
"created" : "1000199"
}
}
}
},
Cluster State
Cluster state showed that there were no unassigned shards.
curl 'http://es:9200/_cluster/state?pretty=true'
...
"routing_nodes" : {
"unassigned" : [ ],
..
Then I performed the delete
$ curl -X DELETE 'http://es:9200/index-2014.04.03'
{"acknowledged":true}
AFTER
All indices status
This no longer reports the deleted index (index-2014.04.03), although its
10 shards still appear to be counted in the total.
curl 'http://es:9200/_status?pretty=true'
"_shards" : {
"total" : 70,
"successful" : 60,
"failed" : 0
},
Specific index status
This was the giveaway that we had an issue: it reports no data for the
index, yet still shows 10 associated shards, none of which are successful.
curl 'http://es:9200/index-2014.04.03/_status?pretty=true'
{
"_shards" : {
"total" : 10,
"successful" : 0,
"failed" : 0
},
"indices" : { }
}
Index Settings
This showed that the index still exists with the expected settings, even
though it had just been deleted.
curl 'http://es:9200/_settings?pretty=true'
"index-2014.04.03" : {
"settings" : {
"index" : {
"uuid" : "odEYl4lMQAiXFu4zQfBUeA",
"number_of_replicas" : "1",
"number_of_shards" : "5",
"refresh_interval" : "5s",
"version" : {
"created" : "1000199"
}
}
}
},
Cluster State
Cluster state still showed the index, with all of its shards unassigned.
curl 'http://es:9200/_cluster/state?pretty=true'
...
"index-2014.04.03" : {
"shards" : {
"2" : [ {
"state" : "UNASSIGNED",
"primary" : true,
"node" : null,
"relocating_node" : null,
"shard" : 2,
"index" : "index-2014.04.03"
}, {
...
"routing_nodes" : {
"unassigned" : [ {
"state" : "UNASSIGNED",
"primary" : true,
"node" : null,
"relocating_node" : null,
"shard" : 2,
"index" : "index-2014.04.03"
}, {
...
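A quicker way to list just the leftover shards (assuming the _cat API in 1.0.x behaves the way I expect) is:
curl -s 'http://es:9200/_cat/shards' | grep UNASSIGNED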
To 'correct' the situation I ran:
curl -XPOST 'http://es:9200/_cluster/reroute' -d '{
  "commands": [ { "allocate": {
    "index": "index-2014.04.03",
    "shard": 4,
    "node": "cvBlpg_jQTajnq-HdJCfCA",
    "allow_primary": true
  } } ]
}'
Let me know if there is more information I can provide.