I am running ES 1.0.1 (and have also verified the same problem with 1.1.0). I have a cluster of 9 nodes: 8 are http/data nodes and 1 is an http/master node (this is a dev/test cluster, so I am running with only one master). I create a new index with 8 shards and no replicas, then populate the index. Everything runs great. Then I do a full cluster restart. When everything comes back up, all looks perfect EXCEPT that every time (this is very consistent) I have a single shard that doesn't get assigned. Yes, gateway is set to local; I am using a completely stock config file with the exception of paths (data, logs, plugins) and the cluster name.
I can't find any information on this to determine whether it is expected behavior (I hope not) or how to resolve it. I have been systematically changing elasticsearch.yml settings to see if anything helps, but nothing resolves the issue. I should note that when simulating a production environment with rolling restarts, there is no issue. Still, this just feels like incorrect behavior...
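For anyone trying to reproduce this, a quick way to spot the stuck shard after the restart is the cat APIs (available since 1.0). The host/port here are assumptions for a default local install:

```shell
# List every shard with its state; the problem shard shows up as UNASSIGNED.
curl -s 'localhost:9200/_cat/shards?v'

# Cluster health reports the unassigned-shard count and overall status (red here,
# since the unassigned shard is a primary).
curl -s 'localhost:9200/_cluster/health?pretty'
```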
It is a primary shard (I don't have any replicas on this particular test cluster). I am seeing the following log entry that correlates with the failure, but it doesn't tell me much...
A manual curl call to allocate shard 1 successfully adds it back into the cluster, fully intact and working (there is no data in this particular index, but I have verified the same with actual data, so this isn't an index-corruption scenario).
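For reference, the manual allocation was done with the cluster reroute API's allocate command; the index and node names below are placeholders, not the real ones from my cluster:

```shell
# Force-allocate the stuck primary (shard 1 of index "myindex") onto a data node.
# allow_primary is required because it is a primary with no replica to promote.
# Caution: on a shard whose data is truly lost, allow_primary would create an
# empty shard; in my case the on-disk copy was intact and came back with its data.
curl -XPOST 'localhost:9200/_cluster/reroute' -d '{
  "commands": [
    {
      "allocate": {
        "index": "myindex",
        "shard": 1,
        "node": "data-node-1",
        "allow_primary": true
      }
    }
  ]
}'
```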
On Monday, March 31, 2014 2:03:03 PM UTC-6, Mark Walkom wrote:
Is it a primary or a replica? Is it in an initialising or relocating state?
Do the logs show anything?
A little more digging into the error, and it sure looks like Elasticsearch is getting confused when trying to recover: my 'http' (master-only) nodes appear to be included in the recovery attempts, resulting in the error posted above.
Perhaps I am jumping to conclusions here, but this sure smells like a bug to me: a master-only node should not even be considered for recovery.
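For context, the master-only node in question is configured roughly like this in elasticsearch.yml (a sketch; everything else is stock). A node set up this way is master-eligible but holds no shard data, which is why it should never be a recovery target:

```yaml
# elasticsearch.yml for the dedicated master node:
# eligible for master election, but stores no shard data.
node.master: true
node.data: false
```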