I am getting a BroadcastShardOperationFailedException when I do a GET against my cluster of two:
{
  "count": 0,
  "_shards": {
    "total": 5,
    "successful": 3,
    "failed": 2,
    "failures": [
      {
        "index": "index0",
        "shard": 0,
        "reason": "BroadcastShardOperationFailedException[[index0][0] No active shard(s)]"
      },
      {
        "index": "index0",
        "shard": 2,
        "reason": "BroadcastShardOperationFailedException[[index0][2] No active shard(s)]"
      }
    ]
  }
}
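The GET in question is just a plain count request against the index, roughly this (hostname/port are placeholders for my actual nodes):

curl -XGET 'http://localhost:9200/index0/_count'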
There should be 5 shards; I've restarted a number of times, and sometimes I get all 5 shards failing in essentially the same JSON as above (except it lists every shard).
In one node's log I see:
[20:10:59,177][INFO ][cluster.metadata ] [Keith Kilham] [index0] creating index, cause [gateway], shards [5]/[1], mappings [mail, calendarItem, attachment, contact]
[20:11:00,005][INFO ][cluster.service ] [Keith Kilham] added {[Tinkerer][e101b4b3-daa2-45fc-b042-ef14a5a56bea][inet[/10.244.255.242:9300]]{client=true, data=false, zen.master=false},}, reason: zen-disco-receive(from node[[Tinkerer][e101b4b3-daa2-45fc-b042-ef14a5a56bea][inet[/10.244.255.242:9300]]{client=true, data=false, zen.master=false}])
[20:11:00,039][INFO ][cluster.service ] [Keith Kilham] added {[Crystal][b6b67cac-650d-46e6-b949-a6497f2d8f81][inet[/10.244.255.242:9301]]{client=true, data=false, zen.master=false},}, reason: zen-disco-receive(from node[[Crystal][b6b67cac-650d-46e6-b949-a6497f2d8f81][inet[/10.244.255.242:9301]]{client=true, data=false, zen.master=false}])
[20:11:27,547][INFO ][cluster.service ] [Keith Kilham] added {[Madelyne Pryor][c3e71382-a3c6-4c7e-bc9c-283fa9fff23a][inet[/10.198.109.171:9300]],}, reason: zen-disco-receive(from node[[Madelyne Pryor][c3e71382-a3c6-4c7e-bc9c-283fa9fff23a][inet[/10.198.109.171:9300]]])
[20:11:27,588][INFO ][http ] [Keith Kilham] bound_address {inet[/0:0:0:0:0:0:0:0:9200]}, publish_address {inet[/10.206.98.255:9200]}
[20:11:27,732][INFO ][jmx ] [Keith Kilham] bound_address {service:jmx:rmi:///jndi/rmi://:9400/jmxrmi}, publish_address {service:jmx:rmi:///jndi/rmi://10.206.98.255:9400/jmxrmi}
[20:11:27,733][INFO ][node ] [Keith Kilham] {elasticsearch/0.9.0}[11486]: started
[20:11:31,139][WARN ][index.gateway.s3 ] [Keith Kilham] [index0][0] no file [_6vm.cfs] to recover, even though it has md5, ignoring it
[20:11:31,140][WARN ][index.gateway.s3 ] [Keith Kilham] [index0][0] no file [_9bz.cfs] to recover, even though it has md5, ignoring it
In the other node's log I see this scrolling by over and over, constantly:
[20:17:26,004][DEBUG][index.shard.service ] [Madelyne Pryor] [index0][1] state: [RECOVERING]->[CREATED], restored after recovery
[20:17:26,047][DEBUG][index.shard.service ] [Madelyne Pryor] [index0][4] state: [CREATED]->[RECOVERING]
[20:17:26,048][DEBUG][index.shard.service ] [Madelyne Pryor] [index0][4] state: [RECOVERING]->[CREATED], restored after recovery
[20:17:26,061][DEBUG][index.shard.service ] [Madelyne Pryor] [index0][3] state: [CREATED]->[RECOVERING]
[20:17:26,062][DEBUG][index.shard.service ] [Madelyne Pryor] [index0][3] state: [RECOVERING]->[CREATED], restored after recovery
[20:17:26,106][DEBUG][index.shard.service ] [Madelyne Pryor] [index0][1] state: [CREATED]->[RECOVERING]
[20:17:26,107][DEBUG][index.shard.service ] [Madelyne Pryor] [index0][1] state: [RECOVERING]->[CREATED], restored after recovery
When I write above of a 'cluster of two', I mean there are two data-carrying nodes. There is also a third node, a no-data client that only uses the Java API; it gets a PrimaryNotStartedActionException.
I am on Elasticsearch 0.9.0. Everything used to work; it started failing today. Yesterday I called the REST optimize command, and I first saw these problems today after a restart (the first restart since the optimize). This is deployed in a shared environment and I can't upgrade to the latest Elasticsearch right away, although I plan to when I can.
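The optimize call itself was nothing special, roughly this (hostname/port again placeholders, no extra parameters that I recall):

curl -XPOST 'http://localhost:9200/index0/_optimize'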
Thanks for any help you can provide.