Elasticsearch 2.3.3 Replicas keep getting marked as failed in a 4 node cluster

S_A_M · October 16, 2016, 6:32pm

Any ideas what the issue may be? Any assistance would be appreciated.

We're currently running Elasticsearch 2.3.3 and frequently see warnings like these in the logs. We're not sure why:

[2016-10-16 03:22:27,922][WARN ][indices.cluster ] [data1] [[idx-1955361900][7]] marking and sending shard failed due to [engine failure, reason [indices:admin/flush[s] failed on replica]]
[idx-1955361900][[idx-1955361900][7]] FlushNotAllowedEngineException[already flushing...]

[2016-10-16 03:22:40,792][WARN ][indices.cluster ] [data1] [[idx-1955361900][5]] marking and sending shard failed due to [engine failure, reason [indices:admin/flush[s] failed on replica]]
[idx-1955361900][[idx-1955361900][5]] FlushNotAllowedEngineException[already flushing...]
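
To see whether the failures cluster on particular shards or nodes, the warnings can be tallied with a short script. This is only a rough sketch: the regular expression is written against the two log lines quoted above and may need adjusting for other log layouts, and `count_failed_shards` is a hypothetical helper name, not part of Elasticsearch.

```python
import re
from collections import Counter

# Pattern for the WARN header line shown in the post, e.g.
# [timestamp][WARN ][indices.cluster ] [node] [[index][shard]] marking and sending shard failed ...
# Assumption: other log lines follow the same layout as the two quoted samples.
FAILED_SHARD = re.compile(
    r"^\[(?P<ts>[^\]]+)\]\[WARN\s*\]\[indices\.cluster\s*\]\s*"
    r"\[(?P<node>[^\]]+)\]\s*\[\[(?P<index>[^\]]+)\]\[(?P<shard>\d+)\]\]\s*"
    r"marking and sending shard failed"
)


def count_failed_shards(lines):
    """Count 'marking and sending shard failed' warnings per (node, index, shard)."""
    counts = Counter()
    for line in lines:
        match = FAILED_SHARD.match(line)
        if match:
            key = (match["node"], match["index"], int(match["shard"]))
            counts[key] += 1
    return counts


if __name__ == "__main__":
    sample = [
        "[2016-10-16 03:22:27,922][WARN ][indices.cluster ] [data1] [[idx-1955361900][7]] marking and sending shard failed due to [engine failure, reason [indices:admin/flush[s] failed on replica]]",
        "[idx-1955361900][[idx-1955361900][7]] FlushNotAllowedEngineException[already flushing...]",
        "[2016-10-16 03:22:40,792][WARN ][indices.cluster ] [data1] [[idx-1955361900][5]] marking and sending shard failed due to [engine failure, reason [indices:admin/flush[s] failed on replica]]",
        "[idx-1955361900][[idx-1955361900][5]] FlushNotAllowedEngineException[already flushing...]",
    ]
    for (node, index, shard), n in sorted(count_failed_shards(sample).items()):
        print(node, index, shard, n)
```

Feeding it the full log file (one line per list element) would show whether the same replicas fail repeatedly or the failures are spread across the cluster.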

This is also a fairly new cluster, so the issue has been present from early on rather than appearing suddenly.

Cluster info:
{
  "cluster_name" : "elasticsearch",
  "status" : "green",
  "timed_out" : false,
  "number_of_nodes" : 5,
  "number_of_data_nodes" : 4,
  "active_primary_shards" : 1104,
  "active_shards" : 2046,
  "relocating_shards" : 0,
  "initializing_shards" : 0,
  "unassigned_shards" : 0,
  "delayed_unassigned_shards" : 0,
  "number_of_pending_tasks" : 0,
  "number_of_in_flight_fetch" : 0,
  "task_max_waiting_in_queue_millis" : 0,
  "active_shards_percent_as_number" : 100.0
}
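
For context, the figures above can be reduced to a replica count and an average number of shards per data node. This is a minimal sketch that only does arithmetic on the fields shown; `shard_summary` is a hypothetical helper, and the numbers say nothing by themselves about the cause of the flush failures.

```python
import json

# Subset of the cluster health output quoted above.
HEALTH_JSON = """
{
  "number_of_nodes": 5,
  "number_of_data_nodes": 4,
  "active_primary_shards": 1104,
  "active_shards": 2046
}
"""


def shard_summary(health):
    """Derive replica count and average active shards per data node."""
    primaries = health["active_primary_shards"]
    total = health["active_shards"]
    data_nodes = health["number_of_data_nodes"]
    return {
        "replica_shards": total - primaries,
        "shards_per_data_node": total / data_nodes,
    }


if __name__ == "__main__":
    print(shard_summary(json.loads(HEALTH_JSON)))
```

With the values from the post this gives 942 replica shards and an average of 511.5 active shards per data node.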

Bruce_Ritchie · October 17, 2016, 2:51pm

Looks to be a bug fixed in 2.4.1 - https://github.com/elastic/elasticsearch/pull/20632

S_A_M · October 17, 2016, 3:08pm

Thanks!
