Hey guys,
First off, we are running ES 2.4.2.
Over the last few weeks we have run into some nasty behaviour in ES with regard to shard recovery.
When I restart a data node and the shard recovery process begins, I see the following messages in the debug log:
[2017-03-09 07:33:16,701][DEBUG][index.shard ] [node] [logstash-2017.02.23][11] updateBufferSize: engine is closed; skipping
[2017-03-09 07:33:16,701][DEBUG][index.shard ] [node] [logstash-2017.02.23][3] updateBufferSize: engine is closed; skipping
[2017-03-09 07:33:16,701][DEBUG][index.shard ] [node] [logstash-2017.02.23][4] updateBufferSize: engine is closed; skipping
[2017-03-09 07:33:16,701][DEBUG][index.shard ] [node] [logstash-2017.02.23][7] updateBufferSize: engine is closed; skipping
[2017-03-09 07:33:16,701][DEBUG][index.shard ] [node] [logstash-2017.02.22][4] updateBufferSize: engine is closed; skipping
[2017-03-09 07:33:16,701][DEBUG][index.shard ] [node] [logstash-2017.02.22][17] updateBufferSize: engine is closed; skipping
While this is not an error as such, the shards mentioned are not recovered for as long as these messages keep appearing. Because of this, recovering 40 shards can take up to 1 hour (regardless of disk performance etc.).
The node has a fairly high CPU load while recovering the indices. Could this lead to some sort of "timeout" for shard recovery, which in turn causes this message?
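In case it helps with reproducing this, here is a minimal sketch for tallying which shards log the "engine is closed" message and how often. The regex is only an assumption based on the six log lines quoted above, and the names `LINE_RE`, `count_skipped` and `SAMPLE` are made up for illustration; this is not an ES tool.

```python
import re
from collections import Counter

# Pattern derived from the DEBUG lines quoted in this post (an assumption;
# other ES versions or logging configs may format these lines differently).
LINE_RE = re.compile(
    r"^\[(?P<ts>[^\]]+)\]\[DEBUG\]\[index\.shard\s*\]\s+\[(?P<node>[^\]]+)\]\s+"
    r"\[(?P<index>[^\]]+)\]\[(?P<shard>\d+)\]\s+"
    r"updateBufferSize: engine is closed; skipping"
)


def count_skipped(lines):
    """Count 'engine is closed' messages per (index, shard) pair."""
    counts = Counter()
    for line in lines:
        match = LINE_RE.match(line.strip())
        if match:
            key = (match.group("index"), int(match.group("shard")))
            counts[key] += 1
    return counts


# Two of the lines from above, plus one unrelated line that must be ignored.
SAMPLE = [
    "[2017-03-09 07:33:16,701][DEBUG][index.shard ] [node] "
    "[logstash-2017.02.23][11] updateBufferSize: engine is closed; skipping",
    "[2017-03-09 07:33:16,701][DEBUG][index.shard ] [node] "
    "[logstash-2017.02.22][4] updateBufferSize: engine is closed; skipping",
    "some unrelated log line",
]

if __name__ == "__main__":
    for (index, shard), n in sorted(count_skipped(SAMPLE).items()):
        print(f"{index} shard {shard}: {n} message(s)")
```

Feeding the full debug log through `count_skipped` (for example via `open(path)`) should show whether the same shards keep repeating the message or whether it moves from shard to shard during the recovery.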
Any feedback is appreciated.
Regards