"failed to execute on node" Exception on elasticsearch 2.3.2

Hi, I see the exception below consistently in the Elasticsearch log file on all three of my nodes. The cluster health is green. Can anyone help? I think it's what brought Elasticsearch down after a while, as I just had an outage today (5/10), the first since last Friday's (5/6) cluster upgrade.

[2016-05-10 16:12:39,369][DEBUG][action.admin.cluster.node.stats] [fslelkprod01] failed to execute on node [QSXAvrCzQQGDoprePsPzTQ]
RemoteTransportException[[fslelkprod01][fslelkprod01/10.193.91.25:9300][cluster:monitor/nodes/stats[n]]]; nested: AlreadyClosedException[this IndexReader is closed];
Caused by: org.apache.lucene.store.AlreadyClosedException: this IndexReader is closed
at org.apache.lucene.index.IndexReader.ensureOpen(IndexReader.java:274)
at org.apache.lucene.index.CompositeReader.getContext(CompositeReader.java:101)
at org.apache.lucene.index.CompositeReader.getContext(CompositeReader.java:55)
at org.apache.lucene.index.IndexReader.leaves(IndexReader.java:438)
at org.elasticsearch.search.suggest.completion.Completion090PostingsFormat.completionStats(Completion090PostingsFormat.java:330)
at org.elasticsearch.index.shard.IndexShard.completionStats(IndexShard.java:765)
at org.elasticsearch.action.admin.indices.stats.CommonStats.<init>(CommonStats.java:164)
at org.elasticsearch.indices.IndicesService.stats(IndicesService.java:253)
at org.elasticsearch.node.service.NodeService.stats(NodeService.java:158)
at org.elasticsearch.action.admin.cluster.node.stats.TransportNodesStatsAction.nodeOperation(TransportNodesStatsAction.java:82)
at org.elasticsearch.action.admin.cluster.node.stats.TransportNodesStatsAction.nodeOperation(TransportNodesStatsAction.java:44)
at org.elasticsearch.action.support.nodes.TransportNodesAction.nodeOperation(TransportNodesAction.java:92)
at org.elasticsearch.action.support.nodes.TransportNodesAction$NodeTransportHandler.messageReceived(TransportNodesAction.java:230)
at org.elasticsearch.action.support.nodes.TransportNodesAction$NodeTransportHandler.messageReceived(TransportNodesAction.java:226)
at org.elasticsearch.transport.RequestHandlerRegistry.processMessageReceived(RequestHandlerRegistry.java:75)
at org.elasticsearch.transport.TransportService$4.doRun(TransportService.java:376)
at org.elasticsearch.common.util.concurrent.AbstractRunnable.run(AbstractRunnable.java:37)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
at java.lang.Thread.run(Thread.java:745)
[2016-05-10 16:12:39,371][ERROR][marvel.agent.collector.node] [fslelkprod01] collector [node-stats-collector] - failed collecting data
java.lang.ArrayIndexOutOfBoundsException: 0
at org.elasticsearch.action.support.nodes.BaseNodesResponse.getAt(BaseNodesResponse.java:72)
at org.elasticsearch.marvel.agent.collector.node.NodeStatsCollector.doCollect(NodeStatsCollector.java:88)
at org.elasticsearch.marvel.agent.collector.AbstractCollector.collect(AbstractCollector.java:99)
at org.elasticsearch.marvel.agent.AgentService$ExportingWorker.run(AgentService.java:187)
at java.lang.Thread.run(Thread.java:745)
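
For reference, this is roughly how I'm checking the cluster health and which version each node is running (a minimal sketch, assuming the default HTTP port 9200 on localhost):

# cluster health (this is where I see "green")
curl -s 'localhost:9200/_cluster/health?pretty'

# version running on each node
curl -s 'localhost:9200/_cat/nodes?v&h=name,ip,version'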

Unfortunately, this was a bug introduced in 2.3.0. It has just been fixed and will be released in 2.3.3: https://github.com/elastic/elasticsearch/pull/18094

Hi Zachary, thanks for the information. How can we work around this before 2.3.3 with the bug fix is released?

Is your JVM crashing? Does it throw this exception right before the crash?

Unless you fall into the edge case with mmapfs crashing, this is basically a harmless exception. You'll see it spammed in your log a lot, but it's otherwise not a problem (the completion stats will be computed incorrectly, that's all).
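
If you want to rule out the mmapfs edge case, one thing you could check is whether any index explicitly sets its store type to mmapfs. A sketch, assuming the default HTTP port; indices on the default hybrid store won't report an explicit index.store.type:

# list any indices that explicitly set index.store.type
curl -s 'localhost:9200/_all/_settings/index.store.type?pretty'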

I searched for hs_err_pidXXXX.log but didn't find one. Does that mean the JVM didn't crash?
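
For what it's worth, the JVM writes hs_err_pid<pid>.log to the working directory of the Elasticsearch process by default, or wherever -XX:ErrorFile points. Something like this should turn one up if a crash happened (a sketch, assuming a Linux box and permission to search these paths):

# look for JVM fatal error logs anywhere on the local filesystem
find / -xdev -name 'hs_err_pid*.log' 2>/dev/null

# also check whether the JVM was started with a custom error-file location
ps aux | grep -o -- '-XX:ErrorFile=[^ ]*'

If find returns nothing and no -XX:ErrorFile is set, the JVM most likely did not crash.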

Looks like I am hitting the same thing with 2.3.1. My indexing rates are really low compared to normal, and my cluster log file only has a few entries... It doesn't appear to be a harmless event for our cluster. Is there anything else I can check?
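
For context, this is roughly how I'm watching the indexing back-pressure at the moment (assuming the default HTTP port and that we index through the bulk API):

# active threads, queued requests and rejections for the bulk thread pool on each node
curl -s 'localhost:9200/_cat/thread_pool?v&h=host,bulk.active,bulk.queue,bulk.rejected'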

I would start a new thread, as this one is a few months old and seems to have run its course.

That being said, have you set swappiness and vm.max_map_count? Tuning those made one of our clusters much happier on resources: it stopped the swapping, helped I/O, and made indexing much faster.

Hopefully you are on Linux? Another reason to start a new thread is that you can give info about your setup there.

Run sysctl vm.max_map_count; it should return 262144.
And cat /proc/sys/vm/swappiness should return 1 or 0.

If it isn't, set it with:
sysctl -w vm.max_map_count=262144
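
Note that sysctl -w only lasts until the next reboot. A minimal sketch of making both settings stick, assuming a typical Linux box where /etc/sysctl.conf is read at boot (run as root or with sudo):

# persist the settings across reboots
echo 'vm.max_map_count=262144' >> /etc/sysctl.conf
echo 'vm.swappiness=1' >> /etc/sysctl.conf
# apply them immediately without rebooting
sysctl -p

On Elasticsearch 2.x you can also set bootstrap.mlockall: true in elasticsearch.yml so the heap cannot be swapped out, provided the user running Elasticsearch is allowed to lock memory (ulimit -l unlimited).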