RemoteTransportException - AlreadyClosedException[this IndexReader is closed]

brettinman · May 9, 2016, 6:30pm

This looks like an error on the internal transport, but I'm not sure why it's happening. The cluster is green and otherwise happy, ingesting 600-700 GB a day across 20 data nodes (20 shards per index). We are also having issues with this 2.3.1 cluster not indexing all documents, compared with an old 1.7.2 cluster (both are fed by the same Heka instances), but I'm not sure that is related to this exception.

No indexes are closed, so I assume this is some kind of timeout in the internal communication? Is this a problem?

[2016-05-09 18:20:15,887][DEBUG][action.admin.cluster.node.stats] [ip-10-10-10-10] failed to execute on node [huiodfhg78ytdfghg89]
RemoteTransportException[[ip-10-10-10-10][10.10.10.10:9300][cluster:monitor/nodes/stats[n]]]; nested: AlreadyClosedException[this IndexReader is closed];
Caused by: org.apache.lucene.store.AlreadyClosedException: this IndexReader is closed
at org.apache.lucene.index.IndexReader.ensureOpen(IndexReader.java:274)
at org.apache.lucene.index.CompositeReader.getContext(CompositeReader.java:101)
at org.apache.lucene.index.CompositeReader.getContext(CompositeReader.java:55)
at org.apache.lucene.index.IndexReader.leaves(IndexReader.java:438)
at org.elasticsearch.search.suggest.completion.Completion090PostingsFormat.completionStats(Completion090PostingsFormat.java:330)
at org.elasticsearch.index.shard.IndexShard.completionStats(IndexShard.java:765)
at org.elasticsearch.action.admin.indices.stats.CommonStats.<init>(CommonStats.java:164)
at org.elasticsearch.indices.IndicesService.stats(IndicesService.java:253)
at org.elasticsearch.node.service.NodeService.stats(NodeService.java:158)
at org.elasticsearch.action.admin.cluster.node.stats.TransportNodesStatsAction.nodeOperation(TransportNodesStatsAction.java:82)
at org.elasticsearch.action.admin.cluster.node.stats.TransportNodesStatsAction.nodeOperation(TransportNodesStatsAction.java:44)
at org.elasticsearch.action.support.nodes.TransportNodesAction.nodeOperation(TransportNodesAction.java:92)
at org.elasticsearch.action.support.nodes.TransportNodesAction$NodeTransportHandler.messageReceived(TransportNodesAction.java:230)
at org.elasticsearch.action.support.nodes.TransportNodesAction$NodeTransportHandler.messageReceived(TransportNodesAction.java:226)
at org.elasticsearch.transport.RequestHandlerRegistry.processMessageReceived(RequestHandlerRegistry.java:75)
at org.elasticsearch.transport.TransportService$4.doRun(TransportService.java:376)
at org.elasticsearch.common.util.concurrent.AbstractRunnable.run(AbstractRunnable.java:37)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
at java.lang.Thread.run(Thread.java:745)

polyfractal · May 10, 2016, 3:29pm

Yep, unfortunately this was a bug introduced in 2.3.0. It was just fixed in 2.3.2 (unreleased as of writing) so should be available soonish: https://github.com/elastic/elasticsearch/pull/18094

It's unclear to me whether this is actually related to the problems you're having; it may just be coincidental. Can you describe your problem a bit more?

Also, are you checking that A) there are no bulk rejections and B) if there are rejections, you're retrying the rejected documents? A rejection isn't really an error; it's just backpressure, the cluster saying "please try again later". So if you aren't retrying the rejected docs, they will be silently dropped on the floor by your app (or Heka or whatever) and never get indexed.
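To make point B concrete, here is a minimal Python sketch of a retry loop for rejected bulk items. It assumes the `_bulk` response body has already been parsed into a dict, that its `items` come back in the same order as the request actions, and that a rejected item carries status 429 (the bulk queue was full). `send`, `split_bulk_response` and `bulk_with_retry` are hypothetical names for illustration, not part of Heka or any official client.

```python
import time
from typing import Any, Callable, Dict, List, Sequence, Tuple

Action = Dict[str, Any]  # hypothetical: one bulk action (metadata + source)
Failure = Tuple[Action, Dict[str, Any]]


def split_bulk_response(actions: Sequence[Action],
                        response: Dict[str, Any]) -> Tuple[List[Action], List[Failure]]:
    """Return (rejected actions worth retrying, permanent failures)."""
    retry: List[Action] = []
    failed: List[Failure] = []
    if not response.get("errors"):
        return retry, failed  # fast path: every item succeeded
    for action, item in zip(actions, response["items"]):
        # Each item is a one-key dict keyed by op type, e.g. {"index": {...}}.
        result = next(iter(item.values()))
        if result.get("status") == 429:
            retry.append(action)  # backpressure: try again later
        elif "error" in result:
            failed.append((action, result))  # e.g. mapping error; resending won't help
    return retry, failed


def bulk_with_retry(send: Callable[[Sequence[Action]], Dict[str, Any]],
                    actions: Sequence[Action],
                    max_retries: int = 5,
                    base_delay: float = 0.5,
                    sleep: Callable[[float], None] = time.sleep) -> List[Failure]:
    """Resend only the rejected items, backing off exponentially between attempts."""
    pending = list(actions)
    failures: List[Failure] = []
    for attempt in range(max_retries + 1):
        pending, failed = split_bulk_response(pending, send(pending))
        failures.extend(failed)
        if not pending:
            return failures
        if attempt < max_retries:
            sleep(base_delay * (2 ** attempt))
    raise RuntimeError(f"{len(pending)} docs still rejected after {max_retries} retries")
```

Backing off gives the bulk queue time to drain instead of immediately re-flooding it, and items that fail for other reasons are returned to the caller rather than retried, since resending them would not help.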

brettinman · May 10, 2016, 6:03pm

Thanks for the link to the issue; it didn't show up when I was searching. That would explain the exception. I take it it's harmless(-ish) unless you're running mmapfs, so we're good there.

As for the ingestion mismatch, I can't blame it on this exception; I just wanted to understand this one since it was the only thing that jumped out in the logs. It definitely looks like some sort of backpressure, since the rate drops off a bit and then shoots higher to catch up. Thanks!
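One way to confirm the backpressure theory is to watch the bulk thread pool's `rejected` counter over time. A minimal Python sketch, assuming the 2.x layout of `GET _nodes/stats/thread_pool` (each node reporting `thread_pool.bulk.rejected`) has already been parsed into a dict; the helper names are hypothetical. The counter is cumulative since the node started, so the sketch diffs two snapshots taken some minutes apart.

```python
from typing import Any, Dict


def bulk_rejections_by_node(node_stats: Dict[str, Any]) -> Dict[str, int]:
    """Map node name -> cumulative bulk rejections from a parsed nodes-stats body."""
    counts: Dict[str, int] = {}
    for node_id, node in node_stats.get("nodes", {}).items():
        # Assumed 2.x layout; later versions may name this pool differently.
        bulk = node.get("thread_pool", {}).get("bulk", {})
        counts[node.get("name", node_id)] = int(bulk.get("rejected", 0))
    return counts


def rejection_delta(before: Dict[str, int], after: Dict[str, int]) -> Dict[str, int]:
    """Rejections between two snapshots; clamped at 0 in case a node restarted."""
    return {name: max(0, count - before.get(name, 0)) for name, count in after.items()}
```

A node whose delta keeps climbing while the ingest rate dips is the one pushing back, and the documents it rejected are lost unless the shipper retries them.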

polyfractal · May 10, 2016, 6:05pm

Cool, happy to help! And yeah, that exception should be harmless unless you fall into the mmap edge case (in which case it's quite gruesome).

memelet · June 15, 2016, 2:27pm

I'm running 2.3.2 and am seeing lots of these errors.

jasontedor · June 15, 2016, 3:13pm

It was not fixed in 2.3.2, but in 2.3.3.
