AlreadyClosedException: engine is closed during cluster stats

katre · January 7, 2019, 3:55pm

We monitor elasticsearch by using -cluster/stats. However it seems to give false alerts or we have some underlying issue to be solved.

Cluster contains of 3 nodes. We use curator to periodically delete old indexes and roll indexes. Elasticsearch seems to work fine and curator logs have no errors.
On rare occasion during cluster stats there is an exception:

[2018-12-15T15:30:03,719][INFO ][o.e.c.r.a.AllocationService] [gdph-es-01] Cluster health status changed from [YELLOW] to [GREEN] (reason: [shards
started [[gh-partition-000592][1]] ...]).
[2018-12-15T15:40:03,403][INFO ][o.e.c.m.MetaDataDeleteIndexService] [gdph-es-01] [gh-partition-000538/mY4H42tjQsS72_WEomcpzw] deleting index
[2018-12-15T16:30:04,647][INFO ][o.e.c.m.MetaDataDeleteIndexService] [gdph-es-01] [gh-partition-000539/Q5jRkFC5RDWt76DMzdFK_g] deleting index

[2018-12-15T16:30:04,790][DEBUG][o.e.a.a.c.s.TransportClusterStatsAction] [gdph-es-01] failed to execute on node [mGitnIAFQuilix-OjpWU8g]
org.elasticsearch.transport.RemoteTransportException: [gdph-es-03][10.201.236.37:9201][cluster:monitor/stats[n]]
Caused by: org.apache.lucene.store.AlreadyClosedException: engine is closed
at org.elasticsearch.index.shard.IndexShard.getEngine(IndexShard.java:1988) ~[elasticsearch-6.3.2.jar:6.3.2]
at org.elasticsearch.index.shard.IndexShard.segmentStats(IndexShard.java:961) ~[elasticsearch-6.3.2.jar:6.3.2]
at org.elasticsearch.action.admin.indices.stats.CommonStats.(CommonStats.java:211) ~[elasticsearch-6.3.2.jar:6.3.2]
at org.elasticsearch.action.admin.cluster.stats.TransportClusterStatsAction.nodeOperation(TransportClusterStatsAction.java:106) ~[elasticsearch-6.3.2.jar:6.3.2]
at org.elasticsearch.action.admin.cluster.stats.TransportClusterStatsAction.nodeOperation(TransportClusterStatsAction.java:50) ~[elasticsearch-6.3.2.jar:6.3.2]
at org.elasticsearch.action.support.nodes.TransportNodesAction.nodeOperation(TransportNodesAction.java:140) ~[elasticsearch-6.3.2.jar:6.3.2]
at org.elasticsearch.action.support.nodes.TransportNodesAction$NodeTransportHandler.messageReceived(TransportNodesAction.java:260) ~[elasticsearch-6.3.2.jar:6.3.2]
at org.elasticsearch.action.support.nodes.TransportNodesAction$NodeTransportHandler.messageReceived(TransportNodesAction.java:256) ~[elasticsearch-6.3.2.jar:6.3.2]
at org.elasticsearch.xpack.security.transport.SecurityServerTransportInterceptor$ProfileSecuredRequestHandler$1.doRun(SecurityServerTransportInterceptor.java:259) ~[?:?]
at org.elasticsearch.common.util.concurrent.AbstractRunnable.run(AbstractRunnable.java:37) ~[elasticsearch-6.3.2.jar:6.3.2]
at org.elasticsearch.xpack.security.transport.SecurityServerTransportInterceptor$ProfileSecuredRequestHandler.messageReceived(SecurityServerTransportInterceptor.java:317) ~[?:?]
at org.elasticsearch.transport.RequestHandlerRegistry.processMessageReceived(RequestHandlerRegistry.java:66) ~[elasticsearch-6.3.2.jar:6.3.2]
at org.elasticsearch.transport.TcpTransport$RequestHandler.doRun(TcpTransport.java:1592) ~[elasticsearch-6.3.2.jar:6.3.2]
at org.elasticsearch.common.util.concurrent.ThreadContext$ContextPreservingAbstractRunnable.doRun(ThreadContext.java:725) ~[elasticsearch-6.3.2.jar:6.3.2]
at org.elasticsearch.common.util.concurrent.AbstractRunnable.run(AbstractRunnable.java:37) ~[elasticsearch-6.3.2.jar:6.3.2]
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149) ~[?:1.8.0_171]
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624) ~[?:1.8.0_171]
at java.lang.Thread.run(Thread.java:748) [?:1.8.0_171]

Since this happens during curator action can it possibly be a race condition and if so any ideas how to avoid it? Client service itself seems to be unaffected,error rate and response times did not change.

DavidTurner · January 7, 2019, 4:24pm

This is a DEBUG-level log message, so does not indicate any particular problem that needs resolving. You are right that it indicates that an index was being deleted while its stats were being requested.

katre · January 7, 2019, 4:29pm

Thank you, it means we need to rethink our monitoring then.

DavidTurner · January 7, 2019, 4:35pm

It looks like we log DEBUG-level logs for org.elasticsearch.action by default, because of this section in the log4j2.properties file in the config directory:

# log action execution errors for easier debugging
logger.action.name = org.elasticsearch.action
logger.action.level = debug

If you change debug to info then you will not see these messages again.

system · February 4, 2019, 4:35pm

This topic was automatically closed 28 days after the last reply. New replies are no longer allowed.