Background thread had an uncaught exception: org.elasticsearch.ElasticsearchException: failed to refresh store stats

zpp · March 16, 2016, 3:32am

Hi,

I have a two-node ELK cluster running Elasticsearch 1.7.2.
A couple of days ago, Elasticsearch started logging the exception below continuously. At first the indexer could still send data to Elasticsearch, but after a few days it started failing with error 503.

Elasticsearch exception:
[ERROR][marvel.agent ] [clustername] Background thread had an uncaught exception:
org.elasticsearch.ElasticsearchException: failed to refresh store stats
at org.elasticsearch.index.store.Store$StoreStatsCache.refresh(Store.java:1573)
at org.elasticsearch.index.store.Store$StoreStatsCache.refresh(Store.java:1558)
at org.elasticsearch.common.util.SingleObjectCache.getOrRefresh(SingleObjectCache.java:55)
at org.elasticsearch.index.store.Store.stats(Store.java:290)
at org.elasticsearch.index.shard.IndexShard.storeStats(IndexShard.java:639)
at org.elasticsearch.action.admin.indices.stats.CommonStats.<init>(CommonStats.java:139)
at org.elasticsearch.action.admin.indices.stats.ShardStats.<init>(ShardStats.java:55)
at org.elasticsearch.indices.IndicesService.stats(IndicesService.java:231)

logstash indexer error:
:message=>"retrying failed action with response code: 503

The server's hardware, CPU, and memory all look OK.
The problem goes away after I restart the Elasticsearch service.

What caused this problem? How can I prevent it from happening again? Is there a way to monitor for this error, or can ES send out notifications?
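
On the monitoring question, one low-tech option is to scan the Elasticsearch log for this message and alert on any hit. The following is a minimal sketch, assuming the log line layout shown in this thread; the function name and the regular expression are illustrative, not part of Elasticsearch or Marvel.

```python
import re

# Header of a log entry, e.g.
# [2016-06-13 08:36:53,111][ERROR][marvel.agent ] [node] Background thread ...
# The timestamp is optional because pasted logs sometimes omit it.
HEADER = re.compile(
    r"^(?:\[(?P<ts>[^\]]+)\])?"      # optional [timestamp]
    r"\[(?P<level>[A-Z]+)\s*\]"      # [ERROR], [WARN ], [DEBUG] ...
    r"\[(?P<logger>[^\]]+)\]\s*"     # [marvel.agent ]
    r"\[(?P<node>[^\]]+)\]"          # [node name]
)
NEEDLE = "failed to refresh store stats"


def find_store_stats_errors(lines):
    """Return (timestamp, node) for each ERROR entry whose body contains NEEDLE."""
    hits = []
    pending = None  # header of the ERROR entry currently being read, if any
    for line in lines:
        match = HEADER.match(line)
        if match:
            # A new log entry starts here; only ERROR entries are tracked.
            if match.group("level") == "ERROR":
                pending = (match.group("ts"), match.group("node"))
            else:
                pending = None
        elif pending is not None and NEEDLE in line:
            hits.append(pending)
            pending = None  # count each entry once
    return hits
```

To use it, pass the lines of the node's log file (the path depends on your installation) and send a notification whenever the returned list is non-empty.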

warkolm · March 18, 2016, 12:06am

A 503 suggests that your thread pools may be overloaded. What does Marvel show for them?
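
The thread pools can also be checked without Marvel. The following is a rough sketch, assuming the response shape of `GET /_nodes/stats/thread_pool` (a `nodes` map whose entries carry a `thread_pool` section with `queue` and `rejected` counters). The base URL and both function names are assumptions for illustration.

```python
import json
from urllib.request import urlopen


def fetch_thread_pool_stats(base_url="http://localhost:9200"):
    """Fetch per-node thread pool stats; base_url is an assumption, adjust it for your cluster."""
    with urlopen(base_url + "/_nodes/stats/thread_pool") as resp:
        return json.loads(resp.read().decode("utf-8"))


def busy_pools(stats):
    """Return (node_name, pool, queue, rejected) for every pool with a backlog or rejections."""
    found = []
    for node in stats.get("nodes", {}).values():
        # Sort pool names so the output order is stable within a node.
        for pool, values in sorted(node.get("thread_pool", {}).items()):
            queue = values.get("queue", 0)
            rejected = values.get("rejected", 0)
            if queue > 0 or rejected > 0:
                found.append((node.get("name"), pool, queue, rejected))
    return found
```

Calling `busy_pools(fetch_thread_pool_stats())` lists every pool on every node, not only `index`. That matters here because Logstash normally sends bulk requests, so the `bulk` pool may be the more relevant one to watch.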

zpp · March 18, 2016, 1:32am

Hi, thank you for replying.
There are various thread pool statistics in Marvel. I assume you're referring to the ones related to indexing; let me know if I'm wrong.
Index thread pool thread count: constantly at 32
Index thread pool rejected count: 0
Index thread pool ops per sec: usually 0, occasionally slightly higher (the highest is 0.003)
Index thread pool queue size: always 0
Index thread pool largest thread count: 32

nbijub · June 13, 2016, 9:26am

We are also seeing the same issue in our ES cluster (1.7.5). The following is the stack trace from some of our data nodes.

[2016-06-13 08:36:53,111][ERROR][marvel.agent ] [es-data-NODE-XX] Background thread had an uncaught exception:
org.elasticsearch.ElasticsearchException: failed to refresh store stats
at org.elasticsearch.index.store.Store$StoreStatsCache.refresh(Store.java:1573)
at org.elasticsearch.index.store.Store$StoreStatsCache.refresh(Store.java:1558)
at org.elasticsearch.common.util.SingleObjectCache.getOrRefresh(SingleObjectCache.java:55)
at org.elasticsearch.index.store.Store.stats(Store.java:290)
at org.elasticsearch.index.shard.IndexShard.storeStats(IndexShard.java:638)
at org.elasticsearch.action.admin.indices.stats.CommonStats.<init>(CommonStats.java:139)
at org.elasticsearch.action.admin.indices.stats.ShardStats.<init>(ShardStats.java:55)
at org.elasticsearch.indices.IndicesService.stats(IndicesService.java:231)
at org.elasticsearch.indices.IndicesService.stats(IndicesService.java:188)
at org.elasticsearch.node.service.NodeService.stats(NodeService.java:138)
at org.elasticsearch.marvel.agent.AgentService$ExportingWorker.exportNodeStats(AgentService.java:342)
at org.elasticsearch.marvel.agent.AgentService$ExportingWorker.run(AgentService.java:254)
at java.lang.Thread.run(Unknown Source)

Confusingly, the ES cluster health API reports that the cluster has all of its nodes, whereas Marvel shows the nodes logging the above exception as missing.

At the same time, all the other nodes throw the stack trace below. I checked, and there is no communication issue between the nodes.
[2016-06-12 15:00:48,721][DEBUG][action.admin.cluster.node.stats] [es-client-XX] failed to execute on node [v1-ua9fZSym6v-wjVtzucQ]
org.elasticsearch.transport.RemoteTransportException: [es-data-NODE-XX][inet[/192.168.XX.XX:9300]][cluster:monitor/nodes/stats[n]]
Caused by: org.elasticsearch.ElasticsearchException: failed to refresh store stats
at org.elasticsearch.index.store.Store$StoreStatsCache.refresh(Store.java:1573)
at org.elasticsearch.index.store.Store$StoreStatsCache.refresh(Store.java:1558)
at org.elasticsearch.common.util.SingleObjectCache.getOrRefresh(SingleObjectCache.java:55)
at org.elasticsearch.index.store.Store.stats(Store.java:290)
at org.elasticsearch.index.shard.IndexShard.storeStats(IndexShard.java:638)
at org.elasticsearch.action.admin.indices.stats.CommonStats.<init>(CommonStats.java:139)
at org.elasticsearch.action.admin.indices.stats.ShardStats.<init>(ShardStats.java:55)
at org.elasticsearch.indices.IndicesService.stats(IndicesService.java:231)
at org.elasticsearch.node.service.NodeService.stats(NodeService.java:156)
at org.elasticsearch.action.admin.cluster.node.stats.TransportNodesStatsAction.nodeOperation(TransportNodesStatsAction.java:96)
at org.elasticsearch.action.admin.cluster.node.stats.TransportNodesStatsAction.nodeOperation(TransportNodesStatsAction.java:44)
at org.elasticsearch.action.support.nodes.TransportNodesOperationAction$NodeTransportHandler.messageReceived(TransportNodesOperationAction.java:292)
at org.elasticsearch.action.support.nodes.TransportNodesOperationAction$NodeTransportHandler.messageReceived(TransportNodesOperationAction.java:283)
at org.elasticsearch.transport.netty.MessageChannelHandler$RequestHandler.doRun(MessageChannelHandler.java:279)
at org.elasticsearch.common.util.concurrent.AbstractRunnable.run(AbstractRunnable.java:36)
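
The mismatch described above (cluster health counts every node, while the node stats call fails on some of them) can be detected by comparing a list of expected node names with the nodes that actually answered a node stats request. This is only a sketch: it assumes that a node whose stats call fails is simply absent from the `nodes` map of the response, which is an assumption rather than something confirmed in this thread, and the function name is made up for illustration.

```python
def missing_from_stats(expected_names, stats):
    """Return the expected node names that are absent from a node stats response.

    expected_names: names you expect in the cluster (for example a
    hand-maintained list). stats: the parsed JSON of a node stats response.
    Assumption: a node that fails to produce stats is left out of "nodes".
    """
    answered = {node.get("name") for node in stats.get("nodes", {}).values()}
    return sorted(set(expected_names) - answered)
```

A non-empty result while cluster health still reports the full node count would point at the nodes affected by the store stats failure.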

nbijub · June 15, 2016, 8:32am

Any help here?

Biju
