Concurrent Indices stats requests cause cluster to go red

johnskopis · December 18, 2018, 5:22pm

We use the elasticsearch-prometheus-exporter plugin. Occasionally the cluster status turns red, apparently as a result of the index stats requests made through the plugin.

Has anyone else experienced this issue?

I obtained a thread dump while the cluster was red and saw this:

https://gist.github.com/johnskopis/bb3012a42f2b56304cc31d2861b6bfe0

completion stats

"elasticsearch[36s384][management][T#1]" #163 daemon prio=5 os_prio=0 cpu=55221908.38ms elapsed=162081.11s tid=0x00007eebd8005800 nid=0x11d runnable  [0x00007eeb5c8f3000]
   java.lang.Thread.State: RUNNABLE
	at java.util.Collections$UnmodifiableCollection$1.hasNext(java.base@11.0.1/Collections.java:1044)
	at org.elasticsearch.index.engine.Engine.completionStats(Engine.java:212)
	at org.elasticsearch.index.shard.IndexShard.completionStats(IndexShard.java:1007)
	at org.elasticsearch.action.admin.indices.stats.CommonStats.<init>(CommonStats.java:210)
	at org.elasticsearch.action.admin.indices.stats.TransportIndicesStatsAction.shardOperation(TransportIndicesStatsAction.java:178)
	at org.elasticsearch.action.admin.indices.stats.TransportIndicesStatsAction.shardOperation(TransportIndicesStatsAction.java:48)
	at org.elasticsearch.action.support.broadcast.node.TransportBroadcastByNodeAction$BroadcastByNodeTransportRequestHandler.onShardOperation(TransportBroadcastByNodeAction.java:430)
	at org.elasticsearch.action.support.broadcast.node.TransportBroadcastByNodeAction$BroadcastByNodeTransportRequestHandler.messageReceived(TransportBroadcastByNodeAction.java:409)

(The trace is truncated here; the full thread dump is in the gist linked above.)

It seems the issue is that the management thread pool is doing too much work. Does this sound right?

johnskopis · December 18, 2018, 5:22pm

I should add that I tried to open a bug here: https://github.com/elastic/elasticsearch/issues/36773

DavidTurner · December 18, 2018, 6:06pm

How often are you requesting indices stats? The completion stats in particular look nontrivial to calculate, and are not even mentioned in the list of stats in your plugin's README so maybe you can simply avoid asking for them. The indices stats API supports requesting subsets of stats.
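To illustrate the last point, here is a minimal Python sketch of asking for only a subset of stats, which should avoid the completion stats calculation altogether. The host, the helper names `indices_stats_url` and `fetch_indices_stats`, and the choice of `docs` and `store` are assumptions for the example; the plugin itself may build its requests differently.

```python
import json
import urllib.request
from urllib.parse import quote


def indices_stats_url(base_url, metrics, index="_all"):
    """Build an indices stats URL restricted to the given metrics.

    The result has the form <base>/<index>/_stats/<metric,metric,...>,
    so Elasticsearch only has to compute the metrics that are listed.
    """
    if not metrics:
        raise ValueError("list at least one metric, e.g. 'docs'")
    metric_path = ",".join(quote(m, safe="") for m in metrics)
    index_path = quote(index, safe="*,")
    return f"{base_url.rstrip('/')}/{index_path}/_stats/{metric_path}"


def fetch_indices_stats(base_url, metrics, index="_all", timeout=30):
    """Fetch the restricted stats and return the decoded JSON body."""
    url = indices_stats_url(base_url, metrics, index)
    with urllib.request.urlopen(url, timeout=timeout) as resp:
        return json.load(resp)


if __name__ == "__main__":
    # Hypothetical local node; 'completion' is deliberately left out.
    print(indices_stats_url("http://localhost:9200", ["docs", "store"]))
```

Leaving `completion` out of the metric list is the whole point here: the node then has no reason to walk the completion fields for every shard on each scrape.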

The plugin's README also says this:

NOTE: The exporter fetches information from an Elasticsearch cluster on every scrape, therefore having a too short scrape interval can impose load on ES master nodes, particularly if you run with -es.all and -es.indices. We suggest you measure how long fetching /_nodes/stats and /_all/_stats takes for your ES cluster to determine whether your scraping interval is too short. As a last resort, you can scrape this exporter using a dedicated job with its own scraping interval.

This seems like wise advice.
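One way to take that measurement is sketched below in Python. The helper names, the localhost URL, the 15 second interval, and the rule of thumb that a fetch should take at most half the scrape interval are all assumptions for illustration, not figures from the README.

```python
import time
import urllib.request


def http_get(url, timeout=60):
    """Fetch a URL and return the raw response body."""
    with urllib.request.urlopen(url, timeout=timeout) as resp:
        return resp.read()


def time_call(fetch, repeats=3):
    """Call fetch() several times and return each duration in seconds."""
    durations = []
    for _ in range(repeats):
        start = time.perf_counter()
        fetch()
        durations.append(time.perf_counter() - start)
    return durations


def scrape_interval_ok(durations, interval_s, margin=0.5):
    """Return True if the slowest fetch fits within margin * interval.

    The 0.5 default is an arbitrary safety margin, not an official value.
    """
    return max(durations) <= interval_s * margin


if __name__ == "__main__":
    base = "http://localhost:9200"  # hypothetical node address
    for path in ("/_nodes/stats", "/_all/_stats"):
        times = time_call(lambda: http_get(base + path))
        print(path, [round(t, 3) for t in times],
              "ok" if scrape_interval_ok(times, interval_s=15) else "too slow")
```

If the slowest fetch comes close to the scrape interval, requests will start to overlap, which matches the "concurrent stats requests" situation described in the first post.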

system · January 15, 2019, 6:06pm

This topic was automatically closed 28 days after the last reply. New replies are no longer allowed.
