Bulk indexing causes management threadpool queue to skyrocket

Good catch, yes, that'd explain it. Five cluster-wide stats calls per second is definitely on the abusive side. While you weren't indexing, it looks like you had enough resources (IO bandwidth in particular) to cope with the monitoring load, but the extra load from indexing seems to have pushed it over the edge.
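
In the meantime, the simplest mitigation is probably on the monitoring side: call the stats APIs less often and reuse the last response in between. Here is a minimal sketch of that idea, not a definitive implementation. `ThrottledPoller`, `fetch`, and the 10-second interval are hypothetical names and values chosen for illustration; `fetch` stands in for whatever function your monitoring tool uses to request cluster stats.

```python
import time
from typing import Any, Callable, Optional


class ThrottledPoller:
    """Call `fetch` at most once per `min_interval` seconds.

    Between real calls, `get()` returns the cached result instead of
    sending another request to the cluster.
    """

    def __init__(
        self,
        fetch: Callable[[], Any],          # hypothetical: your stats request
        min_interval: float,               # seconds between real calls
        clock: Callable[[], float] = time.monotonic,
    ) -> None:
        self._fetch = fetch
        self._min_interval = min_interval
        self._clock = clock
        self._last_time: Optional[float] = None
        self._last_result: Any = None

    def get(self) -> Any:
        now = self._clock()
        # Only hit the cluster if we have never fetched, or the cached
        # result is at least `min_interval` seconds old.
        if self._last_time is None or now - self._last_time >= self._min_interval:
            self._last_result = self._fetch()
            self._last_time = now
        return self._last_result
```

With something like `ThrottledPoller(fetch_cluster_stats, min_interval=10.0)`, a dashboard that asks for stats five times a second would still only send one request to the cluster every ten seconds. The right interval depends on your cluster, so treat the 10 seconds as a placeholder.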

I think we can generally do better here, optimisations of the completion stats calculation aside, so I opened https://github.com/elastic/elasticsearch/issues/51992 to discuss higher-level protection against this.
