I’ve been getting periodic CPU spikes on my servers when there is no significant load. Restarting the servers doesn’t help. I have 4 nodes: 3 set to master-capable and one data node.

When it spikes, all 4 servers run at 100% CPU.

I’ve collected node stats for these here:

I know we have some thread data in the gist. I've had a quick look, and I can see that "peak_count" is above the thread count. Could you share:


I will fetch those for you when I can, but I would like to add that I just saw this on the dashboard I made for myself when a spike occurred:


In the past I assumed the spikes were caused by an increased number of indexing / fetching operations, but it appears they are related to “refreshing”. Can you explain to me what that means?
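
In case it helps anyone compare numbers, here is a rough sketch of how the refresh and thread figures could be pulled out of two node stats snapshots taken before and during a spike. It assumes the usual node stats JSON layout (`nodes.<id>.indices.refresh` and `nodes.<id>.jvm.threads`); the function name `refresh_summary` is made up for illustration and is not part of any Elasticsearch client.

```python
from typing import Any, Dict


def refresh_summary(before: Dict[str, Any], after: Dict[str, Any]) -> Dict[str, Dict[str, int]]:
    """Compare two parsed node stats responses, node by node.

    Both arguments are assumed to be the decoded JSON of a node stats
    response. Nodes that are missing from the first snapshot are skipped.
    """
    summary: Dict[str, Dict[str, int]] = {}
    for node_id, node_after in after["nodes"].items():
        node_before = before["nodes"].get(node_id)
        if node_before is None:
            continue  # node joined between the snapshots, nothing to diff

        refresh_before = node_before["indices"]["refresh"]
        refresh_after = node_after["indices"]["refresh"]
        threads = node_after["jvm"]["threads"]

        summary[node_id] = {
            # number of refreshes that ran between the two snapshots
            "refreshes": refresh_after["total"] - refresh_before["total"],
            # time spent refreshing between the two snapshots
            "refresh_ms": refresh_after["total_time_in_millis"]
            - refresh_before["total_time_in_millis"],
            # current and peak JVM thread counts at the second snapshot
            "threads": threads["count"],
            "peak_threads": threads["peak_count"],
        }
    return summary
```

A large jump in `refreshes` or `refresh_ms` between a quiet snapshot and a spike snapshot would at least line up with what the dashboard shows, although it would not by itself explain the cause.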

This is a serious issue because the cluster serves a live site, and when a spike occurs the entire website becomes unresponsive and requests time out.

