CPU Spiking after Recovery/Maintenance Mode

Last week, the hardware under our cluster "162b24" degraded and the cluster had to be recovered to a snapshot. The conversation is here.

Since then, the machine works properly for a while and then spikes to 100% CPU about every 3 days. Each time, I need to manually restart the cluster, and it works again for a few days.

Nothing has changed on our end since then, except that we cleaned out a bunch of .watch_history-* indexes as recommended.

Any help would be appreciated.

Hi @rthomps7,

On the surface, I'd expect to see CPU spikes based on the increased request rates:

Does the spike in CPU use here correlate with what you're seeing?



That certainly seems related. I'm not sure what would be causing it since I don't know what could be spiking on our end.

In the second chart, "Request by Action", what could "other" entail? Maybe that will help isolate where the requests are coming from.

@erikthered - Could you take another look at this? I'm not seeing request spikes on the graphs on my side and the CPU spikes are still happening frequently.

