High CPU usage while bulk indexing

Are you (or an app/script) calling stats endpoints very frequently? The various stats endpoints (index stats, node stats, etc.) are fairly heavy API calls, since they have to talk to all the nodes, collect OS and ES stats, and then compile the results for the user.

Looking at the timestamps on those monitor/nodes/stats timeouts, they are showing up every second or so, making me think there is a process polling stats at least once per second.
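If you want to confirm that spacing rather than eyeball it, here is a minimal sketch that pulls the timestamps off the matching log lines and reports the gaps between them. The log line layout (a leading `[YYYY-MM-DD HH:MM:SS,mmm]` stamp) and the function name `stats_timeout_intervals` are assumptions on my part, so adjust the pattern to whatever your logs actually look like.

```python
import re
from datetime import datetime, timedelta

# Assumed line shape: "[2016-03-01 12:00:01,123][DEBUG][...] ... monitor/nodes/stats ..."
TS_PATTERN = re.compile(r"^\[(\d{4}-\d{2}-\d{2}[ T]\d{2}:\d{2}:\d{2}),(\d{3})\]")


def stats_timeout_intervals(lines, marker="monitor/nodes/stats"):
    """Return the gaps (in seconds) between consecutive log lines containing `marker`."""
    stamps = []
    for line in lines:
        if marker not in line:
            continue  # not a stats-related line
        match = TS_PATTERN.match(line)
        if match is None:
            continue  # line does not start with the assumed timestamp format
        base = datetime.strptime(match.group(1).replace("T", " "), "%Y-%m-%d %H:%M:%S")
        stamps.append(base + timedelta(milliseconds=int(match.group(2))))
    stamps.sort()
    return [(later - earlier).total_seconds() for earlier, later in zip(stamps, stamps[1:])]
```

Feed it the lines of the node's log file (for example `stats_timeout_intervals(open(path))`, where `path` is wherever your log lives). Gaps that cluster tightly around one value suggest a fixed-interval poller rather than ad hoc requests.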

I've seen clusters brought to their knees by errant scripts calling stats too frequently (several times per second, once per second, etc). It's made worse by clusters with more nodes (more machines to talk to) or more indices (more shards to compile stats for).

Can you verify there isn't a script or service that's repeatedly hitting the stats? I'm not sure what frequency Datadog polls at, but that may be related.
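If it does turn out to be one of your own scripts, one way to tame it is to cache the last stats response and only go back to the cluster after a minimum interval has passed. This is just a sketch: the `ThrottledStats` class, the `fetch` callable, and the 30 second default are all hypothetical, not anything Elasticsearch or Datadog ships.

```python
import time


class ThrottledStats:
    """Wrap a stats-fetching callable so it runs at most once per `min_interval_s`."""

    def __init__(self, fetch, min_interval_s=30.0, clock=time.monotonic):
        self._fetch = fetch                  # hypothetical callable that hits the stats endpoint
        self._min_interval_s = min_interval_s  # assumed default; pick what suits your cluster
        self._clock = clock                  # injectable so the behaviour can be tested
        self._last_at = None
        self._cached = None

    def get(self):
        now = self._clock()
        # Only call the cluster when there is no cached value or it is old enough.
        if self._last_at is None or now - self._last_at >= self._min_interval_s:
            self._cached = self._fetch()
            self._last_at = now
        return self._cached
```

Callers can then invoke `get()` as often as they like; the cluster only sees one stats request per interval, and everything in between is served from the cached copy.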
