Two specific nodes with 100% CPU

Hello,
For a few days now, two of our nodes have been at 100% CPU usage.
I can tell that the load comes from searches, since the search thread pool and search queue are full, but I cannot figure out where the searches are coming from.
I have tried removing our main API clients and also stopped the Kibana instance, but that only solves the problem for a few minutes before it returns.

CPU usage, 5-day view from Grafana:

CPU usage, 2-day view from Grafana:

Search thread pool and queue, 2-day view from Grafana:

Is there a way, using the tasks API, hot_threads, or other APIs, to see which client is performing the requests, or which index pattern the requests are searching?
I do see index names for regular searches when I use the tasks API, but for scrolls I cannot see which index is used.
I have attached tasks output for both problematic nodes in Gist:

We are experiencing serious cluster degradation.
Please assist.

Thanks,
Lior

You can identify clients by the X-Opaque-Id header as reported in the search slow log, assuming the clients are setting this header. That header is also reported by the REST request tracer, assuming you're on ≥7.7.
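As a minimal sketch of the client side, assuming a cluster at a hypothetical `http://localhost:9200` and a hypothetical index pattern `logs-*`, the Python below builds (but does not send) a search request tagged with an `X-Opaque-Id` value, plus an index-settings request that sets search slow log thresholds so slow tagged queries get logged. The helper names, the `billing-api` identifier, and the threshold values are illustrative assumptions, not recommendations.

```python
import json
import urllib.request

ES_URL = "http://localhost:9200"  # hypothetical cluster address


def build_search_request(index: str, query: dict, opaque_id: str) -> urllib.request.Request:
    """Build (but do not send) a search request tagged with X-Opaque-Id."""
    return urllib.request.Request(
        url=f"{ES_URL}/{index}/_search",
        data=json.dumps({"query": query}).encode("utf-8"),
        method="POST",
        headers={
            "Content-Type": "application/json",
            # Use a distinct, recognisable value per caller (service, dashboard, job).
            # urllib stores header names as "X-opaque-id"; HTTP header names are
            # case-insensitive, so the server still sees the same header.
            "X-Opaque-Id": opaque_id,
        },
    )


def build_slowlog_settings_request(index: str, warn: str = "10s", info: str = "5s") -> urllib.request.Request:
    """Build a request that sets search (query phase) slow log thresholds on an index."""
    settings = {
        "index.search.slowlog.threshold.query.warn": warn,
        "index.search.slowlog.threshold.query.info": info,
    }
    return urllib.request.Request(
        url=f"{ES_URL}/{index}/_settings",
        data=json.dumps(settings).encode("utf-8"),
        method="PUT",
        headers={"Content-Type": "application/json"},
    )


if __name__ == "__main__":
    req = build_search_request("logs-*", {"match": {"message": "error"}}, "billing-api")
    # get_header() needs the capitalised key that urllib stored.
    print(req.get_method(), req.full_url, req.get_header("X-opaque-id"))
```

Either request can be sent with `urllib.request.urlopen`. The official Elasticsearch clients generally offer a per-request option for this header as well, which is usually more convenient; check your client's documentation.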

Other than that, I don't think the client identity is exposed by Elasticsearch. You'll need to look at the underlying network traffic.

You could also run Packetbeat to capture the traffic on the HTTP port and track the clients that way.

Hey @DavidTurner, @warkolm,

Thank you for the comments. Actually, we are not sending this header.
After more digging, we finally found that the requests were coming from a Grafana dashboard that someone had left running with auto-refresh enabled; its query uses wildcards across the entire document (without specifying a field name).
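A query like this can be spotted before it reaches the cluster. The sketch below uses hypothetical helpers, not part of Grafana or Elasticsearch: `unscoped_wildcard_terms` is a rough whitespace-based heuristic that flags wildcard terms with no `field:` prefix in a `query_string`-style query, and `build_query_string` shows the same wildcard restricted to explicit fields through the `query_string` query's `fields` parameter. The field names `message` and `service` are assumptions. When no field is given, `query_string` falls back to the `index.query.default_field` setting, which by default covers all eligible fields, so a leading wildcard has to be checked against every one of them.

```python
from typing import List, Optional

BOOLEAN_OPERATORS = {"AND", "OR", "NOT"}


def unscoped_wildcard_terms(query: str) -> List[str]:
    """Return terms that contain a wildcard but no "field:" prefix.

    Rough heuristic: splits on whitespace only and ignores quoting,
    parentheses and escaping, so treat the result as a hint, not a parse.
    """
    flagged = []
    for term in query.split():
        if term in BOOLEAN_OPERATORS or term == "*":
            # Skip operators and a lone "*", a common catch-all default query.
            continue
        if ("*" in term or "?" in term) and ":" not in term:
            flagged.append(term)
    return flagged


def build_query_string(query: str, fields: Optional[List[str]] = None) -> dict:
    """Build a query_string search body, optionally limited to specific fields."""
    clause = {"query": query}
    if fields:
        # Without "fields", query_string searches index.query.default_field,
        # which by default covers every eligible field in the index.
        clause["fields"] = fields
    return {"query": {"query_string": clause}}


if __name__ == "__main__":
    dashboard_query = "*timeout* AND service:api*"
    print(unscoped_wildcard_terms(dashboard_query))
    print(build_query_string("*timeout*", fields=["message"]))
```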
We will try to implement the header for future use, as it could have helped us figure out that the requests were coming from Grafana rather than from our API clients or Kibana.
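Once clients send the header, it can also be read back from the tasks API: as far as I know, each task entry returned by `GET _tasks?actions=indices:data/read/*&detailed=true` carries a `headers` object that includes `X-Opaque-Id` when the caller set it. The sketch below sums the running time of in-flight read tasks per `X-Opaque-Id`. It uses a small, abridged, made-up response (node names, task IDs and timings are hypothetical) in place of a live cluster, and groups tasks without the header under `<none>`.

```python
import json
from collections import Counter

# Abridged, made-up response in the shape returned by
#   GET _tasks?actions=indices:data/read/*&detailed=true
# (real entries carry more fields, e.g. "description" and "start_time_in_millis").
SAMPLE_TASKS = json.loads("""
{
  "nodes": {
    "node-a": {
      "name": "es-data-1",
      "tasks": {
        "node-a:101": {"action": "indices:data/read/search",
                       "running_time_in_nanos": 9000000000,
                       "headers": {"X-Opaque-Id": "grafana-dashboard"}},
        "node-a:102": {"action": "indices:data/read/scroll",
                       "running_time_in_nanos": 2000000000,
                       "headers": {}},
        "node-a:103": {"action": "indices:data/read/search",
                       "running_time_in_nanos": 7000000000,
                       "headers": {"X-Opaque-Id": "grafana-dashboard"}}
      }
    }
  }
}
""")


def running_seconds_by_client(tasks_response: dict) -> Counter:
    """Sum the running time (in seconds) of tasks, keyed by their X-Opaque-Id header."""
    totals: Counter = Counter()
    for node in tasks_response.get("nodes", {}).values():
        for task in node.get("tasks", {}).values():
            # Tasks submitted without the header are grouped under "<none>".
            client = task.get("headers", {}).get("X-Opaque-Id", "<none>")
            totals[client] += task.get("running_time_in_nanos", 0) / 1e9
    return totals


if __name__ == "__main__":
    for client, seconds in running_seconds_by_client(SAMPLE_TASKS).most_common():
        print(f"{client}: {seconds:.1f}s")
```

Summed running time is only a rough snapshot of who is occupying the search threads at that moment; requests that have already finished do not appear, which is where the slow log is more useful.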

Thanks,
Lior
