Two specific nodes with 100% CPU

Lior_Yakobov · December 14, 2020, 2:51pm

Hello,
For a few days now, two of our nodes have been stuck at 100% CPU usage.
I can tell the load comes from searches, because the search thread pool and search queue are full, but I cannot figure out where the searches are coming from.
I tried disconnecting our main API clients and also stopped our Kibana instance, but that solves the problem for only a few minutes before it returns.

CPU Usage 5 days view from Grafana:

CPU Usage 2 days view from Grafana:

Search pool and queue 2 days from Grafana:

Is there a way, using the tasks API, hot_threads, or other APIs, to see which client is sending the requests, or which index pattern the requests are searching?
I do see index names for regular searches when I use the tasks API, but for scrolls I cannot see which index is used.
I have attached the tasks output for both problematic nodes in a Gist:

https://gist.github.com/YakobovLior/ae50c729126072e5cc8fb5aa9a2a5561

tasks_elkdb10.json

{
	"nodes": {
		"qSMd2bKnQkisAGgk4FI10A": {
			"name": "aws-elkdb10",
			"transport_address": "10.128.115.123:9300",
			"host": "10.128.115.123",
			"ip": "10.128.115.123:9300",
			"roles": [
				"data",
				"remote_cluster_client",

This file has been truncated.

tasks_elkdb7.json

{
	"nodes": {
		"Q_m3CYyMQwKOQZ9LrstRpg": {
			"name": "aws-elkdb7",
			"transport_address": "10.128.115.193:9300",
			"host": "10.128.115.193",
			"ip": "10.128.115.193:9300",
			"roles": [
				"data",
				"remote_cluster_client",

This file has been truncated.

We are experiencing serious cluster degradation.
Please assist.

Thanks,
Lior

DavidTurner · December 14, 2020, 5:28pm

You can identify clients by the X-Opaque-Id header as reported in the search slow log, assuming the clients are setting this header. That header is also reported by the REST request tracer, assuming you're on ≥7.7.

Other than that, I don't think the client identity is exposed by Elasticsearch. You'll need to look at the underlying network traffic.
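For illustration, here is a minimal sketch of how a client could tag its searches with the X-Opaque-Id header. It uses only the Python standard library and builds the request without sending it. The address, index pattern, and client label are hypothetical placeholders, not values from this cluster, and the header only shows up in the search slow log if slow-log thresholds are configured for the index.

```python
import json
import urllib.request


def build_search_request(base_url, index, query, opaque_id):
    """Build (but do not send) a _search request tagged with X-Opaque-Id."""
    body = json.dumps({"query": query}).encode("utf-8")
    return urllib.request.Request(
        url=f"{base_url.rstrip('/')}/{index}/_search",
        data=body,
        method="POST",
        headers={
            "Content-Type": "application/json",
            # Free-form label naming the calling application.
            "X-Opaque-Id": opaque_id,
        },
    )


if __name__ == "__main__":
    req = build_search_request(
        "http://localhost:9200",   # assumed address, adjust for your cluster
        "logs-*",                  # hypothetical index pattern
        {"match_all": {}},
        "grafana-dashboards",      # hypothetical client label
    )
    print(req.get_method(), req.full_url)
    print(req.header_items())
    # To actually send it: urllib.request.urlopen(req)
```

Giving each application its own label (one for the API, one for Kibana, one for Grafana, and so on) is what makes the slow-log entries attributable later.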

warkolm · December 14, 2020, 9:05pm

You could also put Packetbeat in front of the HTTP port and track it that way.

Lior_Yakobov · December 15, 2020, 11:49am

Hey @DavidTurner, @warkolm,

Thank you for the comments. We are not actually sending this header.
After more digging, we finally found that the requests were coming from a Grafana dashboard that someone had left open with auto-refresh enabled. Its query used wildcards against the entire document (without specifying a field name).
We will try to implement the header for future use, as it could have helped us figure out that the requests were coming from Grafana rather than from the API or Kibana.

Thanks,
Lior
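Once clients do set X-Opaque-Id, the tasks API output can be summarized per client, since each task entry carries the request's `headers`. The following is a rough sketch, assuming the JSON has the same `nodes` / `tasks` layout as the Gist above; the sample data in the usage example is made up for illustration.

```python
from collections import Counter


def count_tasks_by_client(tasks_response):
    """Count running tasks per (node name, action, X-Opaque-Id)."""
    counts = Counter()
    for node in tasks_response.get("nodes", {}).values():
        node_name = node.get("name", "unknown")
        for task in node.get("tasks", {}).values():
            # Tasks started by requests without the header have no entry here.
            opaque_id = task.get("headers", {}).get("X-Opaque-Id", "(not set)")
            action = task.get("action", "unknown")
            counts[(node_name, action, opaque_id)] += 1
    return counts


if __name__ == "__main__":
    # Made-up sample in the shape of a GET _tasks response.
    sample = {
        "nodes": {
            "node-a": {
                "name": "aws-elkdb10",
                "tasks": {
                    "node-a:1": {
                        "action": "indices:data/read/search",
                        "headers": {"X-Opaque-Id": "grafana-dashboards"},
                    },
                    "node-a:2": {
                        "action": "indices:data/read/scroll",
                        "headers": {},
                    },
                },
            }
        }
    }
    for key, n in count_tasks_by_client(sample).most_common():
        print(n, key)
```

Sorting by count puts the busiest client and action pair first, which is usually the one to look at when a node's search queue is full.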

system · January 12, 2021, 11:50am

This topic was automatically closed 28 days after the last reply. New replies are no longer allowed.
