Hello,
For a few days now we have been seeing two nodes stuck at 100% CPU usage.
I can tell that searches are the cause, because the search thread pool and search queue are full, but I cannot figure out where they are coming from.
I tried shutting down our main API clients and also stopped the Kibana instance, but that solves the problem only for a few minutes before it returns.
Is there a way, via the tasks API, hot_threads, or other APIs, to see which client is performing the requests, or which index pattern the requests are searching?
I do see index names for regular searches when I use the tasks API, but for scrolls I cannot see which index is used.
I have attached tasks output for both problematic nodes in Gist:
We are experiencing serious cluster degradation.
Please assist.
You can identify clients by the X-Opaque-Id header as reported in the search slow log, assuming the clients are setting this header. That header is also reported by the REST request tracer, assuming you're on ≥7.7.
Other than that, I don't think the client identity is exposed by Elasticsearch. You'll need to look at the underlying network traffic.
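For illustration, here is a minimal sketch of how a client could attach the header, using only the Python standard library. The URL, index pattern, and the `grafana-dashboards` identifier are made-up placeholders, and `build_search_request` is a hypothetical helper, not part of any Elasticsearch client.

```python
import json
import urllib.request


def build_search_request(base_url, index, query, client_id):
    """Build (but do not send) a search request tagged with X-Opaque-Id."""
    body = json.dumps({"query": query}).encode("utf-8")
    return urllib.request.Request(
        url=f"{base_url}/{index}/_search",
        data=body,
        method="POST",
        headers={
            "Content-Type": "application/json",
            # Free-form string chosen by the client; pick one value per application.
            "X-Opaque-Id": client_id,
        },
    )


# Placeholder values, for illustration only.
req = build_search_request(
    "http://localhost:9200", "logs-*", {"match_all": {}}, "grafana-dashboards"
)
# urllib.request.urlopen(req) would actually send the request.
```

Using one stable value per application (rather than one per request) should make it easier to group entries by client later.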
Thank you for the comments. We are actually not sending this header.
After more digging we finally found that the requests were coming from a Grafana dashboard that someone had left open with auto-refresh, running a query that uses wildcards on the entire document (without specifying a field name).
We will try to implement the header for future use, as it could have helped us figure out that the requests were coming from Grafana rather than from the API or Kibana.
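Once clients send the header, the tasks API includes it under each task's `headers`, so a small script can count running search and scroll tasks per client. The following is a rough sketch, assuming the JSON returned by `GET _tasks?detailed=true` with the default grouping by node. The `count_searches_by_client` helper and the sample response are made up for illustration and are not real cluster output.

```python
from collections import Counter

# Action name prefixes for search and scroll tasks (including their child phases).
SEARCH_PREFIXES = ("indices:data/read/search", "indices:data/read/scroll")


def count_searches_by_client(tasks_response):
    """Count running search/scroll tasks per X-Opaque-Id value."""
    counts = Counter()
    for node in tasks_response.get("nodes", {}).values():
        for task in node.get("tasks", {}).values():
            if not task.get("action", "").startswith(SEARCH_PREFIXES):
                continue
            client = task.get("headers", {}).get("X-Opaque-Id", "<no X-Opaque-Id>")
            counts[client] += 1
    return counts


# Made-up sample in the shape of a tasks API response, for illustration only.
sample = {
    "nodes": {
        "node-a": {
            "tasks": {
                "node-a:1": {
                    "action": "indices:data/read/search",
                    "headers": {"X-Opaque-Id": "grafana-dashboards"},
                },
                "node-a:2": {
                    "action": "indices:data/read/scroll",
                    "headers": {"X-Opaque-Id": "grafana-dashboards"},
                },
                "node-a:3": {"action": "indices:data/read/search", "headers": {}},
                "node-a:4": {"action": "cluster:monitor/tasks/lists", "headers": {}},
            }
        }
    }
}

counts = count_searches_by_client(sample)
for client, n in counts.most_common():
    print(f"{client}: {n}")
```

Tasks from clients that do not set the header end up in the `<no X-Opaque-Id>` bucket, which shows how much traffic is still unidentified.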