Dec 9th, 2024: [EN]: Leveraging AutoOps to detect long-running search queries

This article is also available in French.

Released in early November on Elastic Cloud Hosted, AutoOps significantly simplifies cluster management with performance recommendations, resource utilization and cost insights, real-time issue detection, and resolution paths.

Among the hundreds of different analyses that AutoOps runs every minute to check your settings, metrics and health, one alerts you when long-running search queries are plaguing your cluster. Let's see how it works concretely.

The beauty of AutoOps for Elastic Cloud Hosted is that there's nothing for you to do. When you spin up a new deployment in a supported region, an AutoOps agent will be automatically attached to it, and within minutes, metrics will start shipping, analysis will kick in, and events will be raised as soon as something fishy is detected.

There's no need to enable slow logs and set up Filebeat to tail and index them somewhere. It just works out of the box by carefully and regularly monitoring the Task Management API.
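
To give a feel for what monitoring the Task Management API can look like, here is a minimal sketch. It is not AutoOps' actual implementation, only an illustration under a few assumptions: the input has the shape of a `GET _tasks?actions=*search*&detailed=true` response, the helper name `find_long_running_searches` and the 10-second threshold are made up for this example, and the sample data is invented.

```python
from typing import Any


def find_long_running_searches(
    tasks_response: dict[str, Any], threshold_seconds: float = 10.0
) -> list[dict[str, Any]]:
    """Return search tasks running longer than the threshold, slowest first.

    Hypothetical helper: the threshold and output fields are illustrative choices.
    """
    threshold_nanos = threshold_seconds * 1_000_000_000
    slow = []
    # The tasks API groups running tasks by node.
    for node_id, node in tasks_response.get("nodes", {}).items():
        for task_id, task in node.get("tasks", {}).items():
            # Keep only search-related actions, e.g. "indices:data/read/search".
            if "search" not in task.get("action", ""):
                continue
            if task.get("running_time_in_nanos", 0) < threshold_nanos:
                continue
            slow.append(
                {
                    "task_id": task_id,
                    "node": node.get("name", node_id),
                    "running_seconds": task["running_time_in_nanos"] / 1e9,
                    "cancellable": task.get("cancellable", False),
                    "headers": task.get("headers", {}),
                    "description": task.get("description", ""),
                }
            )
    return sorted(slow, key=lambda t: t["running_seconds"], reverse=True)


# Invented sample mimicking the response shape; not real cluster output.
sample = {
    "nodes": {
        "node-a": {
            "name": "instance-0000000223",
            "tasks": {
                "node-a:101": {
                    "action": "indices:data/read/search",
                    "running_time_in_nanos": 42_000_000_000,
                    "cancellable": True,
                    "headers": {"X-Opaque-Id": "example-client"},
                    "description": "indices[logs-nginx.error-default]",
                },
                "node-a:102": {
                    "action": "indices:data/read/search",
                    "running_time_in_nanos": 5_000_000,
                    "cancellable": True,
                    "headers": {},
                },
                "node-a:103": {
                    "action": "cluster:monitor/tasks/lists",
                    "running_time_in_nanos": 99_000_000_000,
                    "cancellable": False,
                    "headers": {},
                },
            },
        }
    }
}

slow = find_long_running_searches(sample)
for task in slow:
    print(task["task_id"], task["node"], f'{task["running_seconds"]:.0f}s')
```

Only the first sample task is reported: the second is a search that is still well under the threshold, and the third is long-running but not a search. Doing this by hand means polling on a schedule and storing the results somewhere, which is exactly the chore AutoOps takes off your plate.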

When opening the Deployment view in AutoOps, you're immediately presented with a quick history of the recent events. In the screenshot below, we can see that a "Long running search task" event was opened recently.

Clicking on the event opens up a flyout panel showing you the DSL of the slow search query that has been detected, along with a whole bunch of information related to the execution context of that query. Let's review all the information we got in the screenshot below.

  1. First, we get a link to the node where the long-running query was detected, i.e. instance-0000000223. That link allows you to jump directly to the Nodes view where you'll find a wealth of metrics and information about that specific node.

  2. Then, you'll also see which indices the query was being run on, i.e. logs-apache.error-default, logs-nginx.error-default and two more indices. Clicking on those indices will send you to the Shards view, which shows the detailed shard breakdown of those indices on the identified node, as well as all the shards of other indices located on that node. That view will help you detect any hotspots that might be responsible for the slow query.

  3. Digging deeper, we can then see that some query analysis took place and AutoOps surfaced a few potential reasons why the query might be slow. In this case, we can see that:

    • the query ran on a 30-day time interval, which might represent a big volume of data
    • there are nested aggregations, which are known to perform poorly
    • the response might contain up to 20,000 aggregation buckets, which might be taxing on node memory

    There are more detection rules for queries that use regular expressions or scripts, and new rules are added regularly.

  4. Finally, there's some more information to glean about the context of the search query, such as

    • for how long it has been running,
    • whether it is cancellable or not,
    • all the headers that were attached to the HTTP call. In this case, we can see the trace.id (which makes it easy to find it in APM), but also X-Opaque-Id that contains an indication of the client that sent this query. Here, we can see that the query originated from a SIEM alerting rule in Kibana, but it could also be a visualization or a dashboard, or even a user running something in Dev Tools.
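
If you want your own applications to show up in that header list, you can send an X-Opaque-Id header with each search request. Below is a minimal sketch using only Python's standard library. The helper name `build_tagged_search`, the URL, and the `nightly-report-job` identifier are all made up for illustration, and authentication is left out for brevity.

```python
import json
import urllib.request


def build_tagged_search(
    base_url: str, index: str, query: dict, opaque_id: str
) -> urllib.request.Request:
    """Build a _search request tagged with an X-Opaque-Id header.

    Hypothetical helper: it only prepares the request, it does not send it.
    """
    body = json.dumps({"query": query}).encode("utf-8")
    return urllib.request.Request(
        url=f"{base_url.rstrip('/')}/{index}/_search",
        data=body,
        method="POST",
        headers={
            "Content-Type": "application/json",
            # Free-form client identifier that is attached to the search task.
            "X-Opaque-Id": opaque_id,
        },
    )


# Placeholder values; replace them with your own endpoint, index and identifier.
request = build_tagged_search(
    "https://localhost:9200",
    "logs-nginx.error-default",
    {"match_all": {}},
    "nightly-report-job",
)

# urllib normalizes header-name capitalization; HTTP header names are case-insensitive.
print(request.get_method(), request.full_url)
print(request.header_items())

# To actually send it (requires a reachable cluster and credentials):
# with urllib.request.urlopen(request) as response:
#     print(response.read().decode("utf-8"))
```

A stable, descriptive identifier per application or job works best, since that is the value you'll be reading when a slow query of yours gets flagged.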

But wait, there's more! AutoOps doesn't only detect long-running DSL queries, but also ES|QL ones, as can be seen in the screenshot below.

As you can see, AutoOps can help you detect long-running search queries and dig out a wealth of information about them. And all that for free for Elastic Cloud users! With Elastic, life is fantastic!


It looks amazing. Thanks for sharing!
Sometimes even a single expensive query can have a huge impact on performance and stability. It's great to be able to automatically detect long-running search tasks without activating slow logs, thanks to AutoOps!

Note: You can use the explanation in the link below to add headers to a query. That way, you can trace a query back to its source.
