Terminate long-running queries to protect the system from DoS

Is it possible to terminate long-running queries in order to protect the system?

Our challenge is that 30+ users share our cluster. If one user sends a request for a search spanning 30 days, the system becomes unavailable until the query completes.

There's currently no way to do this other than restarting the ES node(s).

Is there a feature on the roadmap to address this issue?

From my point of view this is a very basic protection mechanism (e.g. against complex aggs).
A user is able to kill a cluster with just a single query, and the sysadmins cannot do anything but reboot :frowning:

A management-type API is being worked on to deal with this, but I don't have an ETA.

Add a timeout setting to your queries.

https://www.elastic.co/guide/en/elasticsearch/reference/current/search-request-body.html#_parameters_5
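
For illustration, here is a minimal sketch of a search request body that carries the `timeout` parameter, built in Python with only the standard library. The `@timestamp` field name, the 30-day range, and the `10s` value are assumptions chosen for the example, not details from this thread; the resulting JSON would be sent as the body of a `_search` request.

```python
import json


def build_search_body(timeout="10s", since="now-30d"):
    """Build a search body that sets a per-request timeout.

    The field name "@timestamp" and the default values are
    illustrative assumptions; adjust them to your own mapping.
    """
    return {
        # Time budget for the search; partial results are returned on expiry.
        "timeout": timeout,
        "query": {
            "range": {
                "@timestamp": {"gte": since}
            }
        },
    }


body = build_search_body()
print(json.dumps(body, indent=2))
```

The timeout is set per request here, so every client has to include it itself.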

Adding a timeout only means the request returns either when the search has finished or when the timeout is hit.

The queries will keep loading your system even after the request has returned on timeout.

The timeout checks are woven into various expensive loops, including the aggregations document collector loop.
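
To make the idea concrete, here is a rough sketch of a deadline check placed inside a collection loop. This is not Elasticsearch's actual code; the function name `collect_with_timeout` and its parameters are hypothetical and only illustrate the general pattern of checking a deadline from within the expensive loop itself.

```python
import time


def collect_with_timeout(docs, timeout_s, check_every=1, clock=time.monotonic):
    """Sum document values, checking a deadline inside the loop.

    Hypothetical illustration only. Returns a tuple
    (partial_sum, docs_seen, timed_out).
    """
    deadline = clock() + timeout_s
    total = 0
    seen = 0
    for value in docs:
        # The check sits inside the expensive loop, so the work itself
        # stops once the deadline passes instead of running to completion.
        if seen % check_every == 0 and clock() >= deadline:
            return total, seen, True
        total += value
        seen += 1
    return total, seen, False


result = collect_with_timeout(range(1000), timeout_s=60)
print(result)
```

Because the check happens in the loop, the work stops shortly after the deadline rather than continuing in the background. A larger `check_every` would trade some timeout precision for fewer clock reads.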

Does this mean that setting the timeout will not terminate the search, but will stop expensive operations?

And ... how does the timeout affect the scrolling API?

Aggregations are conducted by tapping off results from the search's collection stream. A timeout terminates this operation on the shard in question, returning any interim results it has gathered so far along with a flag to indicate the timeout status. This flag is reported in the final results.
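
On the client side, this means a response should be checked for the timeout flag before its results are treated as complete. The sketch below assumes the response has already been parsed into a Python dict; the `sample` response is hand-written for illustration, and the helper name `summarize_search_response` is hypothetical.

```python
def summarize_search_response(response):
    """Report whether a parsed search response is partial due to a timeout."""
    shards = response.get("_shards", {})
    hits = response.get("hits", {}).get("hits", [])
    return {
        # True when the timeout was hit and only interim results came back.
        "partial": bool(response.get("timed_out", False)),
        "shards_total": shards.get("total"),
        "shards_successful": shards.get("successful"),
        "hits_returned": len(hits),
    }


# Hand-written sample response, not output from a real cluster.
sample = {
    "took": 10003,
    "timed_out": True,
    "_shards": {"total": 5, "successful": 5, "failed": 0},
    "hits": {"hits": [{"_id": "1"}, {"_id": "2"}]},
}

summary = summarize_search_response(sample)
print(summary)
```

Aggregation values in a response flagged this way only cover the documents collected before the timeout, so they should be treated as lower bounds or discarded.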

Hi Mark, any update on the management API to terminate long-running queries?