Terminate long running queries to protect the system from DoS


#1

Is it possible to somehow terminate long running queries in order to protect the system?

Our challenge is that 30+ users share our cluster. If one user sends a request for a 30-day search, the system becomes unavailable until the query completes.


(Mark Walkom) #2

There's currently no way to do this other than restarting the ES node(s).


#3

Is there a feature on the roadmap to address this issue?

From my point of view this is a very basic mechanism.
A user is able to kill a cluster with just a single query (e.g. complex aggs), and the sysadmins cannot do anything but reboot :(


(Mark Walkom) #4

A management-type API is being worked on to deal with this, but I don't have an ETA.


(Mark Harwood) #6

Add a timeout setting to your queries.

https://www.elastic.co/guide/en/elasticsearch/reference/current/search-request-body.html#_parameters_5
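As a minimal sketch of what that could look like, the snippet below builds a search request body with the `timeout` parameter set. The helper name `build_search_body`, the `@timestamp` field, and the `10s` value are illustrative assumptions, not something taken from this thread.

```python
import json


def build_search_body(query, timeout="10s"):
    """Return a search request body that carries a per-request timeout."""
    # "timeout" is a top-level key of the request body, next to "query".
    return {"timeout": timeout, "query": query}


# Hypothetical 30-day range query; the field name "@timestamp" is an assumption.
body = build_search_body({"range": {"@timestamp": {"gte": "now-30d"}}}, timeout="10s")
print(json.dumps(body, indent=2))
```

The resulting JSON would be sent as the body of a request to the `_search` endpoint of the index being queried.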


#7

With a timeout, the request simply returns either when the search finishes or when the timeout is hit.

The query will keep loading your system even after the request has returned because of the timeout.


(Mark Harwood) #8

The timeout checks are woven into various expensive loops, including the aggregations document collector loop.
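To illustrate the general idea of a timeout check inside an expensive loop, here is a small sketch. It is not Elasticsearch's actual code; the function `collect_with_timeout` and its `clock` parameter are hypothetical names used only for this example.

```python
import time


def collect_with_timeout(docs, timeout_seconds, clock=time.monotonic):
    """Sum values from `docs`, stopping early once the time budget is used up.

    Returns (partial_sum, docs_seen, timed_out).
    """
    deadline = clock() + timeout_seconds
    total = 0
    seen = 0
    for value in docs:
        # The check sits inside the loop, so work stops between documents.
        if clock() >= deadline:
            return total, seen, True
        total += value
        seen += 1
    return total, seen, False


print(collect_with_timeout([1, 2, 3], timeout_seconds=60))
```

Because the check runs once per document, the loop gives up shortly after the deadline passes and hands back whatever it has gathered up to that point.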


#9

Does this mean that setting the timeout will not terminate the search, but will stop expensive operations?

And ... how does the timeout affect the scrolling API?


(Mark Harwood) #10

Aggregations are conducted by tapping off results from the search's collection stream. A timeout will terminate this operation on the shard in question, returning any interim results it has gathered so far along with a flag indicating the timeout status. This flag is reported in the final results.
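A client can check that flag before trusting the numbers. The sketch below assumes the response has already been parsed into a dict and reads the top-level `timed_out` field; the helper name `describe_search_result` and the sample values are made up for illustration.

```python
def describe_search_result(response):
    """Report whether a parsed search response holds complete or partial results."""
    # "timed_out" is the top-level flag in a search response.
    if response.get("timed_out", False):
        return "partial results: the search hit the timeout"
    return "complete results"


# Hand-written sample shaped like a search response; the values are invented.
sample = {"took": 10000, "timed_out": True, "hits": {"hits": []}}
print(describe_search_result(sample))
```

When the flag is set, any aggregation values in the response should be treated as interim figures rather than totals over the whole time range.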


(Chakrayadavalli) #11

Hi Mark, is there any update on the management API to terminate long-running queries?

