ES getting killed by heavy queries

zygisa · March 15, 2018, 4:25pm

Hey guys,

Recently we had a couple of situations where our ES cluster received an influx of heavy queries that pretty much killed the cluster. CPU utilization reached 100% on all nodes in the cluster, while heap/RAM usage was fine. We had a bunch of queries running for more than 300 seconds, which we manually killed using the task management API, and the cluster recovered. Obviously, this is not a preferable way of dealing with it.
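For reference, the manual cleanup described above can be scripted. This is a minimal sketch that assumes the cluster is reachable at http://localhost:9200 without authentication. The `ES_URL` constant and all helper names are hypothetical. The script lists search tasks with `GET _tasks?actions=*search&detailed=true`, keeps the cancellable ones whose `running_time_in_nanos` exceeds a threshold, and cancels each one with `POST _tasks/<task_id>/_cancel`.

```python
import json
import urllib.request

ES_URL = "http://localhost:9200"  # assumption: local cluster, no auth


def list_search_tasks(base_url=ES_URL):
    """Fetch running search tasks from the task management API."""
    url = f"{base_url}/_tasks?actions=*search&detailed=true"
    with urllib.request.urlopen(url) as resp:
        return json.load(resp)


def find_long_running_searches(tasks_response, threshold_seconds):
    """Return IDs of cancellable tasks running longer than threshold_seconds.

    The response groups tasks by node:
    {"nodes": {node_id: {"tasks": {"<node_id>:<n>": {...}}}}}
    """
    threshold_nanos = threshold_seconds * 1_000_000_000
    long_running = []
    for node in tasks_response.get("nodes", {}).values():
        for task_id, task in node.get("tasks", {}).items():
            too_slow = task.get("running_time_in_nanos", 0) > threshold_nanos
            if task.get("cancellable") and too_slow:
                long_running.append(task_id)
    return sorted(long_running)


def cancel_task(task_id, base_url=ES_URL):
    """Ask Elasticsearch to cancel one task; returns the parsed response."""
    req = urllib.request.Request(
        f"{base_url}/_tasks/{task_id}/_cancel", method="POST"
    )
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)
```

Usage would look like `for tid in find_long_running_searches(list_search_tasks(), 300): cancel_task(tid)`. This only automates the workaround and is not a circuit breaker. Task cancellation is also cooperative, so a heavy query may keep running for a short while after it has been cancelled.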

So the question is: is there any circuit breaker (or anything like that) that would kill long-running heavy queries after a certain amount of time, or when CPU utilization reaches a certain threshold? We have circuit breakers to prevent OOM, but as far as I can tell from the documentation there's nothing for CPU utilization.

Thanks!

Mark_Harwood · March 15, 2018, 5:40pm

See the search timeout option, which by default is unbounded: https://www.elastic.co/guide/en/elasticsearch/reference/current/search-request-body.html#_parameters_4
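Here is a minimal sketch of setting the per-request `timeout` in the search body. It assumes a local, unauthenticated cluster, and `ES_URL`, `build_search_body`, `search_with_timeout` and `is_partial` are hypothetical helpers. The timeout is best-effort. When it expires, Elasticsearch returns the hits collected so far and sets `timed_out` to true in the response, so the caller should check that flag.

```python
import json
import urllib.request

ES_URL = "http://localhost:9200"  # assumption: local cluster, no auth


def build_search_body(query, timeout="30s"):
    """Search request body with a per-request timeout (time-unit string)."""
    return {"query": query, "timeout": timeout}


def search_with_timeout(index, query, timeout="30s", base_url=ES_URL):
    """POST the search to <index>/_search and return the parsed response."""
    req = urllib.request.Request(
        f"{base_url}/{index}/_search",
        data=json.dumps(build_search_body(query, timeout)).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)


def is_partial(response):
    """True if the timeout fired and the hits are only partial."""
    return bool(response.get("timed_out"))
```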

zygisa · March 16, 2018, 7:57am

Is there a way to set the timeout on cluster side (config/API call) rather than the client side?

Mark_Harwood · March 16, 2018, 8:45am

Yep: https://www.elastic.co/guide/en/elasticsearch/reference/6.2/search.html#global-search-timeout
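That page describes the dynamic cluster setting `search.default_search_timeout`. It applies to searches that don't pass their own `timeout` and defaults to `-1`, meaning no global timeout. Below is a minimal sketch of setting it through the cluster update settings API (`PUT _cluster/settings`), assuming the same local, unauthenticated cluster. The helper names and the 30s value are illustrative only.

```python
import json
import urllib.request

ES_URL = "http://localhost:9200"  # assumption: local cluster, no auth


def build_default_timeout_settings(timeout):
    """Body for PUT _cluster/settings; "persistent" survives full restarts."""
    return {"persistent": {"search.default_search_timeout": timeout}}


def apply_cluster_settings(settings, base_url=ES_URL):
    """Send the settings body to the cluster update settings API."""
    req = urllib.request.Request(
        f"{base_url}/_cluster/settings",
        data=json.dumps(settings).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="PUT",
    )
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)
```

Calling `apply_cluster_settings(build_default_timeout_settings("30s"))` applies the default cluster-wide. A `timeout` set on an individual request still overrides it.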

system · April 13, 2018, 8:46am

This topic was automatically closed 28 days after the last reply. New replies are no longer allowed.

Related topics

Topic	Category	Replies	Views	Activity
Circuit breaker to prevent ES client from having OOM problem	Elasticsearch	5	853	June 6, 2018
Terminate long running queries to protect the system from DoS	Elasticsearch	10	14713	July 5, 2017
Stopping long term queries	Elasticsearch	3	554	July 5, 2017
How to protect an ES cluster from searches that would kill it?	Elasticsearch	6	3147	July 6, 2017
How to handle long running queries	Elasticsearch	1	1515	July 17, 2018