How to prevent an ES cluster crash from deep aggregations

(t7s) #1

Recently I have had to permit a query that aggregates over three levels. If the data is large, a wave of full GCs follows and the cluster goes down.

Can the ES server (not the client) abort a request if the query runs too long?

(Isabel Drost-Fromm) #2

I'm not aware of a server side timeout setting, others may chime in here.

Not sure if you're aware of it already - there are memory based circuit breakers that might help you.
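As a rough illustration of what tightening the breakers could look like, the sketch below builds a body for `PUT /_cluster/settings`. The percentages are placeholder values, not recommendations, and `breaker_settings` is a hypothetical helper name. It assumes the `indices.breaker.fielddata.limit` and `indices.breaker.request.limit` settings available in the 1.4+ line.

```python
import json


def breaker_settings(fielddata="40%", request="30%"):
    """Build a cluster-settings body that lowers two circuit breaker limits.

    The limits are percentages of the JVM heap. The defaults used here are
    illustrative assumptions, not tuned values.
    """
    return {
        "transient": {
            # Trips before fielddata loading would exceed this share of heap.
            "indices.breaker.fielddata.limit": fielddata,
            # Trips when per-request structures would exceed this share of heap.
            "indices.breaker.request.limit": request,
        }
    }


body = breaker_settings()
# Send this JSON as the body of PUT /_cluster/settings.
print(json.dumps(body, indent=2))
```

A breaker rejects the request with an error instead of letting it allocate, so it only helps for the memory it actually tracks.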

Hope this helps,

(t7s) #3

50%
indices.breaker.fielddata.limit: 40%
indices.fielddata.cache.size: 30%

No effect, still full GC.
There are 20 nodes in my ES cluster, each with 256 GB of memory and 8 TB of SSD. The JVM heap size is 30 GB per node, ES_DIRECT_SIZE is 8 GB, the total data is 800 GB, and the index type is mmap.

I found that when the fuck query runs, the JVM's size far exceeds my setting, by at least 5 GB more than my setting (ES_HEAP_SIZE: 30gb).

The ES version is 1.5.2.

(Mark Walkom) #4

No, it cannot. We are going to work on allowing it, though.

(Ivan Brusic) #5

Why are you setting ES_DIRECT_SIZE? Are there other processes on the server?

What version are you on? What type of data are you aggregating on? Numbers or non-analyzed strings? Are doc-values enabled (default in 2.x)? What is the cardinality of the field you are aggregating on?

Tons of questions, but perhaps something might come up.


(t7s) #6

I found that when the fuck query runs, the RSS used far exceeds my setting, by at least 5 GB more than my setting (ES_HEAP_SIZE: 30gb), even though BufferPoolMXBean shows the direct memory size is less than 500 MB. By the way, I found that if you set ES_DIRECT_SIZE, the cluster restart time decreases.

My cluster is on 1.5.2. The fields are non-analyzed, and the aggregation is a sum, not cardinality.

(Mark Harwood) #7

I can't see your query but one thing you might want to think about is the choice between depth-first and breadth-first aggregation modes if there's a lot of interim state generated which needs to be pruned. See
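For illustration only, here is a sketch of a three-level terms aggregation that requests breadth-first collection at every level, with a sum at the innermost level. The field names (`country`, `city`, `shop`, `amount`) and the helper `nested_terms` are made up, since the actual query was not posted. It assumes the `collect_mode` option of the terms aggregation.

```python
import json


def nested_terms(fields, sum_field, size=10):
    """Nest one terms aggregation per field, with a sum at the innermost level.

    Every terms level asks for breadth_first collection, so child buckets are
    only computed for the top `size` parent buckets that survive pruning.
    """
    aggs = {"total": {"sum": {"field": sum_field}}}
    # Wrap from the innermost field outwards.
    for field in reversed(fields):
        aggs = {
            "by_" + field: {
                "terms": {
                    "field": field,
                    "size": size,
                    "collect_mode": "breadth_first",
                },
                "aggs": aggs,
            }
        }
    return {"aggs": aggs}


# Hypothetical field names; substitute the real ones.
query = nested_terms(["country", "city", "shop"], "amount")
print(json.dumps(query, indent=2))
```

Whether this helps depends on how many buckets each level produces; it mainly reduces interim state when most deep buckets would be discarded anyway.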

(Ivan Brusic) #8

And what exactly is a fuck query? :smile:

Try enabling doc_values on the fields that are aggregated on, but that would require a reindex. However, the ES_DIRECT_SIZE directive might be conflicting. Very few people, if any, should play around with ES_DIRECT_SIZE unless they are on a shared server.
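A minimal sketch of a mapping with doc_values switched on, to be used for the new index you reindex into. The type name `sale` and the field names are hypothetical, as is the helper `doc_values_mapping`. It assumes the 1.x mapping syntax, where `doc_values` is set per field and string fields need to be `not_analyzed`.

```python
import json


def doc_values_mapping(type_name, fields):
    """Build an index-creation body that enables doc_values on each field.

    `fields` maps a field name to its mapping type, e.g. "string" or "double".
    """
    properties = {}
    for name, field_type in fields.items():
        spec = {"type": field_type, "doc_values": True}
        if field_type == "string":
            # Only non-analyzed strings are given doc_values here.
            spec["index"] = "not_analyzed"
        properties[name] = spec
    return {"mappings": {type_name: {"properties": properties}}}


# Hypothetical type and field names; substitute the real ones.
mapping = doc_values_mapping("sale", {"country": "string", "amount": "double"})
print(json.dumps(mapping, indent=2))
```

With doc_values the per-field data lives on disk and in the filesystem cache rather than on the JVM heap, which is why it is suggested for heap pressure.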


(t7s) #10

Thank you

(t7s) #11

It gets the grouped result for all three levels. It is not my query but my colleague's. I know it's awful, so I'm not asking for a solution; I just need a way to abort the request.

Setting ES_DIRECT_SIZE was just an attempt; I have already removed it.
