Hello, I am learning Elasticsearch basics and I am dealing with an Out of Memory error when performing aggregations with a large number of buckets.
I already know that for aggregations with 10000+ buckets I should use a composite aggregation, but sometimes this cannot be done (e.g. queries auto-generated by Grafana). I don't understand why ES allows me to run a query that crashes it, instead of rejecting it beforehand.
I put together a minimal example.
First, I create foo_index with a single document:
POST foo_index/foo_type/1
{
  "ts": "2018-10-20T10:00:00Z",
  "value": 10
}
Then I perform a very heavy aggregation query on it:
{
  "query": {
    "bool": {
      "filter": {
        "range": {
          "ts": {
            "gte": "1980-01-01T00:00:00Z",
            "lte": "2019-01-01T00:00:00Z"
          }
        }
      }
    }
  },
  "aggs": {
    "by_ts": {
      "date_histogram": {
        "field": "ts",
        "interval": "10s",
        "extended_bounds": {
          "min": "1980-01-01T00:00:00Z",
          "max": "2020-01-01T00:00:00Z"
        }
      },
      "aggs": {
        "avg_value": {
          "avg": {
            "field": "value"
          }
        }
      }
    }
  }
}
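For scale, here is a rough back-of-the-envelope estimate of how many buckets the query above asks for. It is only a sketch: the helper name estimate_bucket_count is made up for illustration, and it assumes one bucket per fixed 10-second interval across the extended_bounds range, counted inclusively at both ends, which may differ slightly from how ES counts buckets internally.

```python
from datetime import datetime

# Timestamp format used in the query above (UTC, trailing "Z").
ISO_FORMAT = "%Y-%m-%dT%H:%M:%SZ"


def estimate_bucket_count(min_bound, max_bound, interval_seconds):
    """Estimate the number of fixed-width histogram buckets between two bounds.

    Hypothetical helper, not part of any Elasticsearch client.
    Assumes one bucket per interval, with both bounds included.
    """
    start = datetime.strptime(min_bound, ISO_FORMAT)
    end = datetime.strptime(max_bound, ISO_FORMAT)
    span_seconds = int((end - start).total_seconds())
    return span_seconds // interval_seconds + 1


if __name__ == "__main__":
    # Bounds and interval copied from the extended_bounds of the query above.
    buckets = estimate_bucket_count(
        "1980-01-01T00:00:00Z", "2020-01-01T00:00:00Z", 10
    )
    print(buckets)
```

With these bounds and a 10s interval the estimate comes to roughly 126 million buckets, even though the index holds a single document, because extended_bounds forces empty buckets to be created across the whole range.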
After a few seconds, the JVM starts heavy garbage collection:
[2018-09-11T17:44:59,949][WARN ][o.e.m.j.JvmGcMonitorService] [AmQ_BYj] [gc][70] overhead, spent [2.4s] collecting in the last [2.6s]
[2018-09-11T17:45:15,822][WARN ][o.e.m.j.JvmGcMonitorService] [AmQ_BYj] [gc][71] overhead, spent [14.2s] collecting in the last [15.8s]
After a while, it crashes with a Java heap OutOfMemoryError.
Can anybody explain to me why ES does not protect itself from this situation, for instance by using a circuit breaker?
Edit: I tried ES 6.4.0 (Windows exe and Linux Docker) and ES 6.3.1 (Linux Docker), with the same results.