Large visualization query takes down the cluster

Hi all. We have been trying for about a week to run a visualization query against a subset of our indices, and every time we run it, Kibana hits an unknown error and all of the data nodes appear to get booted from the cluster. We currently have 5 data nodes, each with 24 GB of RAM, 12 GB of which is allocated to the Elasticsearch heap.

This is a first for us: the query worked fine on Elasticsearch 5.x, but once we upgraded to 6.8 it stopped working. Any thoughts or suggestions would be great.
Here is the visualization definition:

"title": "Report",
"type": "table",
"params": {
"perPage": 10,
"showPartialRows": false,
"showMetricsAtAllLevels": false,
"sort": {
"columnIndex": null,
"direction": null
},
"showTotal": false,
"totalFunc": "sum"
},
"aggs": [
{
"id": "1",
"enabled": true,
"type": "count",
"schema": "metric",
"params": {}
},
{
"id": "2",
"enabled": true,
"type": "terms",
"schema": "bucket",
"params": {
"field": "source_ip.keyword",
"size": 25000,
"order": "desc",
"orderBy": "1",
"otherBucket": false,
"otherBucketLabel": "Other",
"missingBucket": false,
"missingBucketLabel": "Missing",
"customLabel": "SIP"
}
},
{
"id": "3",
"enabled": true,
"type": "terms",
"schema": "bucket",
"params": {
"field": "destination_ip.keyword",
"size": 5000,
"order": "desc",
"orderBy": "1",
"otherBucket": false,
"otherBucketLabel": "Other",
"missingBucket": false,
"missingBucketLabel": "Missing",
"customLabel": "DIP"
}
},
{
"id": "4",
"enabled": true,
"type": "terms",
"schema": "bucket",
"params": {
"field": "event_type.keyword",
"size": 10,
"order": "desc",
"orderBy": "1",
"otherBucket": false,
"otherBucketLabel": "Other",
"missingBucket": false,
"missingBucketLabel": "Missing",
"customLabel": "EVENT"
}
},
{
"id": "5",
"enabled": true,
"type": "terms",
"schema": "bucket",
"params": {
"field": "geoip.city_name.keyword",
"size": 1,
"order": "desc",
"orderBy": "1",
"otherBucket": false,
"otherBucketLabel": "Other",
"missingBucket": false,
"missingBucketLabel": "Missing",
"customLabel": "CITY"
}
},
{
"id": "6",
"enabled": true,
"type": "terms",
"schema": "bucket",
"params": {
"field": "geoip.country_name.keyword",
"size": 1,
"order": "desc",
"orderBy": "1",
"otherBucket": false,
"otherBucketLabel": "Other",
"missingBucket": false,
"missingBucketLabel": "Missing",
"customLabel": "COUNTRY"
}
},
{
"id": "7",
"enabled": true,
"type": "terms",
"schema": "bucket",
"params": {
"field": "geoip.region_name.keyword",
"size": 1,
"order": "desc",
"orderBy": "1",
"otherBucket": false,
"otherBucketLabel": "Other",
"missingBucket": false,
"missingBucketLabel": "Missing",
"customLabel": "REGION"
}
}
]
}

I think you are running into: https://github.com/elastic/elasticsearch/pull/43091

Basically, Lucene changed how doc values work internally, moving to a model where the iterators are not cached. For very large, nested aggregations like yours, this means Lucene has to create many more iterators, which slows things down considerably. The linked PR should make it better, although it will never be as fast as 5.x.

The query should still work; it will just take longer than before, and you will probably have to increase the Kibana timeouts.
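
If the Kibana-side timeout is what is cutting the request off, the setting to raise is elasticsearch.requestTimeout in kibana.yml. The value below is purely illustrative, not a recommendation; tune it to your environment:

# kibana.yml: how long Kibana waits on Elasticsearch requests, in milliseconds
# (the default is 30000; 300000 here, i.e. 5 minutes, is an illustrative value only)
elasticsearch.requestTimeout: 300000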

That said, the aggregation is pretty "abusive" and not something we recommend. It is deeply nested with very large sizes (25,000 and 5,000 on the first two levels alone), which multiplies out to as many as 25,000 × 5,000 × 10 = 1.25 billion potential bucket combinations and is liable to cause memory problems. If you need an aggregation this large for reporting, we'd recommend the composite aggregation instead, which is designed to "page" over the aggregation results and is much more memory friendly.
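
As a rough sketch only (the field names and custom labels come from your visualization, but the index pattern logs-*, the page size, and the overall request shape are assumptions you would adapt), the same grouping could be expressed as a composite aggregation:

GET logs-*/_search
{
  "size": 0,
  "aggs": {
    "report": {
      "composite": {
        "size": 1000,
        "sources": [
          { "SIP":     { "terms": { "field": "source_ip.keyword" } } },
          { "DIP":     { "terms": { "field": "destination_ip.keyword" } } },
          { "EVENT":   { "terms": { "field": "event_type.keyword" } } },
          { "CITY":    { "terms": { "field": "geoip.city_name.keyword" } } },
          { "COUNTRY": { "terms": { "field": "geoip.country_name.keyword" } } },
          { "REGION":  { "terms": { "field": "geoip.region_name.keyword" } } }
        ]
      }
    }
  }
}

Each response includes an after_key; pass it back as "after" inside the composite body on the next request to fetch the next page of 1000 buckets, so the cluster never holds the whole result set in memory at once. Each composite bucket also carries a doc_count, which covers your count metric.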
