For more background, see the previous topic on this issue, specifically the replies by Amos66: Elasticsearch queue issue after upgrading from 8.6.2 to 8.12.1/8.12.2 - #2 by Amos66
As described in the linked post, the query that Grafana sends to Elasticsearch sets min_doc_count = 0 on the terms aggregation over log levels.
Since a recent Elasticsearch version (the linked topic describes an upgrade from 8.6.2 to 8.12.x), this query has become extremely slow and times out most of the time.
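To make the difference between the two settings concrete, here is a simplified, single-shard model of the terms aggregation's bucket semantics. The function name terms_agg and the level field are hypothetical, and this is only an illustration, not how Elasticsearch implements the aggregation. With min_doc_count = 1, only terms that occur in documents matching the filter produce buckets; with min_doc_count = 0, roughly every term present in the field gets a bucket, including terms with zero matches, so the work depends on the field's full set of terms rather than only on the filtered hits.

```python
from collections import Counter


def terms_agg(all_docs, matching_docs, field, min_doc_count=1, size=10):
    """Hypothetical model of terms-aggregation buckets, ordered by _key asc."""
    # Count terms only among documents that matched the query filter.
    counts = Counter(doc[field] for doc in matching_docs if field in doc)
    if min_doc_count == 0:
        # With 0, every term in the field gets a bucket, even with no matching docs.
        for doc in all_docs:
            if field in doc:
                counts.setdefault(doc[field], 0)
    buckets = [
        {"key": key, "doc_count": count}
        for key, count in counts.items()
        if count >= min_doc_count
    ]
    buckets.sort(key=lambda b: b["key"])  # mirrors "order": {"_key": "asc"}
    return buckets[:size]


docs = [{"level": "info"}, {"level": "error"}, {"level": "debug"}, {"level": "info"}]
hits = [docs[0], docs[3]]  # pretend only the two "info" docs match the filter
print(terms_agg(docs, hits, "level", min_doc_count=1))
print(terms_agg(docs, hits, "level", min_doc_count=0))
```

In this toy index, the second call returns zero-count buckets for debug and error alongside info, which is the behaviour Grafana relies on to show every log level even when it has no hits in the time range.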
A potential fix has been implemented: Disable parallel collection for terms aggregation with min_doc_count equals to 0 by iverase · Pull Request #106156 · elastic/elasticsearch · GitHub
However, even after upgrading to Elasticsearch 8.13.4, which should include the fix, the query is still as slow as before.
I'm unsure where to start debugging this. If we change min_doc_count to 1, the query succeeds in under a second instead of taking 30 seconds.
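To compare the two variants side by side (for example, by sending each body to the _search endpoint and timing the responses), a small helper can rebuild the Grafana request body with min_doc_count as a parameter. This is a sketch: build_level_query is a hypothetical name, and the query string "*" and the field "level.keyword" are placeholders standing in for the values redacted as *** in the original query.

```python
import json


def build_level_query(gte_ms, lte_ms, query_string, field, min_doc_count=0):
    """Hypothetical helper mirroring the Grafana query, with a configurable terms min_doc_count."""
    return {
        "size": 0,
        "query": {
            "bool": {
                "filter": [
                    {"range": {"@timestamp": {"gte": gte_ms, "lte": lte_ms, "format": "epoch_millis"}}},
                    {"query_string": {"analyze_wildcard": True, "query": query_string}},
                ]
            }
        },
        "aggs": {
            "2": {
                "terms": {
                    "field": field,
                    "size": 500,
                    "order": {"_key": "asc"},
                    "min_doc_count": min_doc_count,  # 0 = slow case, 1 = fast case
                },
                "aggs": {
                    "3": {
                        "date_histogram": {
                            "field": "@timestamp",
                            "min_doc_count": "0",
                            "extended_bounds": {"min": gte_ms, "max": lte_ms},
                            "format": "epoch_millis",
                            "fixed_interval": "1m",
                        },
                        "aggs": {},
                    }
                },
            }
        },
    }


slow = build_level_query(1710050816992, 1710051116992, "*", "level.keyword", min_doc_count=0)
fast = build_level_query(1710050816992, 1710051116992, "*", "level.keyword", min_doc_count=1)
print(json.dumps(slow, indent=2))
```

Keeping everything else identical isolates min_doc_count as the only variable between the slow and fast requests.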
For reference, here is the query:
```json
{
  "size": 0,
  "query": {
    "bool": {
      "filter": [
        {
          "range": {
            "@timestamp": {
              "gte": 1710050816992,
              "lte": 1710051116992,
              "format": "epoch_millis"
            }
          }
        },
        {
          "query_string": {
            "analyze_wildcard": true,
            "query": "***"
          }
        }
      ]
    }
  },
  "aggs": {
    "2": {
      "terms": {
        "field": "***.keyword",
        "size": 500,
        "order": {
          "_key": "asc"
        },
        "min_doc_count": 0
      },
      "aggs": {
        "3": {
          "date_histogram": {
            "field": "@timestamp",
            "min_doc_count": "0",
            "extended_bounds": {
              "min": 1710050816992,
              "max": 1710051116992
            },
            "format": "epoch_millis",
            "fixed_interval": "1m"
          },
          "aggs": {}
        }
      }
    }
  }
}
```
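The time range in this query spans exactly five minutes (1710051116992 − 1710050816992 = 300000 ms). Because the date_histogram uses min_doc_count 0 with extended_bounds, each terms bucket should receive a full set of 1-minute buckets, even for minutes with no documents. The sketch below works out that count, assuming UTC and no offset so that bucket keys are rounded down to whole minutes since the epoch; the helper names are hypothetical.

```python
from datetime import datetime, timedelta, timezone

GTE_MS = 1710050816992  # extended_bounds.min from the query
LTE_MS = 1710051116992  # extended_bounds.max from the query
INTERVAL_MS = 60_000    # fixed_interval "1m"


def to_utc(ms):
    """Convert epoch milliseconds to an aware UTC datetime using exact integer math."""
    return datetime(1970, 1, 1, tzinfo=timezone.utc) + timedelta(milliseconds=ms)


def minute_bucket_keys(gte_ms, lte_ms, interval_ms=INTERVAL_MS):
    """Bucket keys between the rounded-down bounds (assumes UTC, offset 0)."""
    first = gte_ms - gte_ms % interval_ms
    last = lte_ms - lte_ms % interval_ms
    return list(range(first, last + 1, interval_ms))


keys = minute_bucket_keys(GTE_MS, LTE_MS)
print(to_utc(GTE_MS), "to", to_utc(LTE_MS))
print(len(keys), "date buckets per term")
print(500 * len(keys), "date buckets at most for 500 terms")
```

Because the window starts 56.992 seconds into a minute, it touches six minute buckets (06:06 through 06:11 UTC), so with up to 500 terms the response can contain up to 3000 date buckets.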