Stack details -
ES version - 7.10.1
Kibana version - 7.10.1
Java high level client version - 7.2.0
I have a production cluster of 4 nodes where a large amount of data is stored in one of the indices (approx. 30M+ docs, each doc has 10 fields). In one of our apps we run an aggregation query on this data that fetches a lot of buckets in one go.
Usually the number of docs that match the query ranges between 300K and 1M, and in that case the aggregation works fine. But when the number of docs matching the query goes beyond 20M, the aggregation query fails silently. It does not give any error message (previously we used to get a max_bucket exception; we then raised the limit to 100K and have not seen it since). Instead it returns the default response, as if I had searched with a plain query like GET /{{index_name}}/_search
This is causing a problem in our application: if it returned an error, the application could do something about it, but it just returns some other response.
I want to know in what situations ES will return a default response like that instead of an error.
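To make "the application can do something about it" concrete, here is a minimal sketch of a check on the parsed search response. It is written in Python over a plain dict purely for illustration (the app itself uses the Java client). The fields `timed_out`, `_shards.failed` and `terminated_early` are standard parts of a search response; the function name `find_search_problems` and the `expect_aggs` flag are hypothetical. Whether any of these indicators is actually set in the failing case described here is not known.

```python
from typing import Any, Dict, List


def find_search_problems(response: Dict[str, Any], expect_aggs: bool = True) -> List[str]:
    """Return reasons why a parsed _search response looks partial or incomplete."""
    problems: List[str] = []

    # `timed_out` is true when the search hit its timeout and returned partial results.
    if response.get("timed_out"):
        problems.append("search timed out")

    # `_shards.failed` counts shards on which the request failed.
    shards = response.get("_shards", {})
    failed = shards.get("failed", 0)
    if failed:
        problems.append(f"{failed} of {shards.get('total', '?')} shards failed")

    # `terminated_early` only appears when collection was cut short.
    if response.get("terminated_early"):
        problems.append("search terminated early")

    # The request asked for aggregations, so the response should carry them.
    if expect_aggs and "aggregations" not in response:
        problems.append("no aggregations in response")

    return problems


# Hypothetical response that has hits metadata but no aggregations section.
suspicious = {
    "timed_out": False,
    "_shards": {"total": 4, "successful": 4, "skipped": 0, "failed": 0},
    "hits": {"hits": []},
}
print(find_search_problems(suspicious))  # ['no aggregations in response']
```

A non-empty list could be treated by the caller the same way as an exception, so a "default-looking" response is no longer accepted silently.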
Following is some more info (dummy) on the index and the kind of query we run -
index name - some_index
query -
GET /some_index/_search
{
  "size": 0,
  "query": {
    "bool": {
      "must": [
        {
          "term": {
            "id_1": {
              "value": "1258"
            }
          }
        },
        {
          "terms": {
            "status": [
              "status1",
              "status2"
            ]
          }
        }
      ]
    }
  },
  "aggs": {
    "topLevelField": {
      "terms": {
        "field": "string_field",
        "size": 100000
      },
      "aggs": {
        "groupingBy": {
          "terms": {
            "field": "string_id_field",
            "size": 100000
          },
          "aggs": {
            "topHitsDocs": {
              "top_hits": {
                "size": 1
              }
            }
          }
        }
      }
    }
  }
}
Here the cardinality of the top-level aggregation field is around 40K in the worst case, and that of the inner aggregation field string_id_field is around 30K in the worst case. In one more nested aggregation we fetch the top hit doc from each bucket.
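As a rough sense of scale, the worst-case numbers above can be turned into a bucket estimate. This is a back-of-the-envelope sketch, not a statement about how ES counts or limits buckets internally. It assumes both fields are single-valued, so each matching doc lands in exactly one (top-level, inner) bucket pair and the number of non-empty inner buckets cannot exceed the number of matching docs. The function name `worst_case_buckets` is hypothetical.

```python
def worst_case_buckets(top_cardinality: int, inner_cardinality: int,
                       matching_docs: int, top_size: int, inner_size: int) -> int:
    """Upper bound on terms buckets for a two-level terms aggregation."""
    # Each level returns at most `size` buckets, and no more than the field's cardinality.
    top = min(top_cardinality, top_size)
    inner_per_top = min(inner_cardinality, inner_size)

    # Theoretical number of (top, inner) combinations.
    combos = top * inner_per_top

    # Assumption: single-valued fields, so non-empty inner buckets <= matching docs.
    inner_total = min(combos, matching_docs)

    return top + inner_total


# Worst case from the post: 40K x 30K cardinalities, size 100000 at both levels.
print(worst_case_buckets(40_000, 30_000, 20_000_000, 100_000, 100_000))  # 20040000
print(worst_case_buckets(40_000, 30_000, 1_000_000, 100_000, 100_000))   # 1040000
```

Under these assumptions the 20M-doc case can approach roughly 20 million buckets, each also carrying one top_hits document, compared with about 1 million for the 1M-doc case that works.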
What I am not able to understand is this: if the query becomes too heavy for ES, it should return an error or something similar. Instead it seems to return a default response, which makes things harder, and I cannot understand why it would do this. Any insights or help would be appreciated. Thanks.