High-cardinality multi-bucket terms aggregation: huge memory consumption

Hi all,
I'm pretty new to the Elastic Stack, and I'm facing an issue that I don't understand.
I tried to reproduce it with the simplest example I could come up with:

My documents look like this:

{
    "a1" : "3abcdefghi",
    "a2" : "3abcdefghi_a",
    "dummyA" : "dmihnibdpu"
}

The index contains 1M of these documents. a1 and a2 each have 10K distinct values, and dummyA has 1M distinct values.
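
For anyone who wants to reproduce this, here is a minimal sketch of how documents with these cardinalities could be generated. The generator used for the test is not shown in this post, so the function names, the way a2 is derived from a1 (a1 plus "_a", guessed from the sample document), and the base-26 encoding used to make dummyA unique are all assumptions for illustration only.

```python
import string

ALPHABET = string.ascii_lowercase


def index_to_word(i, length=10):
    """Encode i in base 26 as a fixed-length lowercase word.

    Hypothetical helper: it guarantees a distinct word for each distinct i
    (as long as i < 26**length), which gives dummyA one value per document.
    """
    letters = []
    for _ in range(length):
        i, rem = divmod(i, 26)
        letters.append(ALPHABET[rem])
    return "".join(reversed(letters))


def generate_docs(n_docs=1_000_000, n_keys=10_000):
    """Yield documents shaped like the sample above.

    Assumption: a1 cycles through n_keys values and a2 is a1 plus "_a",
    so both fields end up with n_keys distinct values.
    """
    for i in range(n_docs):
        key = f"{i % n_keys}abcdefghi"
        yield {"a1": key, "a2": key + "_a", "dummyA": index_to_word(i)}
```

The generated dictionaries could then be sent to the index with whatever bulk-indexing client is at hand; that part is left out here on purpose.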

The index size is around 120MB.

The index mapping looks as follows:

{
  "mapping": {
    "_doc": {
      "dynamic_templates": [
        {
          "strings": {
            "match_mapping_type": "string",
            "mapping": {
              "fields": {
                "query": {
                  "type": "text"
                }
              },
              "type": "keyword"
            }
          }
        }
      ],
      "properties": {
        "a1": {
          "type": "keyword",
          "fields": {
            "query": {
              "type": "text"
            }
          }
        },
        "a2": {
          "type": "keyword",
          "fields": {
            "query": {
              "type": "text"
            }
          }
        },
        "dummyA": {
          "type": "keyword",
          "fields": {
            "query": {
              "type": "text"
            }
          }
        }
      }
    }
  }
}

Elasticsearch runs with 7GB heap.

The following query fails because it would need more than the 7GB of heap memory (the parent circuit breaker trips):

GET /test/_search
{
  "size": 0,
  "aggs": {
    "a1": {
      "terms": {
        "field": "a1",
        "size": 10000
      },
      "aggs": {
        "a2": {
          "terms": {
            "field": "a2",
            "size": 10000
          }
        }
      }
    }
  }
}
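
To put the numbers in this query into perspective, here is a small back-of-the-envelope sketch. It only counts how many (a1, a2) bucket combinations the two size settings allow and how many can actually contain documents, assuming every document has exactly one value for a1 and one for a2 (as in the sample document). It says nothing about how Elasticsearch allocates memory internally; the function name is made up for this example.

```python
def bucket_upper_bounds(outer_size, inner_size, n_docs):
    """Return (requested, populated) upper bounds for a two-level terms agg.

    requested: combinations permitted by the two "size" settings.
    populated: combinations that can hold at least one document, assuming
               each document falls into exactly one (outer, inner) pair.
    """
    requested = outer_size * inner_size
    populated = min(requested, n_docs)
    return requested, populated


requested, populated = bucket_upper_bounds(10_000, 10_000, 1_000_000)
print(f"{requested:,} combinations allowed by the size settings")
print(f"{populated:,} combinations that can actually contain documents")
```

So the size settings permit up to 100 million combinations, while at most 1 million of them can be non-empty with 1M single-valued documents.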

If I use execution_hint: map like so:

GET /test/_search
{
  "size": 0,
  "aggs": {
    "a1": {
      "terms": {
        "field": "a1",
        "size": 10000,
        "execution_hint": "map"
      },
      "aggs": {
        "a2": {
          "terms": {
            "field": "a2",
            "size": 10000,
            "execution_hint": "map"
          }
        }
      }
    }
  }
}

the query returns in around 1s with the result.

As far as I understand, execution_hint: map should actually lead to larger requests and hence consume more memory, shouldn't it?

The size of the requests is indeed larger, but the overall memory consumption with execution_hint: map is much lower than with execution_hint: global_ordinals, which, as far as I understand, is the default.

Can anyone explain this huge memory consumption in the default mode for an index that is only 120MB in size?

Any help would be very cool.

Thanks :)

I have the same problem and also cannot explain it.