Terms aggregations in docs with nested objects using a lot of memory

We are running Elasticsearch in a cluster with 1 node, 1 index, 6 shards and
55 million docs. We run queries with terms aggregations on 15 fields, and they
work well, returning in about 10 seconds.

To run some tests, we reindexed the same 55 million docs into another cluster
with 1 node, 1 index and 4 shards. The mapping is slightly different: it now
has some nested objects. We ran the same queries as before (adapted to use
nested queries and aggregations), but we always get a circuit breaker error,
because loading the fields into memory for the aggregations would take more
memory than is available.
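For readers who have not written one, "adapted to use nested aggregations" means wrapping the terms aggregations in a `nested` aggregation. The minimal Python sketch below builds such a request body as a dict. The nested path `my_nested`, the field names and the helper `nested_terms_agg` are hypothetical; the actual queries from this post are not shown.

```python
import json

def nested_terms_agg(path, fields, size=10):
    """Build a search body that runs one terms aggregation per field,
    all inside a single nested aggregation on `path` (hypothetical helper)."""
    inner = {
        # e.g. "my_nested.color" -> "by_color"
        f"by_{field.split('.')[-1]}": {"terms": {"field": field, "size": size}}
        for field in fields
    }
    return {
        "size": 0,  # only the aggregation results are needed, not the hits
        "aggs": {
            "nested_objs": {
                "nested": {"path": path},  # step into the nested documents
                "aggs": inner,
            }
        },
    }

# Hypothetical nested path and field names, for illustration only.
body = nested_terms_agg("my_nested", ["my_nested.color", "my_nested.brand"])
print(json.dumps(body, indent=2))
```

The same body, with the 15 real field names, would be sent to the `_search` endpoint of the new index.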

Both machines have the same configuration (64GB of memory, running ES
with ES_HEAP_SIZE=32g).

I used the indices stats API (_stats/fielddata?fields=my_field&pretty) on
both machines to get fielddata info for a field whose mapping did not change:
it sits directly in the root document (not nested). The difference in memory
usage is huge:

Machine 1 (new index, 4 shards, mapping with nested objects):

{
  "_shards" : {
    "total" : 8,
    "successful" : 4,
    "failed" : 0
  },
  "_all" : {
    "primaries" : {
      "fielddata" : {
        "memory_size_in_bytes" : 28132578552,
        "evictions" : 0,
        "fields" : {
          "my_field" : {
            "memory_size_in_bytes" : 224983649
          }
        }
      }
    },
    "total" : {
      "fielddata" : {
        "memory_size_in_bytes" : 28132578552,
        "evictions" : 0,
        "fields" : {
          "my_field" : {
            "memory_size_in_bytes" : 224983649
          }
        }
      }
    }
  },
  "indices" : {
    "my_index_1" : {
      "primaries" : {
        "fielddata" : {
          "memory_size_in_bytes" : 28132578552,
          "evictions" : 0,
          "fields" : {
            "my_field" : {
              "memory_size_in_bytes" : 224983649
            }
          }
        }
      },
      "total" : {
        "fielddata" : {
          "memory_size_in_bytes" : 28132578552,
          "evictions" : 0,
          "fields" : {
            "my_field" : {
              "memory_size_in_bytes" : 224983649
            }
          }
        }
      }
    }
  }
}

Machine 2 (old index, 6 shards, no nested objects):

{
  "_shards" : {
    "total" : 12,
    "successful" : 6,
    "failed" : 0
  },
  "_all" : {
    "primaries" : {
      "fielddata" : {
        "memory_size_in_bytes" : 6812053739,
        "evictions" : 0,
        "fields" : {
          "my_field" : {
            "memory_size_in_bytes" : 62533082
          }
        }
      }
    },
    "total" : {
      "fielddata" : {
        "memory_size_in_bytes" : 6812053739,
        "evictions" : 0,
        "fields" : {
          "my_field" : {
            "memory_size_in_bytes" : 62533082
          }
        }
      }
    }
  },
  "indices" : {
    "my_index_2" : {
      "primaries" : {
        "fielddata" : {
          "memory_size_in_bytes" : 6812053739,
          "evictions" : 0,
          "fields" : {
            "my_field" : {
              "memory_size_in_bytes" : 62533082
            }
          }
        }
      },
      "total" : {
        "fielddata" : {
          "memory_size_in_bytes" : 6812053739,
          "evictions" : 0,
          "fields" : {
            "my_field" : {
              "memory_size_in_bytes" : 62533082
            }
          }
        }
      }
    }
  }
}

In the old index this field uses about 62.5MB of fielddata; in the new index
it uses about 225MB, roughly 3.6 times as much. Total fielddata grows from
about 6.8GB to about 28.1GB. Heavier fields that use about 1GB in the old
index use 4 to 6GB in the new one. With all 15 aggregations together, the
memory needed no longer fits in the heap.
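To put a number on the growth, here is a minimal Python sketch that reads the `_all.total.fielddata` section of the two responses above (trimmed to the keys it needs) and computes the ratio between the new and old index. The helper name `fielddata_bytes` is made up for illustration.

```python
# Trimmed copies of the two _stats/fielddata responses above.
new_stats = {"_all": {"total": {"fielddata": {
    "memory_size_in_bytes": 28132578552,
    "fields": {"my_field": {"memory_size_in_bytes": 224983649}}}}}}
old_stats = {"_all": {"total": {"fielddata": {
    "memory_size_in_bytes": 6812053739,
    "fields": {"my_field": {"memory_size_in_bytes": 62533082}}}}}}

def fielddata_bytes(stats, field=None):
    """Fielddata bytes for one field, or for the whole index if field is None."""
    fd = stats["_all"]["total"]["fielddata"]
    if field is None:
        return fd["memory_size_in_bytes"]
    return fd["fields"][field]["memory_size_in_bytes"]

field_ratio = fielddata_bytes(new_stats, "my_field") / fielddata_bytes(old_stats, "my_field")
total_ratio = fielddata_bytes(new_stats) / fielddata_bytes(old_stats)

print(f"my_field: {field_ratio:.2f}x, total fielddata: {total_ratio:.2f}x")
```

Pointing the same helper at the full responses for the heavier fields would show whether they grow by a similar factor.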

Does having nested objects in the documents change the amount of memory
needed to load non-nested fields?

I also tested setting include_in_root on every nested object and running all
my aggregations directly on the root documents (no nested aggregations at
all), and every field still uses far more memory than in the old index, with
the same data. Can someone explain this? I have no idea what is going on.
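For context, the include_in_root test described above corresponds roughly to the mapping and root-level aggregation sketched below as Python dicts. This is a minimal sketch assuming Elasticsearch 1.x-era syntax (mapping types, `string` with `not_analyzed`); the type name `my_type`, the nested path `my_nested` and the field `color` are hypothetical.

```python
import json

# Hypothetical type, nested path and field names; ES 1.x-style mapping assumed.
mapping = {
    "my_type": {
        "properties": {
            "my_nested": {
                "type": "nested",
                # Also copy the nested fields into the root document, so they
                # can be aggregated without a nested aggregation.
                "include_in_root": True,
                "properties": {
                    "color": {"type": "string", "index": "not_analyzed"},
                },
            },
            "my_field": {"type": "string", "index": "not_analyzed"},
        }
    }
}

# Root-level terms aggregation on the copied field: no "nested" wrapper.
root_agg = {
    "size": 0,
    "aggs": {"by_color": {"terms": {"field": "my_nested.color"}}},
}

print(json.dumps(mapping, indent=2))
print(json.dumps(root_agg, indent=2))
```

With include_in_root, the copied values are addressed by their full path (`my_nested.color`) on the root document.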
