Terms aggregation on the _uid field loads all _uid values into fielddata => CircuitBreakingException

Hello,

Let's assume that I have an index (conversation) with 2 types: conversation (60 documents) and message (600 million documents).

If I run a terms aggregation on the _uid field, I am not sure whether ES loads the values from every document in the conversation index or only from conversation/conversation (where the query is made).

Expectation:
I expect only the _uids from the conversation type to be loaded into fielddata, not the ones from the message type as well.

Actual result:
I receive a CircuitBreakingException for this request, so I assume that the message _uids are loaded into fielddata too.

Request:

GET conversation/conversation/_search
{
  "size": 1,
  "aggs": {
    "threads": {
      "terms": {
        "field": "_uid",
        "size": 0
      }
    }
  }
}

Response:

"reason": "RemoteTransportException[[es2.novalocal][inet[/192.168.22.192:9300]][indices:data/read/search[phase/query]]]; nested: ElasticsearchException[org.elasticsearch.common.breaker.CircuitBreakingException: [FIELDDATA] Data too large, data for field [_uid] would be larger than limit of [10266083328/9.5gb]];

So, my question is: are the _uids of all documents in the conversation index loaded, rather than only the ones from the conversation type?

Thanks a lot!

This is the old "multiple indices or multiple types in one index" question.
From a management perspective (backup, restore, aliasing, etc.) it might make sense to have a single index, but what you are seeing is one of the inefficiencies of storing multiple types in the same index. Some data structures used internally in an index are a function of the number of documents in that index, regardless of type. This is an argument for breaking the types out into separate indices.
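
As a rough sketch of what that could look like, assume the 60 conversation documents have been reindexed into their own index. The index name conversations below is hypothetical and does not come from the original setup. The same aggregation would then run against an index that holds only those documents, so the per-index structures for _uid should no longer scale with the 600 million message documents:

```
GET conversations/conversation/_search
{
  "size": 1,
  "aggs": {
    "threads": {
      "terms": {
        "field": "_uid",
        "size": 0
      }
    }
  }
}
```

The request body is unchanged from the original; only the target index differs.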
