Term aggregation with _uid field loads all _uid in fielddata => circuitBreakingExcep

iulianlaz · October 21, 2015, 12:05pm

Hello,

Let's assume that I have an index (conversation) with 2 types: conversation (60 documents) and message (600 million documents).

If I run a terms aggregation on the _uid field, I am not sure whether ES loads the values from every document in the conversation index or only from conversation/conversation (the type the query targets).

Expectation:
I expect only the _uids from the conversation type to be loaded into fielddata, not the ones from the message type as well.

Actual result:
I receive a CircuitBreakingException for this request, so I assume that the message _uids are loaded into fielddata too.

Request:

GET conversation/conversation/_search
{
  "size": 1,
  "aggs": {
    "threads": {
      "terms": {
        "field": "_uid",
        "size": 0
      }
    }
  }
}

Response:

"reason": "RemoteTransportException[[es2.novalocal][inet[/192.168.22.192:9300]][indices:data/read/search[phase/query]]]; nested: ElasticsearchException[org.elasticsearch.common.breaker.CircuitBreakingException: [FIELDDATA] Data too large, data for field [_uid] would be larger than limit of [10266083328/9.5gb]];

So, my question is: are the _uids of all documents in the conversation index loaded, rather than only those of the conversation type?

Thanks a lot!

Mark_Harwood · October 22, 2015, 3:20pm

This is the old "multiple indexes or multiple-types-in-an-index" question.
From a management perspective (backup, restore, aliasing, etc.) it might make sense to have a single index, but what you are seeing is one of the inefficiencies of storing multiple types in the same index. Some data structures used internally in an index are a function of the number of documents in that index, regardless of type. This, then, is an argument for breaking the types out into separate indexes.
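
To make the point above concrete, here is a rough toy model in Python. It is not Elasticsearch code, and the function names (`fielddata_entries`, `terms_agg_on_uid`) are made up for illustration. It assumes, as a simplification, that an index-wide structure holds one `_uid` entry (in `type#id` form) per document in the index, while the aggregation only buckets documents of the queried type. The document counts are scaled down from the 60 conversations and 600 million messages in the question.

```python
# Toy model only: NOT how Elasticsearch is implemented. It illustrates why a
# per-index structure grows with every document in the index, whatever type
# the search targets. All names here are hypothetical.
from collections import Counter


def fielddata_entries(index_docs):
    """Count the _uid values an index-wide cache would hold: one per document."""
    return len({f"{doc_type}#{doc_id}" for doc_type, doc_id in index_docs})


def terms_agg_on_uid(index_docs, queried_type):
    """Bucket only the documents of the queried type, as the search would."""
    return Counter(
        f"{doc_type}#{doc_id}"
        for doc_type, doc_id in index_docs
        if doc_type == queried_type
    )


# Scaled-down stand-in: 60 conversations, 6000 messages (assumed numbers).
shared_index = [("conversation", i) for i in range(60)] + [
    ("message", i) for i in range(6000)
]
# The same conversations, broken out into an index of their own.
separate_index = [("conversation", i) for i in range(60)]

# Buckets returned by the aggregation on the conversation type.
print(len(terms_agg_on_uid(shared_index, "conversation")))
# Entries the index-wide structure holds when both types share one index.
print(fielddata_entries(shared_index))
# Entries held when conversations live in a dedicated index.
print(fielddata_entries(separate_index))
```

In this sketch the aggregation returns the same buckets either way; only the size of the index-wide structure changes, which is the cost that moving the types into separate indexes avoids.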
