Monitoring for field data circuit breaker

We have an ES cluster that keeps hitting the fielddata circuit breaker, over and over, after running without problems for weeks. Clearing the cache appears to resolve the issue temporarily, but then it comes back.

 [FIELDDATA] New used memory 12843300526 [11.9gb] from field [timeStamp] would be larger than configured breaker: 12843063705 [11.9gb], breaking 
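Here is a worked reading of the two numbers in that message. It assumes the breaker uses the 1.x default indices.breaker.fielddata.limit of 60% of the JVM heap. That assumption fits running on default breaker settings, but the heap size itself is not stated here.

```latex
\begin{aligned}
\text{overshoot} &= 12\,843\,300\,526 - 12\,843\,063\,705 = 236\,821\ \text{bytes} \approx 231\ \text{KiB}\\
\text{heap} &\approx \frac{12\,843\,063\,705}{0.60} \approx 2.14 \times 10^{10}\ \text{bytes} \approx 19.9\ \text{GiB}
\end{aligned}
```

The breaker's running total ended up only about 231 KiB past the limit. That is consistent with fielddata having accumulated until it filled nearly the whole budget. The heap figure is only an estimate under the 60% assumption.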

The cluster has 4 data nodes, 7 indices with 1.05 TB of data & 4.2 billion documents.

  • Elasticsearch 1.7.5
  • Java

Currently using the defaults for field data breaker settings.

I want to know when the breaker is being approached so I can clear the cache, or at least know immediately when it trips. I think I can poll localhost:9200/_nodes/stats/breaker and watch breakers/fielddata/tripped for a value > 0. Is there a leading indicator I could watch instead of waiting for the breaker to trip?
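A possible leading indicator is the fielddata breaker's current estimate as a fraction of its limit. This does not predict which query will trip the breaker, but it shows how close each node is.

Below is a minimal Python sketch. It assumes the 1.x _nodes/stats/breaker response layout: nodes → node id → breakers.fielddata, with limit_size_in_bytes, estimated_size_in_bytes, overhead and tripped. The 80% threshold and the function names are hypothetical choices for illustration, not Elasticsearch settings.

```python
import json
import urllib.request
from typing import List, NamedTuple

# Hypothetical alert threshold (not an Elasticsearch setting): warn at 80% of the limit.
WARN_RATIO = 0.80


class BreakerStatus(NamedTuple):
    node: str
    ratio: float   # (estimated fielddata * overhead) / breaker limit
    tripped: int   # cumulative trips since the node started
    warn: bool


def fielddata_breaker_status(stats: dict, warn_ratio: float = WARN_RATIO) -> List[BreakerStatus]:
    """Summarise the fielddata breaker for each node in a _nodes/stats/breaker response."""
    results = []
    for node_id, node in stats.get("nodes", {}).items():
        fd = node.get("breakers", {}).get("fielddata")
        if fd is None:
            continue
        limit = fd.get("limit_size_in_bytes", 0)
        # Apply the reported overhead factor, since the 1.x breaker compares
        # (estimate * overhead) against the limit before breaking.
        estimated = fd.get("estimated_size_in_bytes", 0) * fd.get("overhead", 1.0)
        ratio = estimated / limit if limit > 0 else 0.0
        results.append(BreakerStatus(
            node=node.get("name", node_id),
            ratio=ratio,
            tripped=fd.get("tripped", 0),
            warn=ratio >= warn_ratio,
        ))
    return results


def fetch_breaker_stats(base_url: str = "http://localhost:9200") -> dict:
    """GET the breaker section of the node stats API and decode the JSON body."""
    with urllib.request.urlopen(base_url + "/_nodes/stats/breaker") as resp:
        return json.loads(resp.read().decode("utf-8"))


def report(base_url: str = "http://localhost:9200") -> None:
    """Print one line per node; intended to be run from cron or a polling loop."""
    for s in fielddata_breaker_status(fetch_breaker_stats(base_url)):
        flag = "WARN" if s.warn else "ok"
        print(f"{flag:4} {s.node}: {s.ratio:.1%} of fielddata limit, tripped={s.tripped}")
```

Calling report() on a schedule gives a per-node percentage that climbs toward 100% before the breaker trips. The parsing lives in a separate function so it can be tested against a saved JSON response.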

I tried setting indices.fielddata.cache.size on these nodes, but it had a massive impact on query performance. I'm not sure what the "right" way to address this is.
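A similar sketch can sum the per-field numbers from the node stats API. This shows which field is consuming the fielddata (the log names timeStamp) and whether evictions climb once indices.fielddata.cache.size is set.

It assumes the 1.x response layout for _nodes/stats/indices/fielddata?fields=*: per node, indices.fielddata with memory_size_in_bytes, evictions and a fields map. The helper names are hypothetical.

One plausible reading of the performance hit is that a cache size below the working set causes repeated evictions and reloads. A steadily rising eviction count would show that.

```python
import json
import urllib.request
from typing import Dict, Tuple


def fielddata_by_field(stats: dict) -> Tuple[Dict[str, int], int]:
    """Sum per-field fielddata bytes and cumulative evictions across all nodes."""
    per_field: Dict[str, int] = {}
    evictions = 0
    for node in stats.get("nodes", {}).values():
        fd = node.get("indices", {}).get("fielddata", {})
        evictions += fd.get("evictions", 0)
        for field, info in fd.get("fields", {}).items():
            per_field[field] = per_field.get(field, 0) + info.get("memory_size_in_bytes", 0)
    return per_field, evictions


def fetch_fielddata_stats(base_url: str = "http://localhost:9200", fields: str = "*") -> dict:
    """GET per-field fielddata stats (assumes the 1.x 'fields' query parameter)."""
    url = f"{base_url}/_nodes/stats/indices/fielddata?fields={fields}"
    with urllib.request.urlopen(url) as resp:
        return json.loads(resp.read().decode("utf-8"))


def print_fielddata_report(base_url: str = "http://localhost:9200") -> None:
    """Print fields by descending fielddata size, then the total eviction count."""
    per_field, evictions = fielddata_by_field(fetch_fielddata_stats(base_url))
    for field, size in sorted(per_field.items(), key=lambda kv: kv[1], reverse=True):
        print(f"{field}: {size / 2**30:.2f} GiB")
    print(f"evictions (cumulative, all nodes): {evictions}")
```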

You cannot predict this, because fielddata usage depends on the queries coming in.

You can monitor for when it occurs, though. Knowing immediately when it happens, and being able to look back at it historically, will both help you troubleshoot the problem.
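The tripped value is a cumulative counter per node. To know immediately when the breaker trips, it is therefore easier to alert on an increase between polls than on a value above zero.

Below is a minimal Python sketch. It assumes the 1.x _nodes/stats/breaker response layout: nodes → node id → breakers.fielddata.tripped and estimated_size_in_bytes. TripTracker is a hypothetical helper, not part of any Elasticsearch client.

```python
import time
from typing import Dict, List, Optional, Tuple


class TripTracker:
    """Turn cumulative fielddata 'tripped' counters into per-poll trip events.

    Every poll is also recorded as a (timestamp, node_id, delta, estimated_bytes)
    row so the history can be reviewed later.
    """

    def __init__(self) -> None:
        self.last: Dict[str, int] = {}
        self.history: List[Tuple[float, str, int, int]] = []

    def observe(self, stats: dict, now: Optional[float] = None) -> List[Tuple[str, int]]:
        """Record one decoded _nodes/stats/breaker response; return (node, new_trips) events."""
        now = time.time() if now is None else now
        events: List[Tuple[str, int]] = []
        for node_id, node in stats.get("nodes", {}).items():
            fd = node.get("breakers", {}).get("fielddata")
            if fd is None:
                continue
            tripped = fd.get("tripped", 0)
            prev = self.last.get(node_id)
            if prev is None:
                delta = 0            # first sighting: no baseline to compare against
            elif tripped < prev:
                delta = tripped      # counter went backwards: node restarted
            else:
                delta = tripped - prev
            self.last[node_id] = tripped
            self.history.append((now, node_id, delta, fd.get("estimated_size_in_bytes", 0)))
            if delta > 0:
                events.append((node.get("name", node_id), delta))
        return events
```

Feed observe() the decoded JSON from localhost:9200/_nodes/stats/breaker on a fixed interval and alert on any returned event. For the historical view, persist history to a file or metrics store; an in-memory list is used here only to keep the sketch short.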