Field data for analyzed fields - how can I find out how much heap is required?

(Lextoumbourou) #1

Hi guys,

I'm trying to perform a Terms Aggregation on an analyzed field (top words used by authors) and I'm hitting CircuitBreakerExceptions. I understand why that happens, and I understand that doc values aren't an option for an analyzed field, so the data has to be loaded into fielddata on the heap. The next step is presumably to increase the heap size (and probably add more memory). So my questions are:

  1. I understand that I can see how much memory my fielddata is currently consuming via the Node Stats API. However, since my query tripped the circuit breaker, I'm guessing the entire data set was never fully loaded into fielddata. Is there a way to estimate the total memory a field would require if it were fully loaded, so I can plan hardware requirements?

  2. Any other suggestions for how we could perform something similar to a Terms Aggregation on a large analyzed field?
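For reference, here's how I'm reading per-field fielddata usage at the moment. This is a minimal sketch: the JSON below mimics the shape of a `GET /_nodes/stats/indices/fielddata?fields=*` response, but the node name, field names, and byte counts are made up for illustration.

```python
import json

# Illustrative Node Stats-style response (structure abridged; the node,
# fields, and numbers are invented). In reality this JSON comes from:
#   GET /_nodes/stats/indices/fielddata?fields=*
sample_stats = json.loads("""
{
  "nodes": {
    "node-1": {
      "indices": {
        "fielddata": {
          "memory_size_in_bytes": 157286400,
          "fields": {
            "body":   {"memory_size_in_bytes": 150994944},
            "author": {"memory_size_in_bytes": 6291456}
          }
        }
      }
    }
  }
}
""")

def fielddata_by_field(stats):
    """Sum per-field fielddata memory (in bytes) across all nodes."""
    totals = {}
    for node in stats["nodes"].values():
        fields = node["indices"]["fielddata"].get("fields", {})
        for name, info in fields.items():
            totals[name] = totals.get(name, 0) + info["memory_size_in_bytes"]
    return totals

for field, size in sorted(fielddata_by_field(sample_stats).items()):
    print(f"{field}: {size / 1024 / 1024:.1f} MB")
```

The catch, as per question 1, is that these numbers only reflect what has actually been loaded so far, not what a fully loaded field would need.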

Thanks guys!
