I'm trying to run a terms aggregation on an analyzed field (top words used by authors) and I'm hitting a CircuitBreakingException. I understand why, and I understand that I can't use doc values on an analyzed field. So the next step is to increase the heap size (and probably add more memory). My questions are:
I understand that the Node Stats API tells me how much memory my field data is currently consuming. However, since my query tripped the circuit breaker, I'm guessing the full data set was never loaded into field data memory. Is there a way to work out how much memory a field would need in total to be held in memory, so I can plan hardware requirements?
Any other suggestions for how we could perform something similar to a Terms Aggregation on a large analyzed field?
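For context on the first question, this is roughly how the current per-field usage can be read from a Node Stats response. It is a minimal sketch, assuming the JSON from `GET _nodes/stats/indices/fielddata?fields=*` is shaped as nodes, then node id, then `indices.fielddata.fields.<field>.memory_size_in_bytes`. The helper name, the sample payload, and the field name `body` are made up for illustration. It only reports what is loaded right now, which is exactly why it doesn't answer the sizing question.

```python
def fielddata_bytes_by_field(node_stats):
    """Sum field data memory per field across all nodes in a parsed Node Stats response."""
    totals = {}
    for node in node_stats.get("nodes", {}).values():
        fielddata = node.get("indices", {}).get("fielddata", {})
        for field, stats in fielddata.get("fields", {}).items():
            totals[field] = totals.get(field, 0) + stats.get("memory_size_in_bytes", 0)
    return totals


# Hypothetical payload with two nodes; the response shape is an assumption.
sample = {
    "nodes": {
        "node-a": {"indices": {"fielddata": {
            "memory_size_in_bytes": 100,
            "fields": {"body": {"memory_size_in_bytes": 100}},
        }}},
        "node-b": {"indices": {"fielddata": {
            "memory_size_in_bytes": 200,
            "fields": {"body": {"memory_size_in_bytes": 200}},
        }}},
    }
}

usage = fielddata_bytes_by_field(sample)
print(usage)
```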