I've got an issue with one visualization. There's a high-cardinality field (HCF) I need to check once in a while. My visualization (a line chart) shows the top 10 instances by unique count of that field. When I run it over the past hour, one of the nodes hits the heap memory limit (circuit breaker trips) and the shards on that node go unassigned.
None of the other nodes flinch at all. When I run the same visualization on another cluster, the load seems to be distributed. The two clusters run an almost identical setup; the one with the issue described above is on a newer environment (OpenJDK 11). I can't put my finger on the problem here.
Everything else runs smoothly: CPU utilization is between 20-30%, heap around 50%, average load around 1, and no other visualizations, dashboards, or complex aggregations have a similar effect on the cluster.
Is there a way to prevent this behavior (a ~10 GB spike in heap)?
ES version: 7.2.1
Cluster has 6 data nodes, each with:
- 55GB RAM / 25GB heap
- 8 cores (2.3 GHz)
Index in question (at query time):
- 100M+ documents
- 10 shards (5 primary, 5 replica)
- size 73 GB+
- unique instances: ~20K
- HCF: 500K+ unique values
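For context, the visualization boils down to an aggregation roughly like the one below (a sketch from the Kibana-generated request; `instance` and `hcf_field` are placeholder names for my keyword split field and the high-cardinality field):

```json
POST /my-index/_search
{
  "size": 0,
  "query": {
    "range": { "@timestamp": { "gte": "now-1h" } }
  },
  "aggs": {
    "top_instances": {
      "terms": {
        "field": "instance",
        "size": 10,
        "order": { "unique_hcf": "desc" }
      },
      "aggs": {
        "unique_hcf": {
          "cardinality": { "field": "hcf_field" }
        }
      }
    }
  }
}
```

The actual request includes the usual Kibana date-histogram wrapping, but the terms-ordered-by-cardinality part above is what seems to drive the heap spike.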