What we are seeing is our ES 7.5 data nodes tripping "PERMANENT" circuit breakers after running a few fairly simple queries, which the exact same index on ES 6.4 handles without issue.
I've included a bunch of the information we've collected below, but as far as I can tell, GC is simply not triggering, even though we have already applied the recommended G1GC settings (as described in pull requests and referenced in many topics on this forum).
The worst part about these circuit breakers tripping is that 9 times out of 10, we would probably be better off if the node in question simply crashed. In the past, when a circuit breaker tripped during shard recovery, the cluster silently left us with a missing replica until someone noticed and manually triggered a retry. Currently, when these queries trip a circuit breaker, the affected nodes are taken out of service for several hours (I'm not sure exactly how, but after this happened yesterday, the nodes eventually came back). This not only caused Kibana to crash on startup (with logs mentioning the circuit breakers), but also meant that queries kept running against the nodes that were still up and, because the other nodes were missing, returned partial results.
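For anyone wanting to compare against their own cluster, here is a minimal sketch of how the parent breaker's usage could be ranked per node. It assumes the JSON body of `GET _nodes/stats/breaker` has already been loaded into a dict; the function name `parent_breaker_usage` and the sample payload are hypothetical and are not data from our cluster.

```python
def parent_breaker_usage(stats):
    """Return (node name, used/limit fraction, trip count) per node, highest usage first."""
    rows = []
    for node in stats.get("nodes", {}).values():
        parent = node["breakers"]["parent"]
        limit = parent["limit_size_in_bytes"]
        used = parent["estimated_size_in_bytes"]
        rows.append((node["name"], used / limit, parent["tripped"]))
    # Nodes closest to their limit come first.
    return sorted(rows, key=lambda row: row[1], reverse=True)


# Hypothetical payload, trimmed to the fields read above (illustrative values only).
sample = {
    "nodes": {
        "id-1": {"name": "data-1", "breakers": {"parent": {
            "limit_size_in_bytes": 100, "estimated_size_in_bytes": 96, "tripped": 3}}},
        "id-2": {"name": "data-2", "breakers": {"parent": {
            "limit_size_in_bytes": 100, "estimated_size_in_bytes": 50, "tripped": 0}}},
    }
}

for name, fraction, tripped in parent_breaker_usage(sample):
    print(f"{name}: parent breaker at {fraction:.0%} of limit, tripped {tripped} times")
```

A node that sits near 100% of its parent limit while its trip count keeps climbing would be consistent with the heap not being collected in time.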
Configured max memory: 30 GB