Hi there! We recently upgraded our Elasticsearch cluster to 6.2.3 and since then have occasionally tripped a circuit breaker, which seems to take some or all data nodes out of the cluster.
Error message:
org.elasticsearch.transport.RemoteTransportException: [elastic8-dus][ip:9300][indices:data/read/search[can_match]]
Caused by: org.elasticsearch.common.breaker.CircuitBreakingException: [parent] Data too large, data for [<transport_request>] would be [9622177878/8.9gb], which is larger than the limit of [9616569139/8.9gb]
What surprises us is that when this happens, nodes disappear from the cluster, at least from cerebro's point of view. The cluster state remains green, but we have to restart the nodes before they show up in cerebro's node list again. Memory usage on the machines looks unremarkable at the time they disappear, as do incoming and outgoing traffic, context switches, CPU usage, you name it. The affected nodes don't log any exceptions either; the only trace is the client-side error shown above. We have not yet been able to narrow this down to a particular trigger or query.
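
In case it helps with narrowing this down, here is a minimal sketch of how the per-node breaker numbers could be watched over time. It assumes the node stats API (`GET /_nodes/stats/breaker`) and its `limit_size_in_bytes`, `estimated_size_in_bytes` and `tripped` fields; the function names and the `base_url` default are made up for illustration and are not part of our setup.

```python
import json
from urllib.request import urlopen


def breaker_usage(stats):
    """Return (node, breaker, used/limit ratio, tripped count) rows, fullest first.

    `stats` is the parsed JSON body of GET /_nodes/stats/breaker.
    """
    rows = []
    for node in stats.get("nodes", {}).values():
        for name, breaker in node.get("breakers", {}).items():
            limit = breaker.get("limit_size_in_bytes", 0)
            used = breaker.get("estimated_size_in_bytes", 0)
            # Guard against a missing or non-positive limit.
            ratio = used / limit if limit > 0 else 0.0
            rows.append((node.get("name", "?"), name, ratio, breaker.get("tripped", 0)))
    return sorted(rows, key=lambda row: row[2], reverse=True)


def fetch_breaker_stats(base_url="http://localhost:9200"):
    """Fetch breaker stats from a node. base_url is a placeholder, adjust as needed."""
    with urlopen(base_url + "/_nodes/stats/breaker") as resp:
        return json.load(resp)
```

Calling `breaker_usage(fetch_breaker_stats())` periodically and logging the top rows should show whether the `parent` breaker creeps toward its limit before the nodes drop out, or whether it jumps suddenly.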