org.elasticsearch.common.breaker.CircuitBreakingException: [parent] Data too large, data for [indices:data/write/bulk[s][r]]

The cluster always gets a CircuitBreakingException after updating to ES 7.13.

-Xms31g
-Xmx31g

14-:-XX:+UseG1GC
14-:-XX:G1ReservePercent=25
14-:-XX:InitiatingHeapOccupancyPercent=30

We checked and there is no huge query executed against the indices. The cluster has 5 nodes, each with 64 GB RAM and SSDs.
We had no errors on ES 6.8.3.

Caused by: org.elasticsearch.common.breaker.CircuitBreakingException: [parent] Data too large, data for [indices:data/write/bulk[s][r]] would be [32252335886/30gb], which is larger than the limit of [31621696716/29.4gb], real usage: [32252318768/30gb], new bytes reserved: [17118/16.7kb], usages [request=0/0b, fielddata=9820231969/9.1gb, in_flight_requests=17118/16.7kb, model_inference=0/0b, accounting=289864524/276.4mb]

I suggest checking the JVM metrics in _nodes/stats to understand how JVM memory is used. The exception says that around 30gb of heap is already in use, which is why even your small request of 16.7kb trips the circuit breaker; the 29.4gb limit corresponds to the default parent breaker limit of 95% of your 31gb heap.
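For example, a minimal way to pull just the JVM section of the node stats (assuming a node reachable on localhost:9200):

curl -s 'http://localhost:9200/_nodes/stats/jvm?pretty&human'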

It looks like you are using the real memory circuit breaker. You can try disabling it temporarily, which falls back to the previous method of accounting for memory, and see if you still get circuit breaking exceptions.
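A sketch of that change: indices.breaker.total.use_real_memory is a static node setting, so it has to go into elasticsearch.yml on every node and only takes effect after a node restart.

# elasticsearch.yml (on each node, then restart the node)
indices.breaker.total.use_real_memory: false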


Hello, it looks like after we restart each node, the GC works as expected for a while. At some point the heap reaches 26-27 GB and the errors start appearing. I suspect that something is preventing the GC from working correctly.
We will try to disable it temporarily. JVM stats from one node:

"jvm": {
				"timestamp": 1623655124478,
				"uptime_in_millis": 1549586,
				"mem": {
					"heap_used_in_bytes": 24071107312,
					"heap_used_percent": 72,
					"heap_committed_in_bytes": 33285996544,
					"heap_max_in_bytes": 33285996544,
					"non_heap_used_in_bytes": 233079120,
					"non_heap_committed_in_bytes": 239403008,
					"pools": {
						"young": {
							"used_in_bytes": 13555990528,
							"max_in_bytes": 0,
							"peak_used_in_bytes": 19495124992,
							"peak_max_in_bytes": 0
						},
						"old": {
							"used_in_bytes": 10119818752,
							"max_in_bytes": 33285996544,
							"peak_used_in_bytes": 10248005632,
							"peak_max_in_bytes": 33285996544
						},
						"survivor": {
							"used_in_bytes": 395298032,
							"max_in_bytes": 0,
							"peak_used_in_bytes": 1577058304,
							"peak_max_in_bytes": 0
						}
					}
				},
				"threads": {
					"count": 177,
					"peak_count": 245
				},
				"gc": {
					"collectors": {
						"young": {
							"collection_count": 190,
							"collection_time_in_millis": 10204
						},
						"old": {
							"collection_count": 0,
							"collection_time_in_millis": 0
						}
					}
				},
				"buffer_pools": {
					"mapped": {
						"count": 6905,
						"used_in_bytes": 484894438634,
						"total_capacity_in_bytes": 484894438634
					},
					"direct": {
						"count": 192,
						"used_in_bytes": 36719564,
						"total_capacity_in_bytes": 36719563
					},
					"mapped - 'non-volatile memory'": {
						"count": 0,
						"used_in_bytes": 0,
						"total_capacity_in_bytes": 0
					}
				},
				"classes": {
					"current_loaded_count": 25073,
					"total_loaded_count": 25150,
					"total_unloaded_count": 77
				}
			},

All nodes leave the cluster after half a day. Horrible update to ES 7.13.

What do your Elasticsearch logs show?
What is the output from the _cluster/stats?pretty&human API?
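For reference, a simple way to capture that output for the thread (assuming a node reachable on localhost:9200):

curl -s 'http://localhost:9200/_cluster/stats?pretty&human'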