Parent data too large?

I am getting the below error on a shard assignment.

org.elasticsearch.common.breaker.CircuitBreakingException: [parent] Data too large, data for [internal:index/shard/recovery/translog_ops] would be [28701173408/26.7gb], which is larger than the limit of [28306518835/26.3gb], real usage: [28700173096/26.7gb], new bytes reserved: [1000312/976.8kb], usages [fielddata=17644219506/16.4gb, eql_sequence=0/0b, model_inference=0/0b, inflight_requests=1461110/1.3mb, request=0/0b]
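For what it's worth, the numbers in the message add up exactly; a quick sketch (values copied verbatim from the exception above) shows the breaker fires on total tracked heap usage plus the new reservation, not on the size of the shard being moved:

```python
# Values taken verbatim from the CircuitBreakingException above.
real_usage = 28_700_173_096   # current tracked heap usage (~26.7gb)
new_bytes = 1_000_312         # ~976.8kb reserved for the translog ops request
limit = 28_306_518_835        # parent breaker limit (~26.3gb)

# The breaker rejects the request when existing usage plus the new
# reservation would exceed the limit -- the small translog batch is
# just the final straw on an already-full heap.
would_be = real_usage + new_bytes
print(would_be, would_be > limit)  # 28701173408, True (matches the message)
```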

But the shard is only a few hundred MB in size. Why is it complaining about tripping a limit in the tens of GB?

I am encountering an issue where the cluster is attempting to assign all the replica shards to one single node out of about 90. This is a very strange balancing algorithm. Digging further, I found that when I tried manually assigning the shard to other nodes, I got this same error.

Any idea?

It is hard to speculate without more information.

What is the full output of the cluster stats API?

We have just added 10 more data nodes, so it is moving shards now. Since it’s a production cluster, I’ll wait for it to finish before investigating further, if the issue persists.

My gut feeling is that the nodes are holding too much data and it tripped the circuit breaker’s limit on how much data can be kept in RAM.

Is that particular circuit breaker designed for the purpose I described above? I guess that was kind of what I was originally asking.

That’s the only way the error message would make sense (guessing here, trying to get clarity). The text description seems to suggest the data being moved is too large, which is clearly not the case.

Thanks.

It looks like you may have high heap pressure, and the stats I requested may help point to possible causes. If you are monitoring heap usage, look for patterns across the nodes in GC frequency and average heap levels.
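If you can query the cluster directly, the per-node breaker and JVM/GC stats come from the nodes stats API (request shown in Kibana Dev Tools syntax; interpret the numbers against your own heap settings):

```
GET _nodes/stats/breaker,jvm
```

The `breakers.parent` section shows each node's current estimated usage against its limit, and `jvm.gc` shows collection counts and times you can compare across nodes.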

The cluster seems fine now. I ended up deleting some old indices and adding new data nodes.

The observed behavior stopped for now.

I’m wondering how much data per data node is recommended?

We are running on AWS m7g.4xlarge (or equivalent with 64GB RAM). EBS is at 5TB. We don’t let it go above 80%. Is 5TB too much per node?

There is no easy general answer to this as it depends a lot on the use case specifics, e.g. the data, mappings and the level and mix of load the cluster is under.

Assuming all indices are active, what is a general guideline?

Is 5TB way too much? About right? Too low (meaning ES can handle much more easily)? Etc.

All indices being active is, as I stated earlier, not enough information. I have seen logging use cases with 5TB of data or more per node. I have also seen search-heavy use cases with a lot of analysis and advanced features be limited at much lower levels.

The actual version you are on also matters, and I forgot to mention this in the previous reply. Recent versions of Elasticsearch require less overhead per data volume compared to older ones.

You are considering a multi-dimensional issue along just one dimension: how much data per node. A few other dimensions are how much new data per node per day, the typical and peak query profiles, your requirements for indexing throughput and query response, and your expectations for recovery time if/when a node fails. Plus plenty of others.

To give a sporting analogy, you are asking how fast your strikers should be for a successful football team.


Thanks for all the responses. I know it’s hard to give a suggestion without knowing more. But the reality is that it’s also hard to give a precise description to someone who’s not involved. Sometimes too much info is counterproductive. 🙂

I think I got my answer from the responses already. 5TB is within range.

If I had said my node contains 5 petabytes of data, someone would have chimed in (out of curiosity) asking why such a large size per node, started giving better suggestions, etc.

Just to give more insight into our usage, in case someone finds this thread in the future with a similar question.

We are very write heavy; therefore, my question was purely predicated on the relationship of EBS size to write performance on AWS. If reading were causing CPU spikes, I could easily determine that and make changes to our cluster (relative to our use case). So my original question had already removed the search dimension.

Thanks again.

As you did not provide any details about the use case, I was not aware that it was write heavy. When I have sized clusters in the past, especially on older versions of Elasticsearch (you have not specified which version you are on), I have generally kept nodes that need to support heavy write loads at a lot less data than 5TB. I also try to make sure these types of nodes have very fast storage, ideally ephemeral SSDs on AWS. You have not specified what type of EBS you are using, nor whether you have any provisioned IOPS. How far you will get with your current setup will depend on a number of factors, e.g. the number of indices and shards actively indexed into, average bulk size, average write size, document size/complexity, and what the load profile looks like.

SSD (we increased the max throughput to 300 from the default of 125), version 8.10. Each node has fewer than 300 shards.

I keep forgetting about these.

Sorry.

Our cluster mainly hosts monthly indices, some daily, and far fewer hourly (the ratio is about 9:2:1).

So every hour, day, and month, new indices are created and old ones deleted.

Is that IOPS you specified? For a write heavy load 300 IOPS is very low and could easily become a bottleneck if it is not already. It may cause operations to take longer than they should, which could in turn result in additional heap pressure.

Of the 300 shards per node, how many are actively written to?

I have never used hourly indices. The reason for this is that they often result in small shards, often less than 10GB in size. If the indexing rate is large enough to result in shard sizes within the recommended range of 30GB to 50GB, I have often found that instead increasing the number of primary shards to spread the load out works better. The only reason I would consider using hourly indices is if I had a very short retention period, typically 24-48 hours or less, but I have yet to come across such a scenario.
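To make that concrete, here is a rough back-of-the-envelope sketch (the daily ingest figure is a made-up example, not from this thread) of checking whether an index interval lands shards in the commonly recommended 30GB to 50GB range:

```python
# Hypothetical sizing sketch: pick an index interval and primary shard
# count so each shard lands near the 30-50 GB recommendation.
TARGET_SHARD_GB = 40  # midpoint of the 30-50 GB range

def primary_shards(ingest_gb_per_interval: float) -> int:
    """Primary shards needed for ~TARGET_SHARD_GB per shard (minimum 1)."""
    return max(1, round(ingest_gb_per_interval / TARGET_SHARD_GB))

# Made-up example: 120 GB of new data per day.
daily_gb = 120
print("daily index:", primary_shards(daily_gb), "primaries")       # 3 shards of ~40 GB each
print("hourly index:", primary_shards(daily_gb / 24), "primary")   # 1 shard of ~5 GB: too small
```

At this ingest rate a daily index with three primaries hits the target, while hourly indices would produce 5GB shards, well under the recommended range, which is exactly the situation where fewer, wider indices tend to work better.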