I am getting the error below on a shard assignment.
org.elasticsearch.common.breaker.CircuitBreakingException: [parent] Data too large, data for [internal:index/shard/recovery/translog_ops] would be [28701173408/26.7gb], which is larger than the limit of [28306518835/26.3gb], real usage: [28700173096/26.7gb], new bytes reserved: [1000312/976.8kb], usages [fielddata=17644219506/16.4gb, eql_sequence=0/0b, model_inference=0/0b, inflight_requests=1461110/1.3mb, request=0/0b]
But the size of the shard is only a few hundred MB. Why is it complaining about tripping a limit in the GB range?
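For reference, the arithmetic in the message does add up: the parent breaker compares current real heap usage plus the new reservation against its limit, and the shard size never enters the check. A minimal sketch using the figures from the error:

```python
# Reproducing the parent circuit breaker arithmetic from the error message.
# The breaker compares (real heap usage + new bytes reserved) against its
# limit -- the size of the shard being moved is not part of the check.
limit = 28_306_518_835       # "limit of [28306518835/26.3gb]"
real_usage = 28_700_173_096  # "real usage: [28700173096/26.7gb]"
new_bytes = 1_000_312        # "new bytes reserved: [1000312/976.8kb]"

would_be = real_usage + new_bytes
print(would_be)              # 28701173408, the "would be" figure
print(would_be > limit)      # True -> CircuitBreakingException
```

So even a ~1 MB reservation trips the breaker when the heap is already nearly full.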
I am encountering an issue where the cluster is attempting to assign all the replica shards to one single node out of about 90. This is a very strange balancing algo. Digging further, I found that when I tried manually assigning the shard to other nodes, I got the same error.
We have just added 10 more data nodes, so it is moving shards now. I’ll wait for that to finish before investigating further if the issue persists, since it’s a production cluster.
My gut feeling is that the nodes are holding too much data and it tripped the circuit breaker’s limit on how much data can be kept in heap memory.
Is that particular circuit breaker designed for the purpose I described above? I guess that was kind of what I was originally asking.
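For what it's worth, the parent breaker is governed by these settings (defaults shown as I understand them; verify against the docs for your version):

```yaml
# elasticsearch.yml -- parent (total) circuit breaker settings.
# With use_real_memory: true (the default since 7.0), the breaker checks
# actual JVM heap usage, so any request can trip it when the heap is
# nearly full -- regardless of how small the request itself is.
indices.breaker.total.use_real_memory: true
# Default limit is 95% of the heap when use_real_memory is true
# (70% when it is false).
indices.breaker.total.limit: 95%
```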
That’s the only way the error message would make sense (guessing here; just trying to get clarity). The text of the message seems to suggest the data being moved is too large, which is clearly not the case.
It looks like you may have high heap pressure, and the stats I requested may help point to possible causes. If you are monitoring heap usage, look for patterns across the nodes in GC frequency and average heap levels.
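If you don't have monitoring in place, a quick way to eyeball heap pressure and breaker state is `GET _nodes/stats/jvm,breaker`. A rough sketch of pulling the relevant fields out of that response (the sample data here is made up, but the field paths follow the nodes stats API):

```python
import json

# Hypothetical excerpt of a GET _nodes/stats/jvm,breaker response.
# The numbers are invented; the field paths match the nodes stats API.
sample = json.loads("""
{
  "nodes": {
    "abc123": {
      "name": "data-node-1",
      "jvm": {"mem": {"heap_used_percent": 92}},
      "breakers": {
        "parent": {"limit_size_in_bytes": 28306518835,
                   "estimated_size_in_bytes": 28700173096,
                   "tripped": 14}
      }
    }
  }
}
""")

for node in sample["nodes"].values():
    heap = node["jvm"]["mem"]["heap_used_percent"]
    parent = node["breakers"]["parent"]
    print(f'{node["name"]}: heap {heap}%, '
          f'parent breaker tripped {parent["tripped"]} times')
```

Nodes sitting at 90%+ heap with a nonzero `tripped` count are the ones to look at first.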