Circuit breaker exception

I get the following exception when I try to insert data using the ES-Hive Hadoop jar. I am currently inserting around 60 million records.

This is the error I get:

Caused by: org.elasticsearch.hadoop.rest.EsHadoopInvalidRequest: org.elasticsearch.hadoop.rest.EsHadoopRemoteException: circuit_breaking_exception: [parent] Data too large, data for [<http_request>] would be [31820340712/29.6gb], which is larger than the limit of [31621696716/29.4gb], real usage: [31818275608/29.6gb], new bytes reserved: [2065104/1.9mb], usages [inflight_requests=105523462/100.6mb, request=0/0b, fielddata=0/0b, eql_sequence=0/0b, model_inference=0/0b]
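The numbers in that message can be checked by hand, which helps show what is actually tripping the breaker. Below is a minimal Python sketch that pulls the byte counts out of the message and compares them. The function name `parse_breaker_message` is made up for illustration, and the 31 GB heap size is taken from the stats posted in this thread. The 95% ratio it prints appears to match the default parent breaker limit when real memory tracking is enabled, but treat that as my reading rather than a confirmed setting on this cluster.

```python
import re

# Numeric part of the exception message posted above.
MESSAGE = (
    "[parent] Data too large, data for [<http_request>] would be "
    "[31820340712/29.6gb], which is larger than the limit of "
    "[31621696716/29.4gb], real usage: [31818275608/29.6gb], "
    "new bytes reserved: [2065104/1.9mb]"
)


def parse_breaker_message(message):
    """Extract the byte counts from a circuit_breaking_exception message.

    Hypothetical helper for illustration; not part of any Elasticsearch client.
    """
    patterns = {
        "would_be": r"would be \[(\d+)/",
        "limit": r"limit of \[(\d+)/",
        "real_usage": r"real usage: \[(\d+)/",
        "new_bytes": r"new bytes reserved: \[(\d+)/",
    }
    return {
        name: int(re.search(pattern, message).group(1))
        for name, pattern in patterns.items()
    }


stats = parse_breaker_message(MESSAGE)
heap_max = 31 * 1024**3  # assumption: heap.max of 31gb from the stats in this thread

# "would be" is simply current heap usage plus the size of the incoming request.
print(stats["real_usage"] + stats["new_bytes"] == stats["would_be"])

# The limit as a fraction of the heap.
print(round(stats["limit"] / heap_max, 2))

# Share of the "would be" total that comes from this single request.
print(stats["new_bytes"] / stats["would_be"])
```

The incoming request is only about 1.9mb. Almost all of the total is heap that is already in use, so the request that gets rejected is not necessarily the one causing the problem.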

Elasticsearch.log shows only the following:

[2022-05-25T04:13:01,315][INFO ][o.e.i.b.HierarchyCircuitBreakerService] [] GC did bring memory usage down, before [31636006904], after [30392212248], allocations [19], duration [137]
[2022-05-25T04:13:04,065][INFO ][o.e.m.j.JvmGcMonitorService] [] [gc][159765] overhead, spent [300ms] collecting in the last [1s]
[2022-05-25T04:13:06,886][INFO ][o.e.i.b.HierarchyCircuitBreakerService] [] attempting to trigger G1GC due to high heap usage [32059682216]
[2022-05-25T04:13:07,061][INFO ][o.e.i.b.HierarchyCircuitBreakerService] [] GC did bring memory usage down, before [32059682216], after [30969353816], allocations [58], duration [175]
[2022-05-25T04:13:12,337][INFO ][o.e.i.b.HierarchyCircuitBreakerService] [] attempting to trigger G1GC due to high heap usage [31690774104]

1. My shard allocation is 1.
2. Replica is 1.
3. JVM heap is set to the maximum, i.e. 31 GB.

Stats below:

name                  id   node.role   heap.current heap.percent heap.max
xxxxx                 xx   xxx                 27.5gb           88     31gb

What can I do to fix it, other than adding another node to the cluster?

Reducing the size of your index requests is the other alternative.

But I am using the Hive Hadoop jar, which uses bulk requests internally. Can you please help me with how to reduce the size of the index requests in that case?

Also, the data takes ~35 GB on disk in Hive, but after it is moved to Elasticsearch the index shows a disk size of 500 GB. Why is this happening? Is this expected from the conversion?

I'm not sure how to do that, sorry. Hopefully someone else can comment.

Thanks!! Any idea on this?
Also, the data takes ~35 GB on disk in Hive, but after it is moved to Elasticsearch the index shows a disk size of 500 GB. Why is this happening? Is this expected from the conversion?

You'd be best off making a new topic for that :slight_smile:

Ok!! Thanks :slight_smile:

I added a new node with 3 TB of disk space to the ES cluster, but I am still stuck on the same error:

Caused by: org.elasticsearch.hadoop.rest.EsHadoopInvalidRequest: org.elasticsearch.hadoop.rest.EsHadoopRemoteException: circuit_breaking_exception: [parent] Data too large, data for [<http_request>] would be [31656845952/29.4gb], which is larger than the limit of [31621696716/29.4gb], real usage: [31654767576/29.4gb], new bytes reserved: [2078376/1.9mb], usages [eql_sequence=0/0b, fielddata=32168/31.4kb, request=0/0b, inflight_requests=333681778/318.2mb, model_inference=0/0b]

At the end of the output it also reports this error:

, Vertex did not succeed due to OWN_TASK_FAILURE, failedTasks:1 killedTasks:1004, Vertex vertex_1654243481653_2108_13_00 [Map 1] killed/failed due to:OWN_TASK_FAILURE]DAG did not succeed due to VERTEX_FAILURE. failedVertices:1 killedVertices:0 (state=08S01,code=2)
Closing: 0: jdbc:hive2://datanode0..com:2181,datanode..com:2181,master..com:2181,master010..com:2181,master010..com:81/;serviceDiscoveryMode=zooKeeper;zooKeeperNamespace=hiveserver2


Is this error due to a lack of disk space on my cluster, or is it due to the ES circuit breaker? Do you have any idea? Even after adding a new node, the error doesn't go away.

It's due to Elasticsearch still. Did you try reducing your request sizes?

How can I do that? As I mentioned, I am using the es-hadoop jar. Is there a way to do it in that jar, or something I need to set explicitly?

Also, how do you do it in Elasticsearch? I don't know that either :confused:

You don't do it in Elasticsearch; it's a client-level approach. I don't know Hadoop though, sorry.

For ingestion into Elasticsearch in general, can you point me to an article on how this can be achieved?

Bulk API | Elasticsearch Guide [8.2] | Elastic might help. You need to tell your client not to include so many documents in each request it sends to Elasticsearch.

I found this in the ES-Hadoop documentation. Would it help if I reduce the batch size in entries or in bytes?

es.batch.size.bytes (default 1mb)
Size (in bytes) for batch writes using Elasticsearch bulk API. Note the bulk size is allocated per task instance. Always multiply by the number of tasks within a Hadoop job to get the total bulk size at runtime hitting Elasticsearch.
es.batch.size.entries (default 1000)
Size (in entries) for batch writes using Elasticsearch bulk API - (0 disables it). Companion to es.batch.size.bytes, once one matches, the batch update is executed. Similar to the size, this setting is per task instance; it gets multiplied at runtime by the total number of Hadoop tasks running.
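The "multiply by the number of tasks" note is the important part here. A rough Python sketch of that arithmetic follows. The function names and the 100 MB budget are made up for illustration, and the task count of 1005 is only a guess taken from the `failedTasks:1 killedTasks:1004` line in the Tez error above, assuming all of those tasks were writing at the same time.

```python
MB = 1024 * 1024


def total_bulk_bytes(batch_size_bytes, concurrent_tasks):
    """Upper bound on bulk payload in flight if every task sends a full batch at once."""
    return batch_size_bytes * concurrent_tasks


def per_task_batch_bytes(budget_bytes, concurrent_tasks):
    """Largest per-task batch size (in bytes) that keeps the total under a budget."""
    return budget_bytes // concurrent_tasks


# Assumption: 1005 concurrent tasks, guessed from the Tez error in this thread.
tasks = 1005

# With a 1mb batch size per task, the total can approach roughly 1 GB in flight.
print(total_bulk_bytes(1 * MB, tasks) / MB, "MB in flight at most")

# Hypothetical budget: keep the total bulk payload under 100 MB.
print(per_task_batch_bytes(100 * MB, tasks), "bytes per task")
```

The second number is a candidate value for es.batch.size.bytes, with es.batch.size.entries lowered to match. As far as I know, with the Hive integration these es.* settings go into the TBLPROPERTIES of the external table that uses the Elasticsearch storage handler. Reducing the number of concurrent tasks has the same effect on the total. Smaller batches lower the pressure from in-flight requests, but since the parent breaker looks at the whole heap, this may not be enough by itself if the heap is already nearly full for other reasons.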