Elasticsearch cluster down due to high memory usage

I ran some queries on ES that fetched a huge amount of data; because of that, memory utilization went very high and the ES cluster went down.
Below is the error that the ES Java client threw:

{"error":{"root_cause":[{"type":"circuit_breaking_exception","reason":"[parent] Data too large, data for [<http_request>] would be [4031859746/3.7gb], which is larger than the limit of [3865051136/3.5gb], real usage: [4031859168/3.7gb], new bytes reserved: [578/578b], usages [model_inference=0/0b, inflight_requests=47776/46.6kb, request=1290403896/1.2gb, fielddata=340509/332.5kb, eql_sequence=0/0b]","bytes_wanted":4031859746,"bytes_limit":3865051136,"durability":"TRANSIENT"}],"type":"circuit_breaking_exception","reason":"[parent] Data too large, data for [<http_request>] would be [4031859746/3.7gb], which is larger than the limit of [3865051136/3.5gb], real usage: [4031859168/3.7gb], new bytes reserved: [578/578b], usages [model_inference=0/0b, inflight_requests=47776/46.6kb, request=1290403896/1.2gb, fielddata=340509/332.5kb, eql_sequence=0/0b]","bytes_wanted":4031859746,"bytes_limit":3865051136,"durability":"TRANSIENT"},"status":429}

On the ES node I found the logs below, where GC was not able to bring memory usage down:

[2023-07-06T07:44:51,220][WARN ][o.e.m.j.JvmGcMonitorService] [es-1.com] [gc][9226168] overhead, spent [563ms] collecting in the last [1s]
[2023-07-06T07:50:43,768][WARN ][o.e.m.j.JvmGcMonitorService] [es-1.com] [gc][9226520] overhead, spent [852ms] collecting in the last [1.1s]
[2023-07-06T07:50:44,368][INFO ][o.e.i.b.HierarchyCircuitBreakerService] [es-1.com] attempting to trigger G1GC due to high heap usage [3870694016]
[2023-07-06T07:50:44,375][INFO ][o.e.i.b.HierarchyCircuitBreakerService] [es-1.com] GC did not bring memory usage down, before [3870694016], after [3885846280], allocations [1], duration [7]
[2023-07-06T07:50:45,153][WARN ][o.e.m.j.JvmGcMonitorService] [es-1.com] [gc][9226521] overhead, spent [1.2s] collecting in the last [1.4s]
[2023-07-06T07:50:46,291][WARN ][o.e.m.j.JvmGcMonitorService] [es-1.com] [gc][9226522] overhead, spent [1.1s] collecting in the last [1.1s]
[2023-07-06T07:51:21,814][WARN ][o.e.t.ThreadPool         ] [es-1.com] timer thread slept for [30.2s/30283ms] on absolute clock which is above the warn threshold of [5000ms]

There are 3 VM nodes in the ES cluster and each node has all roles. My ES cluster's node roles look like this:

cdfhilmrstw
cdfhilmrstw
cdfhilmrstw

I am not able to understand why the ES VM went down because of this. Is there any configuration through which, instead of shutting down, the VM gets restarted in such scenarios, so that it doesn't impact production traffic?

Hi @maulik_trapasiya,

It looks like the circuit_breaking_exception and the GC overhead warnings in your logs answer your question as to why ES went down: the heap was effectively exhausted by a very large request. We would recommend splitting larger requests into smaller ones to prevent this from happening.
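As an illustration of that idea, here is a minimal sketch assuming a recent 8.x elasticsearch-java client (co.elastic.clients); the index name my-index, the sort field @timestamp, and the page size are placeholders you would adapt. It walks the result set in fixed-size pages with search_after instead of pulling everything back in a single request:

import java.io.IOException;
import java.util.List;

import com.fasterxml.jackson.databind.node.ObjectNode;

import co.elastic.clients.elasticsearch.ElasticsearchClient;
import co.elastic.clients.elasticsearch._types.FieldValue;
import co.elastic.clients.elasticsearch._types.SortOrder;
import co.elastic.clients.elasticsearch.core.SearchResponse;
import co.elastic.clients.elasticsearch.core.search.Hit;

public class PagedExport {

    // Placeholder names -- adjust to your index and to a sort field that every document has.
    private static final String INDEX = "my-index";
    private static final String SORT_FIELD = "@timestamp";
    private static final int PAGE_SIZE = 1000; // small pages keep each response far below the breaker limit

    public static void exportAll(ElasticsearchClient client) throws IOException {
        List<FieldValue> searchAfter = null;
        while (true) {
            final List<FieldValue> after = searchAfter;
            SearchResponse<ObjectNode> response = client.search(s -> {
                s.index(INDEX)
                 .size(PAGE_SIZE)
                 .sort(so -> so.field(f -> f.field(SORT_FIELD).order(SortOrder.Asc)));
                if (after != null) {
                    s.searchAfter(after); // resume from where the previous page ended
                }
                return s;
            }, ObjectNode.class);

            List<Hit<ObjectNode>> hits = response.hits().hits();
            if (hits.isEmpty()) {
                break; // no more results
            }
            for (Hit<ObjectNode> hit : hits) {
                // process hit.source() here instead of accumulating all pages in memory
            }
            // the sort values of the last hit become the cursor for the next page
            searchAfter = hits.get(hits.size() - 1).sort();
        }
    }
}

For a fully consistent view while paging you would normally open a point-in-time and pass its id along with search_after; on much older clients the scroll API is the equivalent pattern.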

In terms of having Elasticsearch auto-restart, it depends on how you are running it. Am I correct in assuming you are running Elasticsearch in a VM?
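For reference, if it turns out to be a self-managed DEB/RPM install on a Linux VM (an assumption on my part, not something from your post), one common approach is a systemd drop-in so the elasticsearch service is restarted automatically if the JVM dies:

# created via: sudo systemctl edit elasticsearch
# lands in /etc/systemd/system/elasticsearch.service.d/override.conf
[Service]
Restart=on-failure
RestartSec=30

After sudo systemctl daemon-reload this only limits the downtime, though; the oversized requests that trip the circuit breaker still need to be addressed.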
