Elasticsearch cluster crashing due to circuit breaker exception

Hi,

Currently the Elasticsearch cluster keeps crashing and shows an error related to the circuit breaker.

When hitting the Kibana URL I get the screen below.

While debugging further, I found the following circuit breaker log:

{"statusCode":500,"error":"Internal Server Error","message":"[parent] Data too large, data for [<http_request>] would be [13903957752/12.9gb], which is larger than the limit of [13891534848/12.9gb], real usage: [13903957752/12.9gb], new bytes reserved: [0/0b], usages [request=0/0b, fielddata=190582/186.1kb, in_flight_requests=776/776b, accounting=133044177/126.8mb]: [circuit_breaking_exception] [parent] Data too large, data for [<http_request>] would be [13903957752/12.9gb], which is larger than the limit of [13891534848/12.9gb], real usage: [13903957752/12.9gb], new bytes reserved: [0/0b], usages [request=0/0b, fielddata=190582/186.1kb, in_flight_requests=776/776b, accounting=133044177/126.8mb], with { bytes_wanted=13903957752 & bytes_limit=13891534848 & durability=\"PERMANENT\" }"}
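For reference, the numbers in the exception show the request would have pushed real heap usage only slightly past the parent breaker limit. A quick check of the arithmetic (figures copied from the message above):

```python
# Figures taken verbatim from the circuit_breaking_exception above.
bytes_wanted = 13903957752  # real usage if the request were allowed
bytes_limit = 13891534848   # parent circuit breaker limit

# How far over the limit the request would have gone.
overshoot = bytes_wanted - bytes_limit
print(f"over the limit by {overshoot} bytes (~{overshoot / 2**20:.1f} MiB)")
```

So the node is sitting essentially at the breaker limit, and even a small request trips it.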

I tried increasing the parent circuit breaker limit from the default 70% to 90% using the cluster settings API, but I am still facing the same issue. I also tried increasing the heap size from 15 GB to 18 GB, but the problem persisted. It has been happening frequently for about a week.
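For completeness, this is a minimal sketch of the settings change I applied, assuming a node reachable at localhost:9200 (the endpoint and the 90% value are placeholders for my actual setup):

```python
import json
import urllib.request

# Hypothetical endpoint; replace with your cluster's address.
ES_URL = "http://localhost:9200"

def build_breaker_settings(limit_percent: int) -> bytes:
    """Build the transient cluster-settings body that raises the
    parent circuit breaker limit."""
    return json.dumps({
        "transient": {
            "indices.breaker.total.limit": f"{limit_percent}%"
        }
    }).encode("utf-8")

def apply_settings(body: bytes) -> None:
    """PUT the settings body to the _cluster/settings API."""
    req = urllib.request.Request(
        f"{ES_URL}/_cluster/settings",
        data=body,
        headers={"Content-Type": "application/json"},
        method="PUT",
    )
    urllib.request.urlopen(req)  # raises on a non-2xx response

body = build_breaker_settings(90)
# apply_settings(body)  # executed against the live cluster
```

The change was accepted by the API, but the breaker still trips.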

Versions in use:
Java - openjdk version "1.8.0_252"
Elasticsearch - 7.5.0
Logstash - 7.5.0
File descriptors on each Elasticsearch node - 200000

I am running three Elasticsearch nodes, one of which is the master node. Two of the nodes run Elasticsearch and Logstash on the same machine, for Logstash HA.

The Elasticsearch jvm.options file contains the following, with an 18 GB heap:

-Xms18g
-Xmx18g
## GC configuration
#-XX:+UseConcMarkSweepGC
-XX:CMSInitiatingOccupancyFraction=75
-XX:+UseCMSInitiatingOccupancyOnly

JVM memory usage reaches 80%-90% most of the time on all nodes.

  • Temporary solution - For now I just restart one of the nodes and everything goes back to normal, but it can happen again at any time. I am facing this issue more than once a day.

Would downgrading/upgrading Elasticsearch lead to a solution, or would changes to the jvm.options file alone resolve the issue?

Any direction or thoughts would be really helpful.