Elasticsearch circuit breaker trips while a snapshot is being taken

We have a 10-node Elasticsearch cluster (7.2.0). Each node has 16 GB of RAM and runs with the following JVM options:

-Xms8g
-Xmx8g
-XX:+UseG1GC
-XX:G1ReservePercent=25
-XX:InitiatingHeapOccupancyPercent=30
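For reference, the percentages in these options translate into absolute sizes on an 8 GB heap. The Python sketch below is not part of the original setup and its function names are hypothetical; it parses the flags and reports those sizes, falling back to the JDK defaults (10% for G1ReservePercent, 45% for InitiatingHeapOccupancyPercent) when a flag is absent. The JVM only accepts k, m, or g suffixes on heap sizes, so the sketch rejects a value such as 8gb.

```python
GIB = 1024 ** 3
UNITS = {"k": 1024, "m": 1024 ** 2, "g": GIB}

# JDK defaults used when a flag is not given explicitly.
G1_DEFAULTS = {"G1ReservePercent": 10, "InitiatingHeapOccupancyPercent": 45}


def parse_heap_flag(flag: str) -> int:
    """Convert '-Xms8g' / '-Xmx8g' style flags to bytes.

    Only k/m/g suffixes (either case) or a plain byte count are handled;
    a value such as '8gb' raises ValueError, as the JVM rejects it too.
    """
    if not flag.startswith(("-Xms", "-Xmx")):
        raise ValueError(f"not a heap size flag: {flag}")
    value = flag[4:]
    suffix = value[-1].lower()
    if suffix in UNITS:
        return int(value[:-1]) * UNITS[suffix]
    return int(value)


def g1_summary(flags: list) -> dict:
    """Absolute sizes (GiB) implied by the heap and G1 percentage flags."""
    heap_gib = None
    pct = dict(G1_DEFAULTS)
    for flag in flags:
        if flag.startswith("-Xmx"):
            heap_gib = parse_heap_flag(flag) / GIB
        elif flag.startswith("-XX:") and "=" in flag:
            name, value = flag[len("-XX:"):].split("=", 1)
            if name in pct:
                pct[name] = int(value)
    if heap_gib is None:
        raise ValueError("no -Xmx flag given")
    return {
        "max_heap_gib": heap_gib,
        # Heap G1 tries to keep free to reduce the risk of to-space overflow.
        "g1_reserve_gib": heap_gib * pct["G1ReservePercent"] / 100,
        # Old-generation occupancy (as a share of the whole heap) at which
        # concurrent marking starts; with adaptive IHOP (on by default since
        # JDK 9) G1 treats this only as a starting value.
        "initial_ihop_gib": heap_gib * pct["InitiatingHeapOccupancyPercent"] / 100,
    }


flags = ["-Xms8g", "-Xmx8g", "-XX:+UseG1GC",
         "-XX:G1ReservePercent=25", "-XX:InitiatingHeapOccupancyPercent=30"]
print(g1_summary(flags))
```

With these flags it reports an 8.0 GiB heap, a 2.0 GiB G1 reserve, and an initial marking threshold of 2.4 GiB.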

What we observe is that, occasionally, the circuit breaker starts tripping right when our snapshot cron job runs (there is also a noticeable spike in outgoing network activity).

The graphs clearly show heap usage climbing to 7.9 GB. We haven't changed any circuit breaker settings.
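With the circuit breaker settings left at their defaults, Elasticsearch 7.x uses real-memory accounting for the parent breaker (indices.breaker.total.use_real_memory defaults to true) and sets indices.breaker.total.limit to 95% of the JVM heap. On an 8 GB heap that is roughly 7.6 GB, so heap usage of 7.9 GB is consistent with the parent breaker tripping. The sketch below (hypothetical helper names) computes that default and lists which breakers have tripped in a parsed GET _nodes/stats/breaker response, which helps tell the parent breaker apart from, say, the request or fielddata breaker.

```python
GIB = 1024 ** 3


def default_parent_limit(heap_bytes: int, use_real_memory: bool = True) -> int:
    """Default 7.x parent breaker limit: 95% of heap with real-memory
    accounting (the default), 70% when use_real_memory is disabled."""
    percent = 95 if use_real_memory else 70
    return heap_bytes * percent // 100


def tripped_breakers(node_stats: dict) -> list:
    """Return (node name, breaker name, trip count, estimated/limit ratio)
    for every breaker whose 'tripped' counter is non-zero.

    node_stats is the parsed JSON body of GET _nodes/stats/breaker;
    'tripped' is cumulative since the node started.
    """
    rows = []
    for node_id, node in node_stats["nodes"].items():
        name = node.get("name", node_id)
        for breaker, stats in node["breakers"].items():
            if stats["tripped"] > 0:
                limit = stats["limit_size_in_bytes"]
                used = stats["estimated_size_in_bytes"]
                ratio = used / limit if limit > 0 else float("inf")
                rows.append((name, breaker, stats["tripped"], ratio))
    return rows


if __name__ == "__main__":
    limit = default_parent_limit(8 * GIB)
    print(f"default parent limit on an 8 GiB heap: {limit / GIB:.2f} GiB")
```

Run as a script, it prints a default parent limit of 7.60 GiB.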

We tried lowering the repository's max_snapshot_bytes_per_sec setting, with no luck. We also tried reducing the snapshot frequency to once per hour, but that hasn't helped either.
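max_snapshot_bytes_per_sec is a per-node repository setting (40mb by default in 7.x), and it is changed by re-registering the repository with PUT _snapshot/<repository>. The sketch below only builds that request with the standard library; the repository name my_backup, the fs repository type, the /mnt/backups location, the localhost URL, and the 20mb value are all hypothetical placeholders, and a different repository type (for example S3) would need its own settings.

```python
import json
import urllib.request


def build_repo_update(base_url: str, repo: str, location: str,
                      max_rate: str) -> urllib.request.Request:
    """Build (but do not send) a PUT _snapshot/<repo> request that
    re-registers a shared-filesystem repository with a snapshot throttle."""
    body = {
        "type": "fs",  # assumption: shared filesystem repository
        "settings": {
            "location": location,
            # Per-node cap on the rate at which snapshot files are written.
            "max_snapshot_bytes_per_sec": max_rate,
        },
    }
    return urllib.request.Request(
        url=f"{base_url}/_snapshot/{repo}",
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="PUT",
    )


req = build_repo_update("http://localhost:9200", "my_backup", "/mnt/backups", "20mb")
# urllib.request.urlopen(req) would send it to the cluster.
```

Throttling the write rate mainly smooths disk and network I/O; it is not guaranteed to lower heap usage, which may explain why it made no difference here.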

We would like to increase the heap size, but the heap is already at 50% of RAM, and Elasticsearch doesn't recommend going beyond that. The other option is scaling the instances vertically, which would only waste resources because this is a low-traffic cluster.

What is the output of the cluster stats API? Which version are you using?

I don't have the cluster stats at the moment. We are using Elasticsearch 7.2.0.

Without the cluster stats I have limited insight into the state of the cluster, so I will take a look once you can provide them.
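Once the output of GET _cluster/stats is available, a quick summary of its heap-related parts can be produced with a sketch like the one below. The field names follow the 7.x cluster stats response; the function name and the choice of fields are assumptions, picked because segment memory, fielddata, and the query cache all count against the heap in 7.x. Note that the nodes.jvm.mem figures are summed over all nodes, so the percentage is a cluster-wide average and can hide a single node running hot.

```python
GIB = 1024 ** 3


def summarize_cluster_stats(stats: dict) -> dict:
    """Pick heap-related figures out of a parsed GET _cluster/stats body."""
    nodes = stats["nodes"]
    indices = stats["indices"]
    heap = nodes["jvm"]["mem"]  # totals across every node in the cluster
    return {
        "node_count": nodes["count"]["total"],
        "heap_used_pct": 100 * heap["heap_used_in_bytes"] / heap["heap_max_in_bytes"],
        "shard_count": indices["shards"]["total"],
        "segments_memory_gib": indices["segments"]["memory_in_bytes"] / GIB,
        "fielddata_gib": indices["fielddata"]["memory_size_in_bytes"] / GIB,
        "query_cache_gib": indices["query_cache"]["memory_size_in_bytes"] / GIB,
    }
```

For per-node detail, the same idea applies to GET _nodes/stats/jvm, where each node reports its own heap usage.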
