I have an issue with Elasticsearch JVM heap usage during snapshots.
My cluster has 7 nodes (2 CPUs and 7.5 GB of RAM per node, local SSD storage) holding about 220 million documents, for a total of 377 GB of data (as reported by Kibana).
I'm running version 5.5.2 (I know it's EOL, but that's what I'm stuck with).
I'm currently implementing hourly snapshots to an S3 bucket.
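For reference, here is a minimal sketch of how the hourly snapshots can be driven against the 5.5 REST API; the repository and bucket names are placeholders, and it assumes the repository-s3 plugin is installed and the cluster is reachable without auth on localhost:9200:

```python
import datetime
import requests  # plain HTTP calls against the Elasticsearch 5.5 REST API

ES = "http://localhost:9200"
REPO = "s3_hourly"          # hypothetical repository name
BUCKET = "my-es-snapshots"  # hypothetical bucket name

# Register the S3 repository (idempotent, safe to re-run).
requests.put(f"{ES}/_snapshot/{REPO}", json={
    "type": "s3",
    "settings": {"bucket": BUCKET},
})

# Trigger a snapshot named after the current hour (e.g. from cron).
# Snapshots are incremental, so only new or changed segments are uploaded.
name = "hourly-" + datetime.datetime.utcnow().strftime("%Y%m%d-%H")
requests.put(
    f"{ES}/_snapshot/{REPO}/{name}",
    params={"wait_for_completion": "false"},
)
```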
Everything works fine, but during a snapshot the JVM heap usage rises above 85%, which triggers our monitoring system.
I have set the heap size to 50% of the available RAM as recommended in the manual, so I think the configuration is correct.
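To sanity-check the heap settings and watch usage per node, a quick way is the _cat/nodes API; the sketch below assumes the cluster is reachable on localhost:9200 without auth:

```python
import requests

# heap.max should be roughly half of ram.max (~3.7 GB per node here);
# heap.percent shows the live heap usage on each node.
resp = requests.get(
    "http://localhost:9200/_cat/nodes",
    params={"v": "true", "h": "name,heap.max,heap.percent,ram.max"},
)
print(resp.text)
```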
I've attached a capture of one node taken during a snapshot.
I'm struggling to find a way to prevent these JVM heap spikes.
Is it dangerous (could it crash a node or the whole cluster)?
What can I do about this?
I've run into the same issue as you and I'm still reading up on it; I think we may need to explicitly tell Elasticsearch to clean up the JVM heap after snapshots, but I'm still looking into it.
Yes, RAM usage returns to normal after the snapshot, but what worries me is the RAM usage while the snapshot is running.
Since it reaches over 85%, my fear is that it will kill the node or slow down the whole cluster.
I've already seen our cluster grind to a halt because of a single unhealthy node, so I'm not very comfortable with the risk of that happening again.
What are my options for hourly backups of the cluster if snapshotting poses a risk?