Assume the cumulative Index Size is approximately 200GB.
Is taking a snapshot backup every 30 minutes advisable? Will it cause any performance impact on active reads/writes happening in the cluster during the snapshot process?
Similarly, will cross-cluster replication cause any performance degradation to the primary cluster's reads/writes?
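For context on where the CCR load lands: replication is configured per follower index, and most of the work runs on the follower cluster, which pulls operations from the leader. A minimal sketch of the follow request body (the cluster alias and index name here are placeholders, not values from this thread):

```python
import json

# Hypothetical names -- substitute your own remote cluster alias and index.
REMOTE_CLUSTER = "primary"      # alias registered on the follower cluster
LEADER_INDEX = "logs-000001"    # index on the primary (leader) cluster

# Body for: PUT /logs-000001-copy/_ccr/follow  (sent to the follower cluster).
# The follower pulls operations from the leader, so leader-side impact is
# mostly serving those reads; the optional throttles below bound that load.
follow_request = {
    "remote_cluster": REMOTE_CLUSTER,
    "leader_index": LEADER_INDEX,
    "max_read_request_operation_count": 5000,  # ops per read request
    "max_outstanding_read_requests": 12,       # concurrent reads from leader
}

print(json.dumps(follow_request, indent=2))
```

Tuning the two throttle settings down is one way to trade replication lag for a smaller footprint on the primary.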
Assume the cumulative Index Size is approximately 200GB.
Do you mean that the total amount of data in the cluster will be 200GB, or that within the 30-minute window 200GB of data will change requiring a new backup?
Either way, generally speaking, the backup/snapshot process is fairly efficient on newer versions of Elasticsearch. As long as you have a good (fast) network connection to the backup destination, backups generally shouldn't cause many performance issues.
Say the total amount of data in the cluster is 200 GB.
Within the 30-minute timeframe, assume about 10-15 GB of data is ingested.
When I configure snapshots to be backed up to a remote repository every 30 minutes, how significant would the impact on read/write performance be during the snapshot process? I assume it's a separate single thread that transports the backup, so I believe it shouldn't really affect the usual reads/writes.
(Running a 3-node cluster with 2 vCPUs and 8 GiB of RAM per node.)
I unfortunately can't really say. I would not expect much of a performance impact, but the only way to know for sure is to actually test it and see what happens.
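For testing, the 30-minute schedule can be expressed as a snapshot lifecycle management (SLM) policy. A minimal sketch of the policy body (the repository name and retention values are assumptions; register your own snapshot repository first):

```python
import json

# Hypothetical repository name -- must already be registered on the cluster.
REPOSITORY = "my_backup_repo"

# Body for: PUT _slm/policy/half-hourly-snapshots
# The cron schedule fires at minute 0 and 30 of every hour. Snapshots are
# incremental, so each run mostly uploads newly written segments (roughly
# the 10-15 GB ingested per window), not the full 200 GB.
policy = {
    "schedule": "0 0/30 * * * ?",
    "name": "<half-hourly-{now/d}>",
    "repository": REPOSITORY,
    "config": {"indices": ["*"], "include_global_state": False},
    "retention": {
        "expire_after": "7d",  # assumed retention window
        "min_count": 5,
        "max_count": 100,
    },
}

print(json.dumps(policy, indent=2))
```

With a policy like this in place you can run it once manually, then watch snapshot duration and node CPU/IO during the window to measure the actual impact on your 3-node cluster.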