We are currently running our Elasticsearch cluster in Kubernetes and use EBS gp3 volumes for storage. We keep reading that during the initial data load one can disable replicas to speed up indexing. Our application also updates its documents on a regular basis (e.g. monthly).
Disabling replicas for those updates feels a bit risky: if the cluster gets overloaded and some nodes fail, we might lose data. However, we were wondering whether replicas are needed at all when you use external volumes like EBS.
Thanks, that makes sense. In our case, we are now considering temporarily disabling replicas during a full update of some document fields and re-enabling them afterwards. We also take snapshots continuously. We still have to test this, though.