I have a generic question about Elasticsearch replicas. Is the main purpose of a replica to provide HA for the documents stored in Elasticsearch, or are there other benefits? Suppose we are querying an index with 1M documents. After setting the replica count to 1, the number of stored document copies increases to 2M. Do replicas play any role during search, i.e. now that there are 2M copies (with replica 1), is there any impact on query performance?
Is there any way we can validate the above theory?
Replica shards do provide HA but are also used for searching, so if you have a single index with 1 primary shard and 1 replica shard, 2 nodes can serve queries against that index and searches will by default be spread across both shards. Each search will hit one of the shards, not both. Data does however need to be indexed into both shards, so the indexing load on the cluster will increase with the number of replicas.
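To put numbers on the 1M vs 2M question, here is a minimal sketch of the arithmetic described above. The function name `copy_counts` and its result keys are made up for illustration; it only models the reasoning in this post (one shard copy answers each search, every copy receives each write) and does not talk to a cluster.

```python
def copy_counts(primary_docs: int, replicas: int) -> dict:
    """Illustrative arithmetic only: how replicas change storage, search and indexing."""
    copies = 1 + replicas  # one primary copy plus N replica copies
    return {
        # every copy holds the full data, so storage scales with copies
        "docs_stored_across_copies": primary_docs * copies,
        # a search is served by one copy of the shard, not all of them
        "docs_seen_by_one_search": primary_docs,
        # each document has to be indexed into every copy
        "index_operations_per_doc": copies,
    }


result = copy_counts(primary_docs=1_000_000, replicas=1)
for key, value in result.items():
    print(f"{key}: {value}")
```

So with replica 1 the cluster stores 2M document copies, but a single search still works through 1M documents, while each indexed document is written twice.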
Thanks @Christian_Dahlqvist for the quick response. Is there any way in Elasticsearch to see this in practice? The customer has a POC and I want to show them how it actually works.
This guide is very old, but the concepts around shards are still accurate. I think it is a good starting point for understanding shards.
Showing high availability is relatively easy as you just need to take down a node that holds a copy of the shard (assuming you have at least 3 master eligible nodes). The only way to show distributed search load would likely be through monitoring statistics.
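One way to look at the search distribution through statistics is the index stats API at shard level, for example GET my-index/_stats/search?level=shards, which reports a query_total counter per shard copy. Below is a rough sketch of summarizing such a response. The index name, node IDs and counter values in `sample` are invented for illustration, and the helper `query_totals_per_copy` is hypothetical; in a real demo you would feed it the JSON returned by your cluster before and after running a batch of searches.

```python
def query_totals_per_copy(stats: dict, index: str) -> list:
    """Return (shard_id, role, node, query_total) for every shard copy of an index."""
    rows = []
    for shard_id, copies in stats["indices"][index]["shards"].items():
        for copy in copies:
            role = "primary" if copy["routing"]["primary"] else "replica"
            rows.append(
                (int(shard_id), role, copy["routing"]["node"], copy["search"]["query_total"])
            )
    return sorted(rows)


# Hand-written sample shaped like a level=shards stats response (values are made up).
sample = {
    "indices": {
        "my-index": {
            "shards": {
                "0": [
                    {"routing": {"primary": True, "node": "nodeA"},
                     "search": {"query_total": 120}},
                    {"routing": {"primary": False, "node": "nodeB"},
                     "search": {"query_total": 115}},
                ]
            }
        }
    }
}

for shard_id, role, node, total in query_totals_per_copy(sample, "my-index"):
    print(f"shard {shard_id} {role:8} on {node}: query_total={total}")
```

If both the primary and the replica counters grow while you run searches, that shows both copies are serving queries.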
Hi @Christian_Dahlqvist, thanks for your response. One thing I noticed: when I updated the replica count to 1 for 6 indices that previously had 0 replicas, Elasticsearch performed the replica creation one index at a time, not all in parallel. Is this the expected behavior from Elasticsearch?
I am using the command below to check: GET _tasks?detailed=true&actions=*recovery
There is indeed a limit to the number of concurrent recoveries, so they will not all happen at once. This is in place to ensure the disks and network are not overwhelmed and that the cluster can continue serving traffic.
Hello @Christian_Dahlqvist, does HA need a minimum of 3 nodes? With 2 nodes, is there a chance of split brain, which is why it is not recommended, or will HA simply not work? By HA we mean the capability to ingest and serve all data to queries while one node is down, AND to recover automatically once the node is back up.
@Christian_Dahlqvist, Debasis meant changing the index replica setting from 0 to 1. You mentioned "recoveries", which to us meant the recovery that takes place when a down node comes back up and rejoins the cluster. Are both the same internally?
If yes, then we will go through the settings that you shared. If not, here is what happened: we set replica=0 for fast ingest across 180 indexes, and subsequently set replica to 1. There were no user queries hitting the cluster at that time, and there was ample free CPU and IO bandwidth. We wished the replica creation had gone much faster. Creating replicas one index at a time is relatively slow, which is why we were looking for a way to create replicas in parallel.
Yes. Elasticsearch requires a quorum (a strict majority, i.e. more than 50%, not just 50%) of master eligible nodes to be available in order to elect a master (which is required for full cluster operations). If you only have 2 nodes you need both available, but with 3 you can elect a master with just 2 of the nodes available.
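The majority rule above can be sketched as a small calculation. The function names `quorum` and `tolerated_failures` are made up for illustration, and this models only the plain strict-majority rule described in this post, not any finer details of how a cluster manages its voting nodes.

```python
def quorum(master_eligible: int) -> int:
    """Smallest number of nodes that is strictly more than half."""
    return master_eligible // 2 + 1


def tolerated_failures(master_eligible: int) -> int:
    """How many master eligible nodes can be down while a master can still be elected."""
    return master_eligible - quorum(master_eligible)


for nodes in (1, 2, 3, 5):
    print(f"{nodes} master eligible node(s): quorum={quorum(nodes)}, "
          f"can lose {tolerated_failures(nodes)}")
```

With 2 nodes the quorum is 2, so losing either node blocks master election; with 3 nodes the quorum is still 2, so one node can be down.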
You can temporarily increase it under some circumstances, but avoid leaving it at higher settings as it can lead to cluster instability and a lot of unnecessary shard movements.
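As a rough sketch of what a temporary increase could look like: the settings link from this post is not preserved in the thread, so the setting used below, cluster.routing.allocation.node_concurrent_recoveries, is an assumption about which limit is meant, and the value 4 is purely illustrative. The hypothetical helper `recovery_settings_body` only builds the JSON bodies you would send with PUT _cluster/settings; passing None produces a null value, which resets the setting to its default.

```python
import json

# Assumed setting name; check it against the documentation for your version.
SETTING = "cluster.routing.allocation.node_concurrent_recoveries"


def recovery_settings_body(concurrent_recoveries):
    """Build a PUT _cluster/settings body; None becomes JSON null (reset to default)."""
    return {"persistent": {SETTING: concurrent_recoveries}}


raise_body = recovery_settings_body(4)     # illustrative value, apply before adding replicas
reset_body = recovery_settings_body(None)  # apply once the replicas are allocated

print("PUT _cluster/settings")
print(json.dumps(raise_body, indent=2))
print("PUT _cluster/settings")
print(json.dumps(reset_body, indent=2))
```

Sending the reset body afterwards follows the advice above about not leaving the higher value in place.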