Replica Use In Elasticsearch

Debasis_Mallick · April 18, 2024, 9:16am

Hi Team,

I have a generic question about Elasticsearch replicas. Is the main purpose of a replica to provide high availability (HA) for the docs stored in Elasticsearch, or are there other benefits? My question is: suppose we are querying an index with 1M docs. After setting the replica factor to 1, the doc count will increase to 2M. Do replicas play any role during search, i.e. now that the doc count has increased to 2M (with 1 replica), is there any impact on query search?

Is there any way we can validate this theory?

Thanks,
Debasis

Christian_Dahlqvist · April 18, 2024, 9:39am

Replica shards do provide HA, but they are also used for searching. If you have a single index with 1 primary shard and one replica shard, 2 nodes can serve queries against that index, and searches will by default be spread across both shards. Each search will hit one of the shards, not both. Data does however need to be indexed into both shards, so the indexing load on the cluster will increase with the number of replicas.
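A toy model may make this concrete. The Python sketch below is a hypothetical simulation (ShardCopy and ToyIndex are made-up names, not Elasticsearch code) of one index with one primary and one replica on two nodes. Every write is applied to every copy, so 1M docs become 2M stored copies, while each search is served by a single copy, so results still reflect 1M unique docs. It rotates round-robin between copies purely as a simplification; by default Elasticsearch chooses copies using adaptive replica selection instead.

```python
from __future__ import annotations

from dataclasses import dataclass, field
from itertools import cycle


@dataclass
class ShardCopy:
    node: str
    primary: bool
    docs: int = 0       # documents stored in this copy
    searches: int = 0   # search requests served by this copy


@dataclass
class ToyIndex:
    """Hypothetical model: one shard, first node holds the primary, others hold replicas."""
    nodes: list[str]
    copies: list[ShardCopy] = field(init=False)

    def __post_init__(self) -> None:
        self.copies = [ShardCopy(n, primary=(i == 0)) for i, n in enumerate(self.nodes)]
        self._next = cycle(self.copies)

    def index(self, n_docs: int) -> None:
        # Writes go to the primary AND every replica: indexing cost grows with replicas.
        for copy in self.copies:
            copy.docs += n_docs

    def search(self) -> int:
        # Each search hits exactly one copy (round-robin here as a simplification),
        # so documents are never counted twice.
        copy = next(self._next)
        copy.searches += 1
        return copy.docs


idx = ToyIndex(nodes=["node-1", "node-2"])
idx.index(1_000_000)
print(sum(c.docs for c in idx.copies))  # 2000000 stored copies
print(idx.search())                     # 1000000 docs visible to a search
idx.search(); idx.search(); idx.search()
print([c.searches for c in idx.copies]) # [2, 2]: load spread over both nodes
```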

Debasis_Mallick · April 18, 2024, 9:54am

Thanks @Christian_Dahlqvist for the quick response. Is there any way in Elasticsearch we can see this in practice? The customer has a POC and I want to show them how it actually works.

Thanks,
Debasis

Christian_Dahlqvist · April 18, 2024, 10:07am

This guide is very old, but the concepts around shards are still accurate. I think it is a good starting point for understanding shards.

Showing high availability is relatively easy as you just need to take down a node that holds a copy of the shard (assuming you have at least 3 master eligible nodes). The only way to show distributed search load would likely be through monitoring statistics.

Debasis_Mallick · April 20, 2024, 7:42am

Hi @Christian_Dahlqvist, thanks for your response. One thing I noticed: when I updated the replica count to 1 for 6 indices whose replica count was previously 0, Elasticsearch performed this conversion one index at a time rather than all in parallel. Is this the expected behavior from Elasticsearch?

I am using the command below to check:
GET _tasks?detailed=true&actions=*recovery

Thanks,
Debasis

Christian_Dahlqvist · April 20, 2024, 7:47am

There is indeed a limit on the number of concurrent recoveries, so they will not all happen at once. This is in place to ensure the disks and network are not overwhelmed and that the cluster can continue serving traffic.
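As a rough illustration of why the work appears to proceed in batches: the limit is applied per node to individual shard recoveries (cluster.routing.allocation.node_concurrent_recoveries, which defaults to 2), not per index. The hypothetical helper below estimates how many sequential "waves" of replica recoveries are needed. It assumes equal-sized shards spread evenly across nodes and ignores byte-rate throttling (indices.recovery.max_bytes_per_sec), so real allocation will be less regular than this.

```python
import math


def recovery_waves(new_replicas: int, nodes: int, per_node_limit: int = 2) -> int:
    """Rough count of sequential batches needed to build `new_replicas` replica
    shards when each node runs at most `per_node_limit` recoveries at once.
    Simplified model: even spread, equal shard sizes, no byte-rate throttling."""
    if new_replicas <= 0:
        return 0
    in_flight = nodes * per_node_limit  # cluster-wide recoveries running together
    return math.ceil(new_replicas / in_flight)


# Hypothetical example: 180 single-shard indices going from 0 to 1 replica
# on a 3-node cluster with the default limit of 2.
print(recovery_waves(180, nodes=3))  # 30
```

Raising the per-node limit shortens this estimate, but in practice disk and network bandwidth, not just the count of parallel recoveries, bound the total time.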

Debasis_Mallick · April 20, 2024, 8:14am

Thanks @Christian_Dahlqvist. Is there any way we can see those configuration details related to replica creation?

Thanks,
Debasis

Christian_Dahlqvist · April 20, 2024, 8:46am

It is generally not recommended to change these settings as they may affect cluster stability. The settings are however documented here.
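To see the values currently in effect, including defaults, GET _cluster/settings?include_defaults=true&flat_settings=true returns "defaults", "persistent" and "transient" sections with flattened keys. The hypothetical Python helper below picks out the recovery-related keys and lets explicit overrides win over defaults (transient over persistent over defaults). The sample response is made up for illustration.

```python
from __future__ import annotations


def effective_recovery_settings(resp: dict) -> dict[str, str]:
    """Recovery-related settings from a flat_settings=true, include_defaults=true
    cluster settings response. Later sections override earlier ones."""
    merged: dict[str, str] = {}
    for section in ("defaults", "persistent", "transient"):
        for key, value in resp.get(section, {}).items():
            if "recover" in key:  # matches *_recoveries and indices.recovery.*
                merged[key] = value
    return dict(sorted(merged.items()))


# Made-up sample response: the concurrency limit has a persistent override.
sample = {
    "persistent": {"cluster.routing.allocation.node_concurrent_recoveries": "4"},
    "transient": {},
    "defaults": {
        "cluster.routing.allocation.node_concurrent_recoveries": "2",
        "indices.recovery.max_bytes_per_sec": "40mb",
        "search.max_buckets": "65536",
    },
}
for key, value in effective_recovery_settings(sample).items():
    print(key, value)
```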

pces · April 23, 2024, 8:08am

Hello @Christian_Dahlqvist, does HA need a minimum of 3 nodes? With 2 nodes, is there a chance of split brain, which is why it's not recommended, or will HA simply not work? By HA, we mean the capability to ingest and serve all data to queries while one node is down, AND to recover automatically once the node is back up.

Thanks

pces · April 23, 2024, 8:13am

@Christian_Dahlqvist, Debasis was referring to changing the index replica setting from 0 to 1. You mentioned "recoveries", which to us meant the recovery that takes place when a node that was down comes back up and rejoins the cluster. Are both the same internally?

If yes, then we will go through the settings that you shared. If not, here is what happened: we set replicas to 0 for fast ingest. There were 180 indexes. Subsequently, we set replicas to 1. No user queries were hitting the cluster at that time, and there was ample free CPU and I/O bandwidth. We wished the replica creation had gone much faster. Creating replicas one index at a time is relatively slow, which is why we were looking for a way to create replicas in parallel.

Thanks

Christian_Dahlqvist · April 23, 2024, 8:23am

Yes. Elasticsearch requires a quorum of master-eligible nodes (a strict majority, i.e. >50% of master-eligible nodes available, not >=50%) in order to elect a master, which is required for full cluster operation. If you only have 2 nodes you need both available, but with 3 you can elect a master with just 2 of the nodes available.
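The strict-majority arithmetic can be written out as a small sketch. quorum and tolerable_failures are hypothetical helper names, and this follows only the simple rule described above; Elasticsearch's handling of voting configurations is covered in the official documentation.

```python
def quorum(master_eligible: int) -> int:
    """Strict majority (>50%) of master-eligible nodes needed to elect a master."""
    return master_eligible // 2 + 1


def tolerable_failures(master_eligible: int) -> int:
    """Master-eligible nodes that can be lost while a quorum still remains."""
    return master_eligible - quorum(master_eligible)


for n in (1, 2, 3, 4, 5):
    print(n, quorum(n), tolerable_failures(n))
# 1 1 0
# 2 2 0   <- with 2 nodes, losing either one loses the quorum
# 3 2 1
# 4 3 1
# 5 3 2
```

The table also shows why a 4th master-eligible node adds no extra fault tolerance over 3.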

Please see the official documentation for further details.

Yes, I believe it is the same mechanism.

You can temporarily increase it under some circumstances, but avoid leaving it at higher settings as it can lead to cluster instability and a lot of unnecessary shard movements.
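If you do raise the limit temporarily, it helps to prepare the revert at the same time. The sketch below builds the two PUT _cluster/settings request bodies in Python: one raising cluster.routing.allocation.node_concurrent_recoveries, and one setting it to null, which removes the override and restores the default. The value 4 is a hypothetical example, not a recommendation; choose it based on your own disks and network, and apply the reset as soon as the recoveries finish.

```python
import json

SETTING = "cluster.routing.allocation.node_concurrent_recoveries"


def raise_body(value: int) -> str:
    # Body for PUT _cluster/settings that temporarily raises the limit.
    return json.dumps({"persistent": {SETTING: value}})


def reset_body() -> str:
    # Setting the value to null (None in Python) removes the override,
    # so the cluster falls back to the default.
    return json.dumps({"persistent": {SETTING: None}})


print(raise_body(4))  # {"persistent": {"cluster.routing.allocation.node_concurrent_recoveries": 4}}
print(reset_body())   # {"persistent": {"cluster.routing.allocation.node_concurrent_recoveries": null}}
```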
