Writing data on Replica Shards is copied from Primary shard?

TomerA · February 4, 2019, 8:40am

Hi

I want to understand: when data is written to the Replica Shards, is it copied from the Primary Shard itself, or is it provided again by the Master node? I understand that Elasticsearch may first wait for the operation on the Primary Shard to complete successfully before moving on to the Replica Shards, but is the data itself copied from the Primary Shard or sent from the Master node?

Thank you in advance

Christian_Dahlqvist · February 4, 2019, 8:47am

The current master node does not participate in indexing or querying flows, as it is only there to manage the cluster state. Indexing requests are sent to the appropriate node holding the primary shard, where they are processed before the data is replicated to the replica shards.
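
As a rough illustration of that flow, here is a toy Python model. It is not Elasticsearch code; the `Shard` class and the `index_document` function are hypothetical names made up for this sketch. The write is applied to the primary copy first, and the result is then forwarded to each replica. No master node appears anywhere in the path.

```python
from dataclasses import dataclass, field


@dataclass
class Shard:
    """Hypothetical in-memory stand-in for one shard copy."""
    name: str
    docs: dict = field(default_factory=dict)

    def apply(self, doc_id, source):
        # Store the document and return what was stored.
        self.docs[doc_id] = source
        return source


def index_document(primary, replicas, doc_id, source):
    """Write to the primary first, then forward the result to the replicas."""
    # Step 1: the primary copy processes the request.
    processed = primary.apply(doc_id, source)
    # Step 2: only after the primary succeeds is the operation replicated.
    for replica in replicas:
        replica.apply(doc_id, processed)
    return processed


primary = Shard("primary")
replicas = [Shard("replica-1"), Shard("replica-2")]
index_document(primary, replicas, "1", {"user": "tomer"})
```

In this sketch the replicas only ever receive what the primary hands them, which mirrors the ordering described above.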

TomerA · February 4, 2019, 8:56am

Thank you for the swift reply.
Could you point me to the classes responsible for copying newly written data from the primary shard to the replica shards?
I am trying to understand the following: if the node holding the primary shard is malicious (intentionally writing false data), is there an option to prevent the wrong data from being written to the replica shards as well?

Christian_Dahlqvist · February 4, 2019, 9:00am

How would this happen? Not sure I understand the rationale behind this scenario.

TomerA · February 4, 2019, 9:13am

If Elasticsearch is installed on various servers, not all of them are mine. Suppose the owner of a server that hosts one of the slave (data) nodes runs a modified "bad" version of that node. When the node is chosen to hold the primary shard for some data, the owner could decide to corrupt that data by changing some detail, in order to hurt the system. If this is done on the initial write to the primary shard, the corrupted data will also propagate to the replica shards (since it comes from the primary shard, rather than being sent by the master node to the primary and replica shards together).
Can you please point me to the area in the code where the data is passed from the primary shard to the replica shards?

DavidTurner · February 4, 2019, 9:31am

See the org.elasticsearch.index.engine and org.elasticsearch.index.shard packages. It is nontrivial.

Elasticsearch is not designed to straddle trust boundaries like this.
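
A toy Python model of why a modified primary is a problem in this design (the function names are hypothetical and this is not Elasticsearch code): the replicas are given whatever the primary produced, so a primary that alters a document before forwarding it leaves every copy altered.

```python
def tampering_primary(source):
    """Hypothetical malicious primary: changes a field before storing."""
    altered = dict(source)
    altered["amount"] = 0
    return altered


def replicate(primary_step, source, replica_count):
    """The primary processes the write; replicas receive the primary's output."""
    stored_on_primary = primary_step(source)
    # Replicas never see the original request, only the primary's result.
    stored_on_replicas = [dict(stored_on_primary) for _ in range(replica_count)]
    return stored_on_primary, stored_on_replicas


original = {"account": "a-1", "amount": 100}
on_primary, on_replicas = replicate(tampering_primary, original, replica_count=2)
```

Here every replica ends up identical to the tampered primary copy, and nothing in the replication step compares it against the original request.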

Christian_Dahlqvist · February 4, 2019, 9:31am

Why would you choose to deploy Elasticsearch this way??

TomerA · February 4, 2019, 10:01am

These days there are more community-based projects that may need to store data reliably and retrieve it safely in a distributed system running on their members' machines. For this to work, certain aspects of potential maliciousness need to be addressed. This is why I was asking about Elasticsearch.

Are there any plans to add support for this in future versions?

I will check the packages that you mentioned (thank you again).

I was looking at "src/main/java/org/elasticsearch/action/support/replication/ReplicationOperation.java" and I see that "execute()" calls "primary.perform(request)" and, if that is successful, "performOnReplicas(replicaRequest, globalCheckpoint, maxSeqNoOfUpdatesOrDeletes, replicationGroup);".
Are both of these actions invoked from the node with the primary shard and not from the master node?

DavidTurner · February 4, 2019, 10:54am

No, there are no plans to add support for this.

The master node is normally not at all involved in indexing, unless there is a need for coordination (e.g. a shard failure or a mapping update).

