Is data written to replica shards copied from the primary shard?

Hi

I want to understand what happens when data is written to the replica shards: is it copied from the primary shard itself, or is it sent again by the master node? I understand that Elasticsearch may wait for the operation on the primary shard to complete successfully before moving on to the replica shards, but does the data itself come from the primary shard or from the master node?

Thank you in advance

The current master node does not participate in indexing or querying flows; it is only there to manage the cluster state. Indexing requests are sent to the node holding the appropriate primary shard, where they are processed before the data is replicated to the replica shards.
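To make the flow concrete, here is a minimal sketch in Java. It is a toy model only: `ShardCopy`, `ReplicationFlowSketch`, `indexOnPrimary` and the node names are hypothetical and are not Elasticsearch classes. It assumes nothing beyond what is described above: the node holding the primary shard handles the write first and then forwards it to the replicas, with no master node in the path.

```java
import java.util.ArrayList;
import java.util.List;

// Toy model only; these are NOT Elasticsearch classes.
class ShardCopy {
    final String nodeName;                      // hypothetical node name
    final List<String> docs = new ArrayList<>();

    ShardCopy(String nodeName) {
        this.nodeName = nodeName;
    }

    void index(String doc) {
        docs.add(doc);
    }
}

class ReplicationFlowSketch {
    // Runs on the node that holds the primary shard:
    // index locally first, then forward the same document to every replica.
    // Note that no master node appears anywhere in this path.
    static void indexOnPrimary(ShardCopy primary, List<ShardCopy> replicas, String doc) {
        primary.index(doc);
        for (ShardCopy replica : replicas) {
            replica.index(doc);
        }
    }

    public static void main(String[] args) {
        ShardCopy primary = new ShardCopy("data-node-1");
        List<ShardCopy> replicas =
                List.of(new ShardCopy("data-node-2"), new ShardCopy("data-node-3"));

        indexOnPrimary(primary, replicas, "{\"user\":\"alice\"}");

        for (ShardCopy replica : replicas) {
            System.out.println(replica.nodeName + " holds " + replica.docs);
        }
    }
}
```

In this model the replicas only ever see what the primary hands them, which is the property the original question is asking about.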

Thank you for the swift reply.
Could you point me to the classes responsible for copying newly written data from the primary shard to the replica shards?
I am trying to understand whether, if the node holding the primary shard is malicious (intentionally writing false data), there is a way to prevent the wrong data from being written to the replica shards as well.

How would this happen? Not sure I understand the rationale behind this scenario.

Suppose Elasticsearch is installed on various servers, not all of which are mine. The owner of a server that hosts one of the slave (data) nodes could run a modified, "bad" version of that node. When that node is chosen to hold the primary shard for some data, the owner could decide to corrupt the data by changing some detail in order to hurt the system. If this is done during the initial write on the primary shard, the corrupted data will also propagate to the replica shards, since it comes from the primary shard rather than being sent by the master node to the primary and replica shards together.
Can you please point me to the area in the code where the data is passed from the primary shard to the replica shards?

See the org.elasticsearch.index.engine and org.elasticsearch.index.shard packages. It is nontrivial.

Elasticsearch is not designed to straddle trust boundaries like this.


Why would you choose to deploy Elasticsearch this way??

These days there are more community-based projects that may need to store data reliably and retrieve it safely in a distributed system running on their members' machines. For this to work, certain aspects of potential maliciousness need to be addressed. This is why I was asking about Elasticsearch.

Are there any plans to add support for this in future versions?

I will check the packages that you mentioned (thank you again)

I was looking at "src/main/java/org/elasticsearch/action/support/replication/ReplicationOperation.java" and I see that "execute()" runs "primary.perform(request)" and, if that succeeds, "performOnReplicas(replicaRequest, globalCheckpoint, maxSeqNoOfUpdatesOrDeletes, replicationGroup);".
Are both of these actions invoked from the node with the primary shard and not from the master node?

No.

The master node is normally not at all involved in indexing, unless there is a need for coordination (e.g. a shard failure or a mapping update).
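The ordering quoted from ReplicationOperation above (perform on the primary first, then on the replicas only if that succeeded) can be modelled roughly as follows. This is a simplified sketch, not the real implementation: `Primary`, `Replica`, `execute` and the string-based requests are hypothetical stand-ins, and the real code involves checkpoints, listeners and failure handling that are left out here. The `transform` parameter is an assumption added purely to illustrate the concern raised earlier in the thread: whatever the primary produces is what the replicas receive.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.UnaryOperator;

// Simplified model; NOT the real ReplicationOperation.
class PrimaryThenReplicasSketch {

    // Hypothetical primary: turns the client request into the request
    // that will be forwarded to the replicas.
    static final class Primary {
        private final UnaryOperator<String> transform;
        final List<String> store = new ArrayList<>();

        Primary(UnaryOperator<String> transform) {
            this.transform = transform;
        }

        String perform(String request) {
            if (request == null || request.isEmpty()) {
                throw new IllegalArgumentException("rejected by primary");
            }
            String replicaRequest = transform.apply(request);
            store.add(replicaRequest);
            return replicaRequest;
        }
    }

    // Hypothetical replica: applies whatever the primary forwards.
    static final class Replica {
        final List<String> store = new ArrayList<>();

        void apply(String replicaRequest) {
            store.add(replicaRequest);
        }
    }

    // Runs on the node holding the primary shard.
    static boolean execute(Primary primary, List<Replica> replicas, String request) {
        String replicaRequest;
        try {
            replicaRequest = primary.perform(request);
        } catch (IllegalArgumentException e) {
            return false; // primary failed, so replicas are never contacted
        }
        for (Replica replica : replicas) {
            replica.apply(replicaRequest);
        }
        return true;
    }

    public static void main(String[] args) {
        Replica honestReplica = new Replica();
        // A primary that alters the document before forwarding it.
        Primary tampering = new Primary(doc -> doc.replace("100", "999"));

        execute(tampering, List.of(honestReplica), "amount=100");
        System.out.println(honestReplica.store);
    }
}
```

In this model the replica has no independent copy of the client request to compare against, so it cannot tell an honest primary from a tampering one.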
