We are going to have elastic cloud in our corporate data center, which we are trying to set up.
The question is about data consistency across servers. We want to make sure that the data is consistent when different users request it, i.e. if someone requests data during an update, they don't get partially updated data. In another scenario, one request goes to one shard, and a later request goes to another shard that has not been updated yet and returns stale data.
That documentation is for an antiquated version of Elasticsearch. You should definitely not install Elasticsearch 2.x for your company, but rather one of the modern 6.x versions, where shard synchronization is handled much better. If I remember correctly, an update to a primary shard won't return OK before all the in-sync replicas have also been updated.
The primary instructs the active master to remove the IDs of the divergent shard copies from the in-sync set. The primary then only acknowledges the write request to the client after it has received confirmation from the master that the in-sync set has been successfully updated by the consensus layer. This ensures that only shard copies that contain all acknowledged writes can be selected as primary by the master.
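To make the quoted behaviour concrete, here is a minimal toy model in Python. It is only a sketch of the idea, not Elasticsearch code: the names `ShardCopy`, `Master` and `index_document` are made up for illustration, the `healthy` flag is a stand-in for a replica being reachable, and real details such as recovery, retries and sequence numbers are left out.

```python
from dataclasses import dataclass, field


@dataclass
class ShardCopy:
    """One copy of a shard (primary or replica), identified by an allocation ID."""
    allocation_id: str
    healthy: bool = True                      # toy stand-in for "reachable and able to apply writes"
    docs: dict = field(default_factory=dict)  # doc id -> document body

    def apply(self, doc_id, body):
        if not self.healthy:
            raise ConnectionError(f"{self.allocation_id} did not apply the write")
        self.docs[doc_id] = body


class Master:
    """Toy stand-in for the active master, which owns the in-sync set."""

    def __init__(self, allocation_ids):
        self.in_sync = set(allocation_ids)

    def remove_from_in_sync(self, allocation_ids):
        self.in_sync -= set(allocation_ids)
        return True  # confirmation that the in-sync set was updated


def index_document(primary, replicas, master, doc_id, body):
    """Write to the primary, replicate, and acknowledge only once the in-sync set is accurate."""
    primary.apply(doc_id, body)
    divergent = []
    for replica in replicas:
        if replica.allocation_id not in master.in_sync:
            continue  # already out of sync; skipped in this simplified model
        try:
            replica.apply(doc_id, body)
        except ConnectionError:
            divergent.append(replica.allocation_id)
    # The client is only answered after the master confirms the removal.
    if divergent and not master.remove_from_in_sync(divergent):
        raise RuntimeError("in-sync set not updated; write cannot be acknowledged")
    return "acknowledged"


# Usage: replica r2 fails to apply the write, so it is dropped from the in-sync set.
primary = ShardCopy("p0")
r1, r2 = ShardCopy("r1"), ShardCopy("r2", healthy=False)
master = Master(["p0", "r1", "r2"])
result = index_document(primary, [r1, r2], master, "1", {"title": "hello"})
# master.in_sync is now {"p0", "r1"}; r1 holds the same document as the primary.
```

In this model, every copy that remains in `master.in_sync` holds every acknowledged write, which is why only those copies are safe candidates for promotion to primary.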
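So you can assume that searches against a replica shard will return the same results as searches against the primary shard, since both contain the same acknowledged writes. The one caveat is that each shard copy refreshes independently, so a very recent write may become visible to searches on one copy slightly before the other.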