ES's synchronisation is not really anything like 2PC, if only because it has a single phase.
If the primary dies at phase 2 then a replica is promoted to primary in its place. The operation was indexed on the now-dead primary but not the replica, nor was it acked to the client.
If the primary dies at phase 4 then a replica is promoted to primary in its place. The other replicas then roll back to an earlier state and recover any missing operations from the new primary so as to be sure that they end up in the same state.
In both cases the client might see an exception, or Elasticsearch might retry the operation on the new primary and return a successful response to the client.
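To make the two failure cases above concrete, here is a minimal toy sketch in Python. It is not Elasticsearch code: `ShardCopy`, `promote_and_resync`, and `safe_point` are hypothetical names invented for illustration, and `safe_point` is only an assumed stand-in for "the prefix of operations known to be on every copy". It shows only the idea that, after a promotion, the other copies drop anything beyond that shared prefix and then copy what they are missing from the new primary.

```python
from dataclasses import dataclass, field


@dataclass
class ShardCopy:
    """Toy shard copy: an ordered list of operations (hypothetical model)."""
    name: str
    ops: list = field(default_factory=list)


def promote_and_resync(new_primary, other_replicas, safe_point):
    """Treat `new_primary` as promoted, then align every other replica with it.

    `safe_point` is the number of leading operations assumed to be present
    on all copies (an illustrative stand-in, not a real Elasticsearch API).
    """
    for replica in other_replicas:
        # Roll back anything beyond the prefix shared by all copies.
        del replica.ops[safe_point:]
        # Recover the missing operations from the new primary.
        replica.ops.extend(new_primary.ops[safe_point:])
    return new_primary


# Case 1: the primary indexed "op2" locally and died before replicating it.
r1 = ShardCopy("r1", ["op1"])
r2 = ShardCopy("r2", ["op1"])
promote_and_resync(r1, [r2], safe_point=1)
case1 = (list(r1.ops), list(r2.ops))  # "op2" existed only on the dead primary

# Case 2: the primary died after "op2" reached r2 but not r1.
r1 = ShardCopy("r1", ["op1"])
r2 = ShardCopy("r2", ["op1", "op2"])
promote_and_resync(r1, [r2], safe_point=1)
case2_a = (list(r1.ops), list(r2.ops))  # r1 promoted: r2 rolls back "op2"

r1 = ShardCopy("r1", ["op1"])
r2 = ShardCopy("r2", ["op1", "op2"])
promote_and_resync(r2, [r1], safe_point=1)
case2_b = (list(r1.ops), list(r2.ops))  # r2 promoted: r1 recovers "op2"
```

In this toy model the surviving copies always end up identical, but whether "op2" survives depends on which copy is promoted, which is why the client cannot assume either outcome unless it receives a successful response.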
Thank you very much for clarifying the index process.
I did some research after posting the question and learned that ES's synchronization is a partial implementation of the PacificA algorithm. Is that correct?
I'd also like to confirm one more case: the primary dies at phase 3. My assumption is:
A replica is promoted to primary, but there is no guarantee that the request was successfully written to it.
So the client would receive an exception, but the request might or might not have been indexed, depending on whether it was written to the translog of the new primary.