When I shut down a node that holds a replica while updates are happening on
the rest of the cluster, and then restart that node, it seems that the entire
replica is copied to it again.
Is there a way to make ES update that node with only the changes that
happened while it was down?
I don't believe this is possible. ES does sync replication by default, and
when a replica is down while updates/inserts are coming in, that replica is
simply invalidated and then fully recovered once it comes back up.
If a replica recovers from the primary, then the node hosting it is shut
down shortly thereafter, when it comes back up it will only copy the
segments that have changed in the interim period.
However, merges happen independently on the primary and the replica. When a
replica has been running for a long time, its segments will have diverged
from those of the primary, and so more segments need to be copied across.
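To illustrate the segment-level comparison described above, here is a rough sketch in Python. It is not how ES implements recovery internally; the function name, the file names and the checksums are all made up for the example. It only assumes that a segment file can be reused when the replica already has a file with the same name and checksum.

```python
def segments_to_copy(primary, replica):
    """Return the primary's segment files that the replica must fetch.

    Both arguments are hypothetical dicts mapping a segment file name
    to a checksum. A file is reused only if name and checksum match.
    """
    return sorted(
        name
        for name, checksum in primary.items()
        if replica.get(name) != checksum
    )


# Hypothetical primary state: three segment files.
primary = {"_0.cfs": "a1", "_1.cfs": "b2", "_2.cfs": "c3"}

# Replica that recovered recently: it only lacks the newest segment.
replica_recent = {"_0.cfs": "a1", "_1.cfs": "b2"}
recent = segments_to_copy(primary, replica_recent)

# Long-running replica: its own merges produced a different file,
# so nothing matches and every primary segment must be copied.
replica_diverged = {"_5.cfs": "x9"}
diverged = segments_to_copy(primary, replica_diverged)

print(recent)
print(diverged)
```

In the first case only one file is transferred. In the second case the whole shard is effectively copied again, even though the documents on both sides may be nearly identical.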
I stand corrected. Clint is right. ES will try to copy only the differences,
working at the segment level. If the underlying segments have not diverged
much, it will just copy the segments that have changed. But if they have
diverged significantly since the replica node went down, you are likely to
end up copying a lot more than the documents that actually changed.