Ha

Mohit_Anchlia · August 28, 2013, 2:44am

From what I understand writes always go to prmary shard even though you
have 2+ replica copies. It also looks like there is a concept of promoting
other node in the cluster as primary shard when a node fails. Does it mean
clients could potentially see a downtime while this promition is taking
place?

--
You received this message because you are subscribed to the Google Groups "elasticsearch" group.
To unsubscribe from this group and stop receiving emails from it, send an email to elasticsearch+unsubscribe@googlegroups.com.
For more options, visit https://groups.google.com/groups/opt_out.

Mohit_Anchlia · August 28, 2013, 11:07pm

Could somebody help me with this question?

On Tue, Aug 27, 2013 at 7:44 PM, Mohit Anchlia mohitanchlia@gmail.comwrote:

From what I understand writes always go to prmary shard even though you
have 2+ replica copies. It also looks like there is a concept of promoting
other node in the cluster as primary shard when a node fails. Does it mean
clients could potentially see a downtime while this promition is taking
place?

--
You received this message because you are subscribed to the Google Groups "elasticsearch" group.
To unsubscribe from this group and stop receiving emails from it, send an email to elasticsearch+unsubscribe@googlegroups.com.
For more options, visit https://groups.google.com/groups/opt_out.

jprante · August 29, 2013, 5:39am

Why should there be a downtime? Cluster state changes are promoted and
switched without interruption of service.

Jörg

--
You received this message because you are subscribed to the Google Groups "elasticsearch" group.
To unsubscribe from this group and stop receiving emails from it, send an email to elasticsearch+unsubscribe@googlegroups.com.
For more options, visit https://groups.google.com/groups/opt_out.

Mohit_Anchlia · August 29, 2013, 5:49am

The concepts are bit confusing, for instance if there is a concept of
quoram then why is there a concept of primary shard and replica shards? It
appears all of them have same role to play,

On Wed, Aug 28, 2013 at 10:39 PM, joergprante@gmail.com <
joergprante@gmail.com> wrote:

Why should there be a downtime? Cluster state changes are promoted and
switched without interruption of service.

Jörg

--
You received this message because you are subscribed to the Google Groups
"elasticsearch" group.
To unsubscribe from this group and stop receiving emails from it, send an
email to elasticsearch+unsubscribe@googlegroups.com.
For more options, visit https://groups.google.com/groups/opt_out.

--
You received this message because you are subscribed to the Google Groups "elasticsearch" group.
To unsubscribe from this group and stop receiving emails from it, send an email to elasticsearch+unsubscribe@googlegroups.com.
For more options, visit https://groups.google.com/groups/opt_out.

jprante · August 29, 2013, 7:08am

In fault tolerant systems, you often work with quorums. Quorums give you a
voting system so you can be sure that some level of consistency has been
achieved, but you can trade for speed.

To answer to write requests asynchronously, ES offers levels of
consistency: one (primary only), quorum (at least half of replica +1) and
all (all replica). The primary shard is just the first shard that processes
index modifications, so the consistency level "one" is the fastest, but you
can not be sure about that your data arrived at the replicas. This will
happen in the future. The default consistency level is "quorum", which is a
compromise between speed and safety. It's a matter what view of consistency
the client prefers.

Jörg

--
You received this message because you are subscribed to the Google Groups "elasticsearch" group.
To unsubscribe from this group and stop receiving emails from it, send an email to elasticsearch+unsubscribe@googlegroups.com.
For more options, visit https://groups.google.com/groups/opt_out.

polyfractal · August 29, 2013, 1:48pm

That is correct, primaries are always written to first. They are the
"point of truth" inside the cluster. After a write is finished on a
primary, it is sent out to the replicas to be replicated.

When a primary shard leaves the cluster (because the node disappears, etc),
a replica shard will be promoted to primary. This happens on a per-shard
basis (e.g. promotion of individual shards, not the whole node) and is
nearly instant. The promotion is really just the master updating the
cluster state and informing the other nodes of the changes. The node will
then mark its replica as master and continue business as usual.

If you do happen to index a document while A) the primary is gone and B)
the replica is in the midst of being promoted...elasticsearch will wait
until the promotion is complete before continuing the indexing request. By
default the request will time-out after a minute if the promotion process
takes longer than that.

Replicas and quorums play an overlapping role. A primary/replica
distinction simplifies the process of deciding who can make decisions about
data ("point of truth"). Write consistency quorums help prevent writing to
the wrong side of a split-brain, where a shard thinks it is the primary but
in reality it is sitting in isolation from the rest of the cluster (and
therefore is no longer the true primary, and should not accept indexing
requests).

It is important to note that write consistency quorums only checks that the
shards are available...it doesn't actually do a quorum consensus on each
new document. The shards don't all talk to each other and discuss if the
data should be indexed or not. If the quorum rule passes, the primary
shard is still the only shard capable of "officially" accepting the data
into the cluster, then responsible for pushing that data out to replicas.

Phew, that reply got kind of long-winded...does it help clear things up?

-Zach

On Tuesday, August 27, 2013 10:44:11 PM UTC-4, Mo wrote:

From what I understand writes always go to prmary shard even though you
have 2+ replica copies. It also looks like there is a concept of promoting
other node in the cluster as primary shard when a node fails. Does it mean
clients could potentially see a downtime while this promition is taking
place?

--
You received this message because you are subscribed to the Google Groups "elasticsearch" group.
To unsubscribe from this group and stop receiving emails from it, send an email to elasticsearch+unsubscribe@googlegroups.com.
For more options, visit https://groups.google.com/groups/opt_out.

Mohit_Anchlia · August 29, 2013, 7:42pm

Thanks a lot! That's a great explanation. It clears up the confusion. So it
looks like quorum is applicable for entire cluster to make sure that during
network partition majority of nodes in one side of network are receiving
requests.

I have couple more questions:

When a node is down with Replica factor 1 does elasticsearch starts to
create another replica or is it just writing to one shard? Similarly what
happens if node is down for extended period of time.
Once down node comes back up does it sync up data that from the primary
shard?

Overall trying to understand what admin related activities need to happen
during failures and after state is restored or node comes back up.

On Thu, Aug 29, 2013 at 6:48 AM, Zachary Tong zacharyjtong@gmail.comwrote:

That is correct, primaries are always written to first. They are the
"point of truth" inside the cluster. After a write is finished on a
primary, it is sent out to the replicas to be replicated.

When a primary shard leaves the cluster (because the node disappears,
etc), a replica shard will be promoted to primary. This happens on a
per-shard basis (e.g. promotion of individual shards, not the whole node)
and is nearly instant. The promotion is really just the master updating
the cluster state and informing the other nodes of the changes. The node
will then mark its replica as master and continue business as usual.

If you do happen to index a document while A) the primary is gone and B)
the replica is in the midst of being promoted...elasticsearch will wait
until the promotion is complete before continuing the indexing request. By
default the request will time-out after a minute if the promotion process
takes longer than that.

Replicas and quorums play an overlapping role. A primary/replica
distinction simplifies the process of deciding who can make decisions about
data ("point of truth"). Write consistency quorums help prevent writing to
the wrong side of a split-brain, where a shard thinks it is the primary but
in reality it is sitting in isolation from the rest of the cluster (and
therefore is no longer the true primary, and should not accept indexing
requests).

It is important to note that write consistency quorums only checks that
the shards are available...it doesn't actually do a quorum consensus on
each new document. The shards don't all talk to each other and discuss if
the data should be indexed or not. If the quorum rule passes, the primary
shard is still the only shard capable of "officially" accepting the data
into the cluster, then responsible for pushing that data out to replicas.

Phew, that reply got kind of long-winded...does it help clear things up?

-Zach

On Tuesday, August 27, 2013 10:44:11 PM UTC-4, Mo wrote:

From what I understand writes always go to prmary shard even though you
have 2+ replica copies. It also looks like there is a concept of promoting
other node in the cluster as primary shard when a node fails. Does it mean
clients could potentially see a downtime while this promition is taking
place?

--
You received this message because you are subscribed to the Google Groups
"elasticsearch" group.
To unsubscribe from this group and stop receiving emails from it, send an
email to elasticsearch+unsubscribe@googlegroups.com.
For more options, visit https://groups.google.com/groups/opt_out.

--
You received this message because you are subscribed to the Google Groups "elasticsearch" group.
To unsubscribe from this group and stop receiving emails from it, send an email to elasticsearch+unsubscribe@googlegroups.com.
For more options, visit https://groups.google.com/groups/opt_out.

polyfractal · August 29, 2013, 10:23pm

Yeah, write consistency quorum is another safeguard to help prevent
dangerous inconsistent data during split-brains. Theoretically
minimum_master_nodes should prevent a split-brain, but if one does happen,
write consistency can help prevent further damage (not that it is
guaranteed to, however...a quorum of shards may exist on one side of the
split-brain).

If you have 1 replica, and that replica disappears (node dies,
network partition, etc), the default behavior is that Elasticsearch will
recreate the missing replica somewhere. It will stream data from the
primary to the new node and fully rebuild the replica. There are ways to
modify this behavior depending on your architecture, but that's the general
idea.
When the old node comes back online, it gets a fresh copy of the
cluster state from the master. In this particular situation, it sees that
its local replica is now redundant (since it was moved to a different node
while unavailable) so it will delete the local replica. Depending on
architecture and context, it is also possible for Elasticsearch to reuse
the local data. This sometimes entails no syncing, a little syncing or
sometimes a full re-sync because the underlying segments have diverged
considerably.

On Thursday, August 29, 2013 3:42:31 PM UTC-4, Mo wrote:

Thanks a lot! That's a great explanation. It clears up the confusion. So
it looks like quorum is applicable for entire cluster to make sure that
during network partition majority of nodes in one side of network are
receiving requests.

I have couple more questions:

When a node is down with Replica factor 1 does elasticsearch starts to
create another replica or is it just writing to one shard? Similarly what
happens if node is down for extended period of time.

Once down node comes back up does it sync up data that from the primary
shard?

Overall trying to understand what admin related activities need to happen
during failures and after state is restored or node comes back up.

On Thu, Aug 29, 2013 at 6:48 AM, Zachary Tong <zachar...@gmail.com<javascript:>

wrote:

That is correct, primaries are always written to first. They are the
"point of truth" inside the cluster. After a write is finished on a
primary, it is sent out to the replicas to be replicated.

When a primary shard leaves the cluster (because the node disappears,
etc), a replica shard will be promoted to primary. This happens on a
per-shard basis (e.g. promotion of individual shards, not the whole node)
and is nearly instant. The promotion is really just the master updating
the cluster state and informing the other nodes of the changes. The node
will then mark its replica as master and continue business as usual.

If you do happen to index a document while A) the primary is gone and B)
the replica is in the midst of being promoted...elasticsearch will wait
until the promotion is complete before continuing the indexing request. By
default the request will time-out after a minute if the promotion process
takes longer than that.

Replicas and quorums play an overlapping role. A primary/replica
distinction simplifies the process of deciding who can make decisions about
data ("point of truth"). Write consistency quorums help prevent writing to
the wrong side of a split-brain, where a shard thinks it is the primary but
in reality it is sitting in isolation from the rest of the cluster (and
therefore is no longer the true primary, and should not accept indexing
requests).

It is important to note that write consistency quorums only checks that
the shards are available...it doesn't actually do a quorum consensus on
each new document. The shards don't all talk to each other and discuss if
the data should be indexed or not. If the quorum rule passes, the primary
shard is still the only shard capable of "officially" accepting the data
into the cluster, then responsible for pushing that data out to replicas.

Phew, that reply got kind of long-winded...does it help clear things up?

-Zach

On Tuesday, August 27, 2013 10:44:11 PM UTC-4, Mo wrote:

From what I understand writes always go to prmary shard even though you
have 2+ replica copies. It also looks like there is a concept of promoting
other node in the cluster as primary shard when a node fails. Does it mean
clients could potentially see a downtime while this promition is taking
place?

--
You received this message because you are subscribed to the Google Groups
"elasticsearch" group.
To unsubscribe from this group and stop receiving emails from it, send an
email to elasticsearc...@googlegroups.com <javascript:>.
For more options, visit https://groups.google.com/groups/opt_out.

--
You received this message because you are subscribed to the Google Groups "elasticsearch" group.
To unsubscribe from this group and stop receiving emails from it, send an email to elasticsearch+unsubscribe@googlegroups.com.
For more options, visit https://groups.google.com/groups/opt_out.

Topic		Replies	Views
Homogeneous distribution of primary shards Elasticsearch	4	713	July 6, 2017
Elasticsearch primary shards (re)location Elasticsearch	2	386	July 6, 2017
ElasticSearch Multi Data Center replication Elasticsearch	4	372	July 6, 2017
Unassigned Primary and Replica Shard Elasticsearch	1	347	July 6, 2017
Replica Fall Elasticsearch	2	301	July 6, 2017

Ha

Related topics