I'm new to the project so apologies if answers to these questions were
covered elsewhere but I couldn't determine exact answers from the
documentation or other discussion topics.
Some general index gateway questions:
- The index gateway docs say that "The second level, in case of a
complete cluster shutdown / failure, is the gateway support" which I
read to mean, if no index gateway is configured and all the replicas
for a partition are stopped, they will not be recoverable. Is this
correct? - If a primary shard fails and a new one is elected, what level of
write consistency is guaranteed for the elected primary? See my
questions below on consistency, but my concern here is data loss
during failover due to write consistency as it relates to the
replication for a partition.
My next questions are replica bootstrap and consistency questions. For
these questions, assume I have a 7 node cluster with 1 index with
three partitions and three replicas. If I understand correctly, the
partition/replication could look something like this (sorry for the
ASCII art):
host-0 | host-1 | host-2 |
host-3 | host-4 | host-5 | host-6
Index0 P0 R0 | Index0 P1 R0 | Index0 P2 R0 | Index 0 P0 R1 | Index 0
P1 R1 | Index 0 P2 R1 | Index 0 P0 R2
Index0 P1 R2 | Index0 P2 R2 |
If this is correct, my questions are:
- If I change the configuration to have a 4th replica for each
partition, how does the system elect where R3 for P0,P1 and P2 are
replicated to? Is it based on load for example, such that host-0 and
host-1 will not be considered as they already own two replicas? - Subsequently, while the new 4th replica is coming online, how does
it source its copy of the data? In other words, how does the new
replica choose which existing replica to pull its copy of the
replicated index from? - If I have a write that would land in P2 from a client connected to
host-0, at what point does the write return success to the client?
After the write has been replicated to all three replicas for P2? One
replica? A quorum of replicas? - If my cluster experiences a partition such that host-0, host-1 and
host-2 cannot see host-3, host-4, host-5 and host-6, I now effectively
have a split quorum - one quorum for P0, and one for P1 and P2
(assuming the partition replication I describe is possible). In this
case, can read succeed for any of the partitions? Can writes? I'm
coming from a Dynamo background where the answer is "it depends" but
I'm curious how ElasticSearch handles these types of partial quorum
partitions.
My last questions deal with sorted result sets. Simply put, I'm
curious how the system handles total ordering across a partitioned
data set. Using the host/partition topology above, if I perform a
search query whose results span P0, P1 and P2, and I issue that query
to host-4, it can only locally sort ~.33 of my total results. Assuming
each host performs a local sort of the results, how is total ordering
achieved before results are sent to the client? Are the ordered
results bounded by the heap of one host's JVM?
Thanks and sorry for the barrage of questions.
-erik