I have been searching the documentation, Stack Overflow and this Google group for answers about the GET API and consistency, but there are still some details I am not sure I understand correctly.
Here are the things I think I understood; don’t hesitate to correct me if I am wrong:
- a GET immediately following a PUT (with sync enabled) will always return the same document, thanks to the transaction log. This is true even if the GET has the default "random" preference (assuming no other process writes at the same time).
- even with QUORUM consistency, a write operation in "sync" mode will always send the new doc to all replicas and wait for their answers. QUORUM only changes the number of successful replications needed for the request to succeed.
- if a replica goes down and comes back, it has to synchronise with the other nodes before it is allowed to answer requests.
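To make sure I read the QUORUM rule correctly, here is a toy sketch of the majority calculation as I understand it (the function name and the formula are my own reconstruction, not code from Elasticsearch):

```javascript
// My reading of QUORUM write consistency: the write is still sent to
// every copy, but the request succeeds as soon as a strict majority
// of all copies (primary + replicas) have acknowledged it.
function requiredAcks(numberOfReplicas) {
  var totalCopies = 1 + numberOfReplicas; // 1 primary + N replicas
  return Math.floor(totalCopies / 2) + 1;
}

// With 1 primary and 2 replicas, 2 acknowledgements out of 3 suffice.
console.log(requiredAcks(2)); // prints 2
```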
What I don’t understand is how all of this works in the case of a short node failure.
Let’s take a simplified example:
- 3 nodes: A, B and C
- 1 shard, with node A as the primary and B and C as replicas
- 1 single-threaded client

1. The client PUTs a doc in sync mode with QUORUM consistency.
2. The request is redirected to node A, where the doc is written.
3. The doc is replicated to node B.
4. Node C does not respond and fails to replicate (due, for example, to garbage collection).
5. As the quorum is satisfied, node A returns a success.
6. The garbage collector finishes its job on node C; it can be contacted again.
7. Once the answer from node A is received, the client performs a GET of the document with the default (random) preference.
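The write part of the steps above can be replayed as a toy simulation (purely illustrative: the node objects and the QUORUM constant are my own invention, not the actual replication code):

```javascript
// Toy replay of steps 1-5: the primary fans the write out to every
// copy; node C fails to acknowledge, but the quorum (2 of 3) is met,
// so the client still gets a success back.
var QUORUM = 2;

function syncWrite(copies) {
  var acks = copies.filter(function (copy) { return copy.ack; }).length;
  return acks >= QUORUM ? "success" : "failure";
}

var nodes = [
  { name: "A", ack: true },  // primary, writes the doc (step 2)
  { name: "B", ack: true },  // replica, replication succeeds (step 3)
  { name: "C", ack: false }  // replica, stuck in GC (step 4)
];

console.log(syncWrite(nodes)); // prints "success" (step 5)
```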
Here are the questions:
- What happens between steps 4 and 5? Is node C marked as unallocated immediately, before it has answered the write request?
- What happens between steps 6 and 7? The problem was very short and node C never stopped. Is it possible that node C does not realise it failed some requests and continues to answer client requests?
- Do the official client libraries detect that a node has been unallocated before sending a request?
- What happens if a client does not check for unallocated nodes in step 7 and sends the GET request directly to node C?
- What happens if, in step 7, the client sends the GET request to node B (not the primary)? Does it know that C has been unallocated? If not, can the request be redirected to node C (as the preference is random)?
- What happens if, in step 7, the client sends the GET request to node A (the primary shard)? (Just to be sure.)
I have been using Elasticsearch for a few months now and you guys have done
a really great job. Thank you for your hard work. I have not experienced
the problems I described here, those are just scary things I imagined after
reading the doc. Maybe these corner cases have already been explained. If
so, I apologise.
The reason I am concerned is this sentence, found in the description of an ongoing issue at http://www.elasticsearch.org/guide/en/elasticsearch/resiliency/current/ :
"If a network partition separates a node from the master, there is some
window of time before the node detects it. This window is extremely small
if a socket is broken. More adversarial partitions, for example, silently
dropping requests without breaking the socket can take longer (up to 3x30s
using current defaults)"
The client I use is elasticsearch.js. Its documentation says that it round-robins requests across its connections, and I would like to know whether it can end up sending the GET request to an unallocated node.
If this can happen, what are the recommended ways to prevent it? Sending requests only to non-data nodes?
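For reference, this is roughly how my client is configured (the host names are made up; with several hosts listed, the elasticsearch.js documentation says requests are round-robined across them, which is why I worry a GET could land on node C):

```javascript
// Illustrative elasticsearch.js setup: with three hosts configured,
// the client round-robins requests across them, so any of the nodes
// in my example could receive the GET.
var elasticsearch = require('elasticsearch');

var client = new elasticsearch.Client({
  hosts: ['nodeA:9200', 'nodeB:9200', 'nodeC:9200']
});
```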