Hi,
We can consistently see data loss in Elasticsearch in the following
scenario-
Create a 2 node cluster with primary and replica.
Add 2 more nodes and keep uploading documents when the 2 nodes are
coming up.
Set cluster.routing.allocation.exclude._ip on the first 2 nodes so
that all shards relocate.
As soon as the shard status on the new nodes is STARTED terminate the
first 2 nodes( both primary and replica).
Now intermittently we can see documents uploaded while shard relocation was
in progress, missing. My question is, do we copy the transaction log in
addition to the index during shard relocation? My current guess is that the
transaction log is not copied and documents which are not yet flushed are
lost. Can someone confirm that this hypothesis is correct?
Looks like shard recovery has 5 stages one of which is translog. So now the
question becomes what is the correct moment to shutdown the old nodes. Is
terminating old nodes after the shard status on new ones is STARTED
incorrect? Also, cluster health is green. Just the documents go missing.
On Sunday, March 1, 2015 at 1:28:29 PM UTC+5:30, Aadithya C wrote:
Hi,
We can consistently see data loss in Elasticsearch in the following
scenario-
Create a 2 node cluster with primary and replica.
Add 2 more nodes and keep uploading documents when the 2 nodes are
coming up.
Set cluster.routing.allocation.exclude._ip on the first 2 nodes so
that all shards relocate.
As soon as the shard status on the new nodes is STARTED terminate
the first 2 nodes( both primary and replica).
Now intermittently we can see documents uploaded while shard relocation
was in progress, missing. My question is, do we copy the transaction log in
addition to the index during shard relocation? My current guess is that the
transaction log is not copied and documents which are not yet flushed are
lost. Can someone confirm that this hypothesis is correct?
I thought the whole idea of the cluster is that ANYTHING could be shut down
at ANY TIME and with enough replica shards and a quorum maintained at all
times, everything would be OK. So, does this not work with a cluster
smaller than 5 machines/nodes?
On Sunday, March 1, 2015 at 6:24:19 AM UTC-8, Aadithya C wrote:
Looks like shard recovery has 5 stages one of which is translog. So now
the question becomes what is the correct moment to shutdown the old nodes.
Is terminating old nodes after the shard status on new ones is STARTED
incorrect? Also, cluster health is green. Just the documents go missing.
On Sunday, March 1, 2015 at 1:28:29 PM UTC+5:30, Aadithya C wrote:
Hi,
We can consistently see data loss in Elasticsearch in the following
scenario-
Create a 2 node cluster with primary and replica.
Add 2 more nodes and keep uploading documents when the 2 nodes are
coming up.
Set cluster.routing.allocation.exclude._ip on the first 2 nodes so
that all shards relocate.
As soon as the shard status on the new nodes is STARTED terminate
the first 2 nodes( both primary and replica).
Now intermittently we can see documents uploaded while shard relocation
was in progress, missing. My question is, do we copy the transaction log in
addition to the index during shard relocation? My current guess is that the
transaction log is not copied and documents which are not yet flushed are
lost. Can someone confirm that this hypothesis is correct?
This should work and may indicate a bug. To answer your question about the
use of translog: recovery has 3 main phases - the first is copying over
files (if needed, we try to reuse local files). This is the longest phases.
The second phases is to replay all operations done since the beginning of
phase 1 on the source shard. This is done from the translog. The third and
last is to make sure that all operations done during the second phase are
also replayed (this is a safety measure as we start replicating these
operation as soon as phase1 is over).
Aadithya - can you open a github issue with exact reproduction? preferably
code. These things can be very tricky and the devil is in the details.
Cheers,
Boaz
On Sunday, March 1, 2015 at 10:14:13 PM UTC+1, Dennis wrote:
I thought the whole idea of the cluster is that ANYTHING could be shut
down at ANY TIME and with enough replica shards and a quorum maintained at
all times, everything would be OK. So, does this not work with a cluster
smaller than 5 machines/nodes?
On Sunday, March 1, 2015 at 6:24:19 AM UTC-8, Aadithya C wrote:
Looks like shard recovery has 5 stages one of which is translog. So now
the question becomes what is the correct moment to shutdown the old nodes.
Is terminating old nodes after the shard status on new ones is STARTED
incorrect? Also, cluster health is green. Just the documents go missing.
On Sunday, March 1, 2015 at 1:28:29 PM UTC+5:30, Aadithya C wrote:
Hi,
We can consistently see data loss in Elasticsearch in the following
scenario-
Create a 2 node cluster with primary and replica.
Add 2 more nodes and keep uploading documents when the 2 nodes are
coming up.
Set cluster.routing.allocation.exclude._ip on the first 2 nodes so
that all shards relocate.
As soon as the shard status on the new nodes is STARTED terminate
the first 2 nodes( both primary and replica).
Now intermittently we can see documents uploaded while shard relocation
was in progress, missing. My question is, do we copy the transaction log in
addition to the index during shard relocation? My current guess is that the
transaction log is not copied and documents which are not yet flushed are
lost. Can someone confirm that this hypothesis is correct?
Apache, Apache Lucene, Apache Hadoop, Hadoop, HDFS and the yellow elephant
logo are trademarks of the
Apache Software Foundation
in the United States and/or other countries.