Partial index replication causes data loss?

Hi Mailing List! I'm a first-time poster and a long-time reader.

We recently had a crash in our ES (1.3.1 on Ubuntu) cluster which caused us
to lose a significant volume of data. I have a "theory" about what caused
this, and I would love to hear your opinions on it, along with any
suggestions for mitigating it.

Here is a simplified play-by-play:

  1. The cluster has 3 data nodes: A, B, and C. The index has 10 shards and
    a replica count of 1, so A holds the primary and B holds the replica. C
    is doing nothing. Reallocation of indexes/shards is enabled.
  2. A crashes. B's copy takes over as primary, and B starts transferring
    data to C to build a new replica.
  3. B crashes. C is now the primary with a partial dataset.
  4. There is a write to the index.
  5. A and B finally reboot and are told that they are now stale (because
    C took a write while they were away). Both A and B delete their local data.
    A is chosen as the new replica and re-syncs from C.
  6. All the data that A and B had but C never received is lost forever.

Is the above scenario possible? If it is, wouldn't it be better for ES's
default behavior to not reallocate in this scenario? That would have caused
the write in step 4 to fail, but in our use case that is preferable to data
loss.
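
As a rough illustration of steps 1 to 6, here is a toy Python model. It assumes a single shard whose contents are just a set of document IDs, and that B copied a fixed number of documents to C before failing; both numbers are made up. It is not how ES recovery works internally, it only shows why wiping A and B throws away whatever C never received.

```python
def simulate(total_docs: int = 100, copied_before_b_failed: int = 40) -> dict:
    """Toy single-shard model of the six steps; counts kept and lost docs."""
    docs = set(range(total_docs))
    a = set(docs)  # step 1: A holds the primary, fully up to date
    b = set(docs)  # step 1: B holds the replica, fully up to date
    c = set()      # step 1: C holds nothing

    # Steps 2-3: A fails, B starts copying to C, then B fails part-way through.
    c |= set(sorted(b)[:copied_before_b_failed])

    # Step 4: a new write lands on C, now the only live copy.
    new_doc = total_docs
    c.add(new_doc)

    # Step 5: A and B come back, are treated as stale, and wipe their data;
    # A then re-syncs from C.
    a = set(c)
    b = set()

    # Step 6: anything not held by some copy is gone.
    every_doc = docs | {new_doc}
    kept = a | b | c
    return {"kept": len(kept), "lost": len(every_doc - kept)}


print(simulate())        # {'kept': 41, 'lost': 60}
print(simulate(10, 10))  # {'kept': 11, 'lost': 0}
```

With the default (made-up) numbers, 60 of the 101 documents end up on no node at all; if C had finished the copy before B failed, nothing would be lost.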

--
You received this message because you are subscribed to the Google Groups "elasticsearch" group.
To unsubscribe from this group and stop receiving emails from it, send an email to elasticsearch+unsubscribe@googlegroups.com.
To view this discussion on the web visit https://groups.google.com/d/msgid/elasticsearch/84ba332b-2e34-4ce4-aaa2-acfa616f3230%40googlegroups.com.
For more options, visit https://groups.google.com/d/optout.

Bump? I would love to hear some thoughts on this flow, and any suggestions
on how to mitigate it (other than replicating all data to all nodes).

Thanks!

On Tuesday, October 14, 2014 3:52:31 PM UTC-7, Evan Tahler wrote:

Very interesting. The default 'write consistency level' in Elasticsearch
is QUORUM, i.e. ES verifies that a quorum of a shard's copies is available
before processing a write to it. In your case you were left with just 1
copy, C, and a write still happened. You would think it should not have gone
through, since 2 copies would be required for a quorum. However:
https://github.com/elasticsearch/elasticsearch/issues/6482. I think this
shows that this is a real problem, not a hypothetical one!

But guess what? *Even if this were fixed and a write to C never happened*, it
is still possible that once A and B were back, C could be picked as primary
and clobber their data. See:
https://github.com/elasticsearch/elasticsearch/issues/7572#issuecomment-59983759
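
The expectation that the write should have been rejected is simple majority arithmetic over all copies of a shard (primary plus replicas). A minimal sketch, assuming a plain strict-majority rule; `number_of_replicas` only mirrors the ES setting name, and this is not ES 1.3's actual consistency check, which the thread reports let the write through anyway.

```python
def majority_quorum(number_of_replicas: int) -> int:
    """Copies needed for a strict majority of one shard group (primary + replicas)."""
    copies = number_of_replicas + 1
    return copies // 2 + 1


def write_allowed(number_of_replicas: int, active_copies: int) -> bool:
    """True if a strict-majority check would accept a write."""
    return active_copies >= majority_quorum(number_of_replicas)


# Replica count 1 means 2 copies, so a majority needs both of them.
print(majority_quorum(1))   # 2
print(write_allowed(1, 1))  # False: only C is left
```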

On Thu, Oct 23, 2014 at 7:48 PM, Evan Tahler evantahler@gmail.com wrote:

Interesting!

However, the write may not be the cause of the data loss here. Even if
there had been no write while A and B were down, would the recovery process
have happened the same way? In some further tests, it still looks like C
would have overwritten all the data on A and B when they rebooted.

This type of error is easily triggered by garbage collection on large data
sets, when a server becomes unresponsive for too long (perhaps the cluster
kicks out the unresponsive node, or a supervisor restarts the application).

On Thursday, October 23, 2014 12:59:00 PM UTC-7, Shikhar Bhushan wrote:

Yes, this is the second issue I mentioned, where ES will pick basically any
replica as primary without considering which one might be more up to date.
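
The difference between promoting any copy and promoting the freshest one can be sketched as below. `ops_applied` is a hypothetical per-copy freshness counter made up for this example; the thread does not say ES 1.x tracked such a thing, and comparing copies reliably in a real cluster takes more than one integer.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class ShardCopy:
    node: str
    ops_applied: int  # hypothetical per-copy count of applied operations


def pick_any(copies: list[ShardCopy]) -> ShardCopy:
    # No notion of freshness: whichever copy is considered first wins.
    return copies[0]


def pick_most_recent(copies: list[ShardCopy]) -> ShardCopy:
    # Promote the copy that has applied the most operations (first one on ties).
    return max(copies, key=lambda copy: copy.ops_applied)


copies = [ShardCopy("C", 41), ShardCopy("A", 100), ShardCopy("B", 100)]
print(pick_any(copies).node)          # C
print(pick_most_recent(copies).node)  # A
```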

On Fri, Oct 24, 2014 at 3:57 AM, Evan Tahler evantahler@gmail.com wrote:

Ahh, thanks for pointing that out!

Let's move this conversation to the GitHub issue, as I think we can be more
productive there.

On Thursday, October 23, 2014 11:06:06 PM UTC-7, Shikhar Bhushan wrote:

If you have replica level 1 with 3 nodes, this is not enough. You must set
replica level 2. With replica level 1 and an outage of 2 nodes, as you
describe, you will lose data.

Jörg
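
Jörg's point can be checked by brute force: put each shard's copies on distinct nodes, take down every possible set of nodes at once, and count the shards left with no copy. The round-robin placement is a hypothetical stand-in for ES's allocator, and `number_of_replicas` only mirrors the setting name.

```python
from __future__ import annotations

from itertools import combinations


def worst_case_lost_shards(num_shards: int, number_of_replicas: int,
                           nodes: list[str], failures: int) -> int:
    """Most shards left with no copy at all when `failures` nodes die together.

    Copies of each shard are placed round-robin on distinct nodes, a
    hypothetical stand-in for the real allocator.
    """
    copies = number_of_replicas + 1
    if copies > len(nodes):
        raise ValueError("more copies than nodes")
    holders = [{nodes[(s + i) % len(nodes)] for i in range(copies)}
               for s in range(num_shards)]
    worst = 0
    for down in combinations(nodes, failures):
        down_set = set(down)
        # A shard is lost if every node holding one of its copies is down.
        lost = sum(1 for h in holders if h <= down_set)
        worst = max(worst, lost)
    return worst


nodes = ["A", "B", "C"]
print(worst_case_lost_shards(10, 1, nodes, 1))  # 0
print(worst_case_lost_shards(10, 1, nodes, 2))  # 4
print(worst_case_lost_shards(10, 2, nodes, 2))  # 0
```

Under this placement, one replica survives any single failure, but two simultaneous failures can leave 4 of the 10 shards with no copy; with two replicas every shard survives any two failures. Put generally, r replicas on distinct nodes tolerate r concurrent node losses.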

On Wednesday, October 15, 2014 12:52:31 AM UTC+2, Evan Tahler wrote:

Interesting, @Jörg.
How many nodes would you need, then, to avoid replicating all data to all
nodes? A highly touted feature of ES is the ability to share and spread data
across nodes. Any recommendations?

--
Evan Tahler | evantahler@gmail.com | 412.897.6361
evantahler.com | actionherojs.com

On Fri, Oct 24, 2014 at 7:05 AM, Jörg Prante joergprante@gmail.com wrote:

Hi Evan,

As Jörg said (though I wouldn't make replica count == node count a
golden rule), if you have 2 copies of your data, you are resilient to one
failure at a time. If another failure occurs while you are still recovering
from the first, bad things may happen. That said, I'm not sure the data loss
is explained by what you described.

When you have 10 shards, each with 1 replica, you have 20 shard copies in
total to spread around the cluster, so node C should have had some shards
assigned to it. When A crashed, ES started compensating for the lost copies
by replicating shards from B to C (and maybe from C to B as well).
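
A sketch of that bookkeeping, assuming a hypothetical round-robin placement (ES's real allocator balances shards by its own rules): 10 shards with 1 replica give 20 copies across A, B and C, and replacing A's copies means copying in both directions between B and C.

```python
from __future__ import annotations

from collections import Counter


def allocate(num_shards: int, number_of_replicas: int,
             nodes: list[str]) -> dict[int, list[str]]:
    """Hypothetical round-robin placement; copies of one shard land on distinct nodes."""
    copies = number_of_replicas + 1
    if copies > len(nodes):
        raise ValueError("more copies than nodes")
    return {s: [nodes[(s + i) % len(nodes)] for i in range(copies)]
            for s in range(num_shards)}


def recovery_plan(allocation: dict[int, list[str]], failed: str,
                  nodes: list[str]) -> list[tuple[int, str, str]]:
    """(shard, source, target) copies needed to replace what `failed` held."""
    plan = []
    for shard, holders in sorted(allocation.items()):
        if failed not in holders:
            continue
        sources = [n for n in holders if n != failed]
        targets = [n for n in nodes if n != failed and n not in holders]
        if sources and targets:
            plan.append((shard, sources[0], targets[0]))
    return plan


nodes = ["A", "B", "C"]
allocation = allocate(10, 1, nodes)
per_node = Counter(n for holders in allocation.values() for n in holders)
print(sum(per_node.values()), dict(per_node))  # 20 {'A': 7, 'B': 7, 'C': 6}
print(sorted({(src, dst) for _, src, dst in recovery_plan(allocation, "A", nodes)}))
# [('B', 'C'), ('C', 'B')]
```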

When ES starts to copy shards from one node to another, the shards on the
target node (C in this case) are marked as initializing. Only once all data
is copied are they marked as started and able to accept new writes. What
should have happened here is that C becomes master but the index (and the
cluster) becomes RED, because there is no active shard in one of the shard
groups. At that point no writes are possible to that shard group.
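
This can be modeled as a small state machine per shard group. The state names follow the terms used above (initializing, started) plus unassigned; the rule that only a started copy may act as primary, and the red/yellow/green mapping, are a simplified reading of this explanation and of ES's health colors rather than the real ES code. For simplicity, any started copy counts as a candidate primary.

```python
from __future__ import annotations

from enum import Enum


class CopyState(Enum):
    STARTED = "started"            # fully copied; may serve as primary
    INITIALIZING = "initializing"  # still receiving data from another node
    UNASSIGNED = "unassigned"      # no live copy (e.g. its node is down)


def active_copy(group: dict[str, CopyState]) -> str | None:
    """A node holding a STARTED copy that could act as primary, if any."""
    for node, state in sorted(group.items()):
        if state is CopyState.STARTED:
            return node
    return None


def group_health(group: dict[str, CopyState]) -> str:
    if active_copy(group) is None:
        return "red"     # no active copy: this shard group cannot take writes
    if any(state is not CopyState.STARTED for state in group.values()):
        return "yellow"  # writable, but some copy is missing or still recovering
    return "green"


# One shard group from the scenario: A's copy is gone, C is being built from B.
before = {"B": CopyState.STARTED, "C": CopyState.INITIALIZING}
after_b_fails = {"B": CopyState.UNASSIGNED, "C": CopyState.INITIALIZING}
print(group_health(before))         # yellow
print(group_health(after_b_fails))  # red
```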

Obviously this is not what happened to you. Do you have any information
recorded from the problematic time, such as logs, cluster state, or Marvel
data?

Cheers,
Boaz

On Friday, October 24, 2014 6:59:19 PM UTC+2, Evan Tahler wrote:
