Hi Evan,
As Jörg said (though I wouldn't make "replica count == node count" a
golden rule), if you have 2 copies of your data it means that you are
resilient to one failure at a time. If another failure occurs while you
are still recovering from the first, bad things may happen. That said, I'm
not sure losing data is explained by what you described.
When you have 10 shards, each with 1 replica, you have 20 shard copies in
total to spread around the cluster, so node C should have some shards
assigned to it. When A crashed, ES started to compensate for the lost
copies by replicating shards from B to C (and maybe from C to B as well).
When ES copies shards from one node to another, the copies on the
target node (C in this case) are marked as initializing. Only once all data
is copied are they marked as started and can accept new writes. What should
have happened here is that C becomes master but the index (and the cluster)
goes RED, because one of the shard groups has no active shard. At that
point no writes are possible to that shard group.
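The health logic described above can be sketched roughly like this (a
simplified model for illustration, not Elasticsearch internals — the
function name and the set-based representation are mine):

```python
# Minimal sketch: an index is RED when any shard group (a primary plus its
# replicas) has no started copy left, and YELLOW when all data is present
# but some replica copies are missing. Assumes a replica count of 1.

def index_health(shard_groups):
    """shard_groups: one set per shard group, each set naming the nodes
    that hold a STARTED copy of that shard."""
    if any(len(group) == 0 for group in shard_groups):
        return "red"      # a shard group has no active copy: writes blocked
    if all(len(group) >= 2 for group in shard_groups):
        return "green"    # every group has a primary and at least one replica
    return "yellow"       # all data reachable, but some replicas missing

# Two shard groups, each with copies on nodes A and B:
groups = [{"A", "B"}, {"A", "B"}]
print(index_health(groups))               # -> green
groups = [g - {"A"} for g in groups]      # node A crashes
print(index_health(groups))               # -> yellow
groups = [g - {"B"} for g in groups]      # node B crashes before recovery
print(index_health(groups))               # -> red: no writes possible
```

The key point is the last transition: once no started copy of a shard
group exists anywhere, the cluster should refuse writes to it rather than
accept them on a partial copy.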
Obviously this is not what happened to you. Do you have any information
recorded from the problematic time? Logs, cluster state, Marvel data, etc.?
Cheers,
Boaz
On Friday, October 24, 2014 6:59:19 PM UTC+2, Evan Tahler wrote:
Interesting @Jörg
How many nodes would you need then to not replicate all data on all
nodes? A highly-touted feature of ES is the ability to share and spread
data across nodes. Any recommendations?
--
Evan Tahler | evantahler@gmail.com | 412.897.6361
evantahler.com | actionherojs.com
On Fri, Oct 24, 2014 at 7:05 AM, Jörg Prante joergprante@gmail.com
wrote:
If you have replica level 1 with 3 nodes, this is not enough. You must
set replica level 2. With replica level 1 and an outage of 2 nodes, as you
describe, you will lose data.
Jörg
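[Editor's note: Jörg's arithmetic can be checked with a small sketch —
the function below is illustrative only, assuming copies of one shard
always land on distinct nodes:]

```python
# With R replicas per shard there are R+1 copies, so up to R nodes can
# fail simultaneously before some shard group loses its last copy.

def survives(node_count, replicas, failed):
    """True if every shard group still has at least one live copy after
    `failed` simultaneous node failures."""
    copies = replicas + 1                  # primary + replicas
    return failed < min(copies, node_count)

print(survives(node_count=3, replicas=1, failed=2))  # False: Evan's outage
print(survives(node_count=3, replicas=2, failed=2))  # True: Jörg's advice
```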
On Wednesday, October 15, 2014 12:52:31 AM UTC+2, Evan Tahler wrote:
Hi Mailing List! I'm a first-time poster, and a long time reader.
We recently had a crash in our ES (1.3.1 on Ubuntu) cluster which caused
us to lose a significant volume of data. I have a "theory" on what
happened to cause this, and I would love to hear your opinions on it, and
any suggestions to mitigate it.
Here is a simplified play-by-play:
- Cluster has 3 data nodes, A, B, and C. The index has 10 shards.
The index has a replica count of 1, so A is the master and B is a replica.
C is doing nothing. Re-allocation of indexes/shards is enabled.
- A crashes. B takes over as master, and then starts transferring
data to C as a new replica.
- B crashes. C is now master with a partial dataset.
- There is a write to the index.
- A and B finally reboot, and they are told that they are now stale
(as C had a write while they were away). Both A and B delete their local
data. A is chosen to be the new replica and re-sync from C.
- ... all the data A and B had which C never got is lost forever.
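[Editor's note: the play-by-play above can be replayed as a rough
simulation — the document names and set operations below are hypothetical
stand-ins, not ES behavior verbatim, but they show where the data goes:]

```python
# Rough simulation of the timeline: each node holds a set of documents;
# a node that rejoins and is deemed stale wipes its local data and
# resyncs from the current master.

nodes = {"A": {"doc1", "doc2"}, "B": {"doc1", "doc2"}, "C": set()}

nodes.pop("A")                     # step 2: A crashes, B takes over
nodes["C"] |= {"doc1"}             # B -> C resync interrupted part-way
nodes.pop("B")                     # step 3: B crashes, C is now master
nodes["C"] |= {"doc3"}             # step 4: a write lands on C
# step 5: A and B rejoin, are marked stale, delete local data, resync from C
nodes["A"] = set(nodes["C"])
nodes["B"] = set(nodes["C"])

print(sorted(nodes["C"]))          # ['doc1', 'doc3'] -- doc2 is gone forever
```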
Is the above scenario possible? If it is, it seems like a better default
behavior for ES would be to not reallocate in this scenario. That would
have caused the write in step #4 to fail, but in our use case, that is
preferable to data loss.
--
You received this message because you are subscribed to a topic in the
Google Groups "elasticsearch" group.
To unsubscribe from this topic, visit
https://groups.google.com/d/topic/elasticsearch/M17mgdZnikk/unsubscribe.
To unsubscribe from this group and all its topics, send an email to
elasticsearch+unsubscribe@googlegroups.com.
To view this discussion on the web visit
https://groups.google.com/d/msgid/elasticsearch/f7341384-4c88-4e10-a731-f1e6792d6bdd%40googlegroups.com.
For more options, visit https://groups.google.com/d/optout.