I've been looking into CAP recently and wanted to develop my understanding
of the various tradeoffs and failure modes of Elasticsearch as a
distributed system.
I came across this post from a while back in which Kimchy (Shay) suggests
that ES gives up on partition tolerance, i.e. it chooses Consistency and
Availability out of CAP:
http://elasticsearch-users.115913.n3.nabble.com/CAP-theorem-td891925.html#a894234
From my understanding it seems like Kimchy was confused here. As a
distributed system ES can't give up on the P - you can't will
network/communication failures out of existence!
Instead, it seems like ES mostly compromises on the A (availability) part
of CAP. For example, unless you are willing to risk split-brain
scenarios, setting minimum master nodes (discovery.zen.minimum_master_nodes)
to n/2 + 1, where n is the number of master-eligible nodes, means the smaller
group under a network partition becomes unavailable (it will not respond to
reads/writes). If you do allow split-brain then clearly consistency is
compromised and the client service will need some kind of conflict
resolution mechanism.
There are, of course, lots more nuances here.
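To make the n/2 + 1 arithmetic concrete, here is a minimal sketch in Python. It is only a toy model of the quorum rule, not ES code: the function names are my own, and it assumes every node in the cluster is master-eligible and that a side of a partition is "available" exactly when it can still reach a quorum.

```python
def minimum_master_nodes(master_eligible: int) -> int:
    """Smallest strict majority of master-eligible nodes: floor(n/2) + 1."""
    if master_eligible < 1:
        raise ValueError("need at least one master-eligible node")
    return master_eligible // 2 + 1


def sides_with_quorum(group_sizes):
    """For each side of a partition, report whether it still has a quorum.

    Assumption (hypothetical model): all nodes are master-eligible, so the
    quorum is computed from the total node count across all sides.
    """
    quorum = minimum_master_nodes(sum(group_sizes))
    return [size >= quorum for size in group_sizes]


if __name__ == "__main__":
    # 5 nodes split 3/2: only the larger side keeps a quorum.
    print(sides_with_quorum([3, 2]))
    # 4 nodes split 2/2: neither side reaches the quorum of 3.
    print(sides_with_quorum([2, 2]))
```

In this model at most one side can ever hold a majority, which is what rules out split-brain, and an even split leaves no side with a quorum at all.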
It would be great if there were a page on the ES site/guide which went into
these issues in more detail as it is (IMO) essential information in
understanding how ES works and in deciding whether it is appropriate for
your use case. Ideally this page would give a general overview of ES
architecture:
- replication behaviour
- how requests are routed (and which nodes can handle requests)
- how index operations are handled
- how get and search requests are handled
- how ES deals with background tasks and resource contention
(Some of this information exists on the site, but it is scattered about and
in any case not very detailed from what I can find.)
With this information in place, the page could then discuss:
- how ES approaches consistency, cases where data inconsistency can arise
(for example, what happens if two clients simultaneously update a piece of
content?), and its approach to conflict resolution
- how ES approaches availability
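As an illustration of the "two clients update the same content" case, here is a toy sketch of optimistic concurrency control using version numbers. As far as I can tell this is the general technique behind ES's document versioning, but the code is not ES's API: VersionedStore and VersionConflict are hypothetical names, and the store is a single in-memory dict with no replication.

```python
class VersionConflict(Exception):
    """Raised when a write is based on a stale version of the document."""


class VersionedStore:
    """Toy single-node store; each successful write bumps the version by 1."""

    def __init__(self):
        self._docs = {}  # doc_id -> (version, body)

    def get(self, doc_id):
        """Return (version, body) for an existing document."""
        return self._docs[doc_id]

    def put(self, doc_id, body, expected_version=None):
        """Write body; if expected_version is given, it must match the
        current version (0 for a missing document) or the write is rejected."""
        current_version = self._docs.get(doc_id, (0, None))[0]
        if expected_version is not None and expected_version != current_version:
            raise VersionConflict(
                f"expected version {expected_version}, found {current_version}"
            )
        new_version = current_version + 1
        self._docs[doc_id] = (new_version, body)
        return new_version


if __name__ == "__main__":
    store = VersionedStore()
    store.put("doc1", "original")
    # Both clients read the same version before writing.
    seen_by_a, _ = store.get("doc1")
    seen_by_b, _ = store.get("doc1")
    store.put("doc1", "edit from A", expected_version=seen_by_a)
    try:
        store.put("doc1", "edit from B", expected_version=seen_by_b)
    except VersionConflict as exc:
        print("client B must re-read and retry:", exc)
```

The point of the sketch is that the second writer gets an explicit conflict rather than silently overwriting the first, so conflict resolution is pushed to the client.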
I'm currently working on writing a blog post on these issues. If it ends up
sufficiently detailed (and turns out accurate enough!) I'd be happy for it
to be added to the docs.
But it would be incredibly useful for someone knowledgeable to either check
what I write or produce something themselves. It is frankly surprising that
I can't find this information anywhere without trawling the web for
scattered pieces of information, going through the code, and testing a
live system.