I don't have much experience with ES in a production cluster environment;
all my experience has been with the Java API for mapping, bulk load, and
query logic, and with huge databases. My 3-node test ES cluster has
gathered some dust over the past few months as other tasks have loomed
(mostly good ones; it's just a matter of time and priority). So your
question really intrigued me.
*When split-brain occurs, I found the following behaviors in ES during the
merge between A and B (i.e., a group of nodes with master A or B):*
Assume we don't know when the split-brain happens and both node groups
have updated their data to some extent:
- If A and B have exclusive data separately, all data will be merged
- If A and B have the same record id but different record value (due to
update), ES cannot merge the data and the system is hanging there (aka.
Are you saying that case 1 is handled automatically?
*For the 2nd case, is it possible to add a customized merging strategy in
ES? Say, if two records have the same record id but different values, we
take the record with the latest timestamp.*
This way, I believe split-brain would have less impact. Can we do that?
Or will it be added to the ES roadmap?
I would add a second up-vote to this request.
In the Oracle world of replication, consider two updates, each to the same
record but in a separate node in a replicated cluster. If one update
modifies field A and the other modifies field B, then the most recent
update wins and the previous one's changes are lost. In other words, the
end result of cross-node replication is that either field B's updates are
saved or field A's updates are saved, but not both. Our solution was to
direct all clients to point to one of the Oracle nodes and let replication
flow in only one direction; fail-over means those applications would need
to be re-pointed. Oracle did nothing to help us; it was all up to us.
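To make that row-level behavior concrete, here is a minimal sketch in plain Java. It is not Oracle code and uses no Oracle API; the class `RowLevelLastWriteWins`, the `RowVersion` type, and the timestamps are all hypothetical names made up for illustration. It only models the "most recent update wins for the whole record" outcome described above.

```java
import java.util.HashMap;
import java.util.Map;

public class RowLevelLastWriteWins {

    /** One node's copy of a whole record plus its last-update time (hypothetical type). */
    static final class RowVersion {
        final Map<String, String> fields;
        final long updatedAtMillis;

        RowVersion(Map<String, String> fields, long updatedAtMillis) {
            this.fields = fields;
            this.updatedAtMillis = updatedAtMillis;
        }
    }

    /** Returns a new version of the record with one field changed. */
    static RowVersion update(RowVersion base, String field, String value, long atMillis) {
        Map<String, String> copy = new HashMap<>(base.fields);
        copy.put(field, value);
        return new RowVersion(copy, atMillis);
    }

    /** Whole-record conflict resolution: the newer copy replaces the older one entirely. */
    static RowVersion resolve(RowVersion x, RowVersion y) {
        return y.updatedAtMillis > x.updatedAtMillis ? y : x;
    }

    public static void main(String[] args) {
        Map<String, String> initial = new HashMap<>();
        initial.put("A", "a0");
        initial.put("B", "b0");
        RowVersion base = new RowVersion(initial, 0L);

        // Two nodes update different fields of the same record independently.
        RowVersion onNode1 = update(base, "A", "a1", 100L);
        RowVersion onNode2 = update(base, "B", "b1", 200L);

        // The later update (node 2) wins, so node 1's change to field A is lost.
        RowVersion winner = resolve(onNode1, onNode2);
        System.out.println("A=" + winner.fields.get("A") + ", B=" + winner.fields.get("B"));
    }
}
```

In this sketch the surviving record carries node 2's change to field B and the old value of field A, which is the kind of loss that led us to send all writes to one node.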
So your suggestion in the 2nd case makes a lot of sense. No, it's not
perfect. Yes, there can be data loss. Oracle buys palatial headquarters
and very nice private jets (http://www.oracleprivatejets.com/images/opjsceptercard.jpg) with their data loss replication, so their replication strategy can't be
all bad! As with the recent additions to the version types in ES 1.1,
which came with the appropriate warnings, the 2nd case as you describe
could be implemented along with its own warnings about exposure to data
loss: an exposure that a user could work around as needed, but with their
eyes open.
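For what it's worth, the proposed strategy can be sketched in a few lines of plain Java. This is only an illustration of the idea, not anything ES does or exposes today: the class `LatestTimestampMerge`, the `Doc` type, and its `timestampMillis` field are hypothetical, and the sketch assumes every record carries a timestamp from clocks that are comparable across nodes.

```java
import java.util.HashMap;
import java.util.Map;

public class LatestTimestampMerge {

    /** A record value plus the time it was last written (hypothetical type). */
    static final class Doc {
        final String value;
        final long timestampMillis;

        Doc(String value, long timestampMillis) {
            this.value = value;
            this.timestampMillis = timestampMillis;
        }
    }

    /**
     * Merges the records held by node group A and node group B, keyed by record id.
     * Case 1: ids present on only one side are kept as they are.
     * Case 2: for ids present on both sides, the copy with the latest timestamp wins
     * and the other copy is dropped (this is where data loss can occur).
     * On a timestamp tie, group A's copy is kept; that choice is arbitrary.
     */
    static Map<String, Doc> merge(Map<String, Doc> groupA, Map<String, Doc> groupB) {
        Map<String, Doc> merged = new HashMap<>(groupA);
        for (Map.Entry<String, Doc> entry : groupB.entrySet()) {
            merged.merge(entry.getKey(), entry.getValue(),
                    (fromA, fromB) -> fromB.timestampMillis > fromA.timestampMillis ? fromB : fromA);
        }
        return merged;
    }

    public static void main(String[] args) {
        Map<String, Doc> groupA = new HashMap<>();
        groupA.put("1", new Doc("only on A", 10L));
        groupA.put("3", new Doc("older copy from A", 20L));

        Map<String, Doc> groupB = new HashMap<>();
        groupB.put("2", new Doc("only on B", 15L));
        groupB.put("3", new Doc("newer copy from B", 30L));

        Map<String, Doc> merged = merge(groupA, groupB);
        for (Map.Entry<String, Doc> entry : merged.entrySet()) {
            System.out.println(entry.getKey() + " -> " + entry.getValue().value);
        }
    }
}
```

Record "3" ends up with group B's copy because its timestamp is later; group A's update to that record is silently discarded, which is exactly the exposure the warnings would need to spell out.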