Is it possible to add a customized merging strategy to alleviate split-brain impact?


(Jing Liu) #1

Hi ES team,

When a split-brain occurs, I observed the following behavior in ES when the
two halves A and B (i.e., the groups of nodes whose master is A or B,
respectively) are merged back together. Assume we don't know when the
split-brain happened and that both groups have updated their data to some
extent:

  • If A and B each hold data the other does not have, all data is merged
    successfully.
  • If A and B have the same record ID but different record values (due to
    updates), ES cannot merge the data and the system hangs (the so-called
    split-brain effect).
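The two cases can be told apart mechanically by comparing what each half holds for every document ID. The sketch below assumes, purely for illustration, that each side's documents are available as a Python dict mapping document ID to document body; `classify_partition_docs` is a hypothetical helper, not an Elasticsearch API, and it says nothing about how ES itself reconciles shards.

```python
from typing import Any, Dict, List, Tuple

Doc = Dict[str, Any]


def classify_partition_docs(
    side_a: Dict[str, Doc], side_b: Dict[str, Doc]
) -> Tuple[List[str], List[str], List[str]]:
    """Split document IDs into: only in A, only in B, and conflicting.

    Hypothetical helper: side_a / side_b map document ID -> document body
    as seen by each half of the split cluster.
    """
    # Case 1: IDs present on one side only (exclusive data).
    only_a = sorted(side_a.keys() - side_b.keys())
    only_b = sorted(side_b.keys() - side_a.keys())
    # Case 2: same ID on both sides, but the bodies have diverged.
    conflicting = sorted(
        doc_id
        for doc_id in side_a.keys() & side_b.keys()
        if side_a[doc_id] != side_b[doc_id]
    )
    return only_a, only_b, conflicting


if __name__ == "__main__":
    a = {"1": {"v": "x"}, "2": {"v": "a-edit"}}
    b = {"3": {"v": "y"}, "2": {"v": "b-edit"}}
    print(classify_partition_docs(a, b))  # (['1'], ['3'], ['2'])
```

IDs that exist on both sides with identical bodies fall into none of the three lists, since there is nothing to reconcile for them.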

For the second case, is it possible to add a custom merge strategy to ES?
For example, if two records have the same ID but different values, we would
keep the record with the latest timestamp. I believe this would reduce the
impact of split-brain. Can we do that? Or could it be added to the ES roadmap?
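A minimal sketch of the proposed "latest timestamp wins" rule, assuming every document carries a hypothetical `updated_at` field written by the application. Elasticsearch does not expose such a merge hook; the function name and field are illustrative only. Documents that exist on one side only are kept as-is (case 1), and conflicting IDs are resolved by comparing timestamps (case 2).

```python
from typing import Any, Dict

Doc = Dict[str, Any]


def merge_last_write_wins(
    side_a: Dict[str, Doc], side_b: Dict[str, Doc], ts_field: str = "updated_at"
) -> Dict[str, Doc]:
    """Merge two diverged copies of an index, keeping the newest version of
    each document according to a (hypothetical) timestamp field."""
    merged: Dict[str, Doc] = dict(side_a)  # shallow copy: inputs are not mutated
    for doc_id, doc_b in side_b.items():
        doc_a = merged.get(doc_id)
        if doc_a is None:
            merged[doc_id] = doc_b  # exclusive to B: just take it
        elif doc_b[ts_field] > doc_a[ts_field]:
            merged[doc_id] = doc_b  # B's update is newer
        # On a tie, or if B's copy is older, A's copy is kept
        # (an arbitrary but deterministic tie-break).
    return merged


if __name__ == "__main__":
    a = {"1": {"v": "old", "updated_at": 100}, "2": {"v": "only-a", "updated_at": 50}}
    b = {"1": {"v": "new", "updated_at": 200}, "3": {"v": "only-b", "updated_at": 10}}
    merged = merge_last_write_wins(a, b)
    print(sorted(merged), merged["1"]["v"])  # ['1', '2', '3'] new
```

Last-write-wins trusts whatever clocks produced `updated_at`: with clock skew between writers, the "newer" copy may not be the one written last, and the losing copy's changes are discarded silently.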

Thanks,
Jing

--
You received this message because you are subscribed to the Google Groups "elasticsearch" group.
To unsubscribe from this group and stop receiving emails from it, send an email to elasticsearch+unsubscribe@googlegroups.com.
To view this discussion on the web visit https://groups.google.com/d/msgid/elasticsearch/5a0515c4-d4dc-4062-a306-775317ec646f%40googlegroups.com.
For more options, visit https://groups.google.com/d/optout.


(Jing Liu) #2

Anyone, please?

(In reply to Jing Liu, Monday, March 31, 2014, 11:11:56 AM UTC-7.)


(Brian Yoder) #3

Jing,

I don't have much experience with ES in a production cluster environment;
all my experience has been with the Java API for mapping, bulk load, and
query logic, and with huge databases and things like that. But my 3-node
test ES cluster has gathered some dust over the past few months as other
tasks have loomed (mostly good ones; it's just a matter of time and priority). So
your question really intrigued me.

> When split-brain occurs, I found following behaviors on ES during the
> merge between A and B (i.e., a group of nodes with master A or B):
> Assume we don't know when the split-brain happens and both node groups
> have updated their data to some extent:
>
> - If A and B have exclusive data separately, all data will be merged
>   successfully
> - If A and B have the same record id but different record value (due to
>   update), ES cannot merge the data and the system is hanging there (aka.
>   split-brain effect)

Are you saying that case 1 is handled automatically?

> For the 2nd case, is it possible to add a customized merging strategy in
> ES? Say, if having the same record id but different record value, we take
> the record with the latest timestamp.
> By this means, I believe we will have less impact from split-brain. Can
> we do that? Or will it be added to ES roadmap.

I would add a second up-vote to this request.

In the Oracle world of replication, consider two updates to the same
record, each made on a different node of a replicated cluster. If one update
modifies field A and the other modifies field B, then the most recent
update wins and the earlier one's changes are lost. In other words, the
end result of cross-node replication is that either field A's update or
field B's update is saved, but not both. Our solution was to point all
clients at one of the Oracle nodes and let replication flow in only one
direction; on fail-over, those applications have to be re-pointed. Oracle
did nothing to help us; it was all up to us.
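The loss Brian describes can be made concrete with a toy model. This is not Oracle's replication logic; it simply assumes that each node ships a full copy of the record together with a timestamp, and that the copy with the later timestamp overwrites the other. The function names are hypothetical.

```python
from typing import Any, Dict, Tuple

Record = Dict[str, Any]
Stamped = Tuple[Record, int]  # (full record copy, timestamp)


def apply_update(base: Record, changes: Record, ts: int) -> Stamped:
    """Apply a partial update locally and stamp the resulting full copy."""
    updated = dict(base)
    updated.update(changes)
    return updated, ts


def record_level_lww(x: Stamped, y: Stamped) -> Record:
    """Whole-record last-write-wins: the later timestamp wins (ties go to x)."""
    return x[0] if x[1] >= y[1] else y[0]


base = {"id": 7, "A": "a0", "B": "b0"}
node1 = apply_update(base, {"A": "a1"}, ts=10)  # node 1 edits field A
node2 = apply_update(base, {"B": "b1"}, ts=20)  # node 2 edits field B, later
print(record_level_lww(node1, node2))  # {'id': 7, 'A': 'a0', 'B': 'b1'}
```

Node 1's change to field A disappears even though node 2 never touched that field, because node 2's full copy still carries the old value `a0`.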

So your suggestion for the second case makes a lot of sense. No, it's not
perfect. Yes, there can be data loss. Oracle buys palatial headquarters
buildings (http://media3.s-nbcnews.com/j/MSNBC/Components/Photo/2009/April/090416/090420-sun-oracle-hmed-4p.grid-6x2.jpg),
racing yachts (http://yachtingworld.media.ipcdigital.co.uk/9097/000007e54/d554/AC34SFJune15-0900.jpg),
and very nice private jets (http://www.oracleprivatejets.com/images/opjsceptercard.jpg)
with their data-loss replication, so their replication strategy can't be
all bad! :-) The version types recently added in ES 1.1 came with the
appropriate warnings; in the same way, the second case as you describe it
could be implemented along with its own warnings about exposure to data
loss, an exposure that a user could work around as needed, but with their
eyes open.
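To make the comparison with the ES 1.1 version types concrete, the sketch below models the documented acceptance rules of `external`, `external_gte`, and `force` when a document is indexed with an explicit version. It is a simplified model written for illustration, not Elasticsearch code, and `accepts_write` is a hypothetical name. If clients supply a timestamp as the external version, these rules amount to timestamp-based last-write-wins at indexing time, although they do nothing to reconcile two halves of a cluster that have already diverged.

```python
from typing import Optional


def accepts_write(version_type: str, stored: Optional[int], supplied: int) -> bool:
    """Simplified model: would an index request with an explicit version succeed?

    `stored` is None when no document with that ID exists yet.
    """
    if version_type == "force":
        # Always overwrites, whatever is stored; this is where data can be lost.
        return True
    if version_type not in ("external", "external_gte"):
        raise ValueError(f"not modeled here: {version_type}")
    if stored is None:
        return True  # both external types accept a brand-new document
    if version_type == "external":
        return supplied > stored  # strictly newer versions only
    return supplied >= stored  # external_gte: an equal version also overwrites


if __name__ == "__main__":
    # Timestamps used as versions: an older write is rejected under "external".
    print(accepts_write("external", 200, 100))  # False
    print(accepts_write("external_gte", 200, 200))  # True
```

The warnings Brian mentions follow directly from the model: `external_gte` lets two different writes with the same version overwrite each other, and `force` ignores versions altogether.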

Brian



(Jing Liu) #4

Hi Brian,

Thanks for your input.
Yes, we found both of the cases above during our tests, and case 1 is handled
automatically. Hopefully the ES team will look into a solution for
case 2.

Jing

(In reply to Brian Yoder (InquiringMind), Wednesday, April 16, 2014, 6:43:14 AM UTC-7.)


(Ivan Brusic) #5

I believe that the Elasticsearch team is more focused on eliminating
split-brain than on dealing with its after-effects. Recent comments
indicate that they are actively working on a solution.

The new consensus algorithm (Paxos/Raft?) will undoubtedly affect how
conflicts are reconciled.
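Preventing split-brain generally comes down to requiring a strict majority of master-eligible nodes before a master may be elected, since two disjoint partitions cannot both hold a majority. In Elasticsearch 1.x the setting for this is `discovery.zen.minimum_master_nodes`, which the documentation recommends setting to (number of master-eligible nodes / 2) + 1. The helpers below are hypothetical and only illustrate the arithmetic; they are not how Paxos, Raft, or any future ES algorithm is implemented.

```python
def majority_quorum(master_eligible: int) -> int:
    """Smallest number of master-eligible nodes that forms a strict majority."""
    return master_eligible // 2 + 1


def can_elect_master(partition_size: int, master_eligible: int) -> bool:
    """A partition may elect a master only if it sees a majority of nodes."""
    return partition_size >= majority_quorum(master_eligible)


if __name__ == "__main__":
    n = 3
    print(majority_quorum(n))  # 2
    # A 3-node cluster split into partitions of 2 nodes and 1 node:
    print(can_elect_master(2, n), can_elect_master(1, n))  # True False
```

Because at most one side of any split can reach the quorum, the minority side stops accepting a master instead of forming a second cluster, so the conflicting-update situation in case 2 does not arise in the first place.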

Cheers,

Ivan

(In reply to Jing Liu, Wed, Apr 16, 2014 at 11:14 AM.)


(Jing Liu) #6

Thanks, Ivan, for your response.
Do you know when the new solution will be released? In ES 1.2?

Thanks,
Jing

(In reply to Ivan Brusic, Wednesday, April 16, 2014, 4:30:15 PM UTC-7.)


(Ivan Brusic) #7

I have no idea, but here is a recent comment:

--
Ivan

(In reply to Jing Liu, Wed, Apr 16, 2014 at 4:41 PM.)

