Multiple masters elected during cluster crash - question about data consistency

Hi,

I hit an OOME (OutOfMemoryError) on all three nodes in my cluster.

After I restarted all the nodes concurrently from the system shell, node 1
came up first and was elected master. Nodes 2 and 3 started right after
that, but they could not see that node 1 was already running, so node 2 was
elected master as well.
My single three-node cluster was then split into two separate clusters (one
with a single node, the other with the remaining two nodes). They started
to work independently, and a river was started on each "cluster", so two
rivers were running concurrently.

The question is:

If two rivers are running concurrently, is there any chance that, after I
fix the situation and "merge" the broken clusters back into a single one,
all the data will be available/indexed? Or does my river need to take care
of data consistency itself?

Thanks for any advice :)

Regards,
Marek.

--

It's possible that after the "merge" you will end up with two copies of a
shard that are supposed to hold the same data but contain different sets of
documents, because they were parts of different clusters. This is a somewhat
difficult situation to recover from. The best thing you can do in this case
is to remove one of the copies (by temporarily setting the number of
replicas to 0, for example) and then reindex the missing records.
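
For illustration, here is a rough sketch of the "set replicas to 0" step, assuming the index settings endpoint (PUT /<index>/_settings with index.number_of_replicas). The host http://localhost:9200 and the index name my_index are placeholders, not values from this thread, and the helper name is made up for the example. The function only builds the request; sending it is left as a commented-out line so nothing is changed by accident.

```python
import json
import urllib.request


def build_replica_request(base_url: str, index: str, replicas: int) -> urllib.request.Request:
    """Build (but do not send) a PUT request that changes the replica count of one index."""
    # Body understood by the index settings endpoint.
    body = json.dumps({"index": {"number_of_replicas": replicas}}).encode("utf-8")
    return urllib.request.Request(
        url=f"{base_url.rstrip('/')}/{index}/_settings",
        data=body,
        method="PUT",
        headers={"Content-Type": "application/json"},
    )


if __name__ == "__main__":
    # Hypothetical host and index name; replace with your own.
    req = build_replica_request("http://localhost:9200", "my_index", 0)
    print(req.get_method(), req.full_url, req.data.decode("utf-8"))
    # To actually apply the change, uncomment the next line:
    # urllib.request.urlopen(req)
```

Once the missing records have been reindexed, the same helper with replicas=1 (or whatever you had before) would ask the cluster to rebuild the replicas from the surviving copies.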

If you haven't done this already, I would recommend setting
discovery.zen.minimum_master_nodes
(http://www.elasticsearch.org/guide/reference/modules/discovery/zen.html)
to 2 (more than half of the master-eligible nodes in your cluster) in order
to prevent this situation from happening again.
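
The "more than half" rule can be written as a one-line calculation. This is only a sketch of the arithmetic; the function name is made up for the example and is not part of Elasticsearch.

```python
def minimum_master_nodes(master_eligible: int) -> int:
    """Smallest node count that is a strict majority of the master-eligible nodes."""
    if master_eligible < 1:
        raise ValueError("need at least one master-eligible node")
    # Integer division rounds down, so adding 1 always gives a strict majority.
    return master_eligible // 2 + 1


if __name__ == "__main__":
    for n in (1, 2, 3, 5):
        print(f"{n} master-eligible node(s) -> minimum_master_nodes = {minimum_master_nodes(n)}")
```

For a three-node cluster this gives 2, so a node that starts alone cannot elect itself master and has to wait until it can see at least one other master-eligible node.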

(In reply to scoro's post of Wednesday, November 28, 2012 3:07:48 AM UTC-5.)

--

Thank you, Igor. That is the answer I expected :)

(In reply to Igor Motov's post of Thursday, November 29, 2012 02:50:05 UTC+1.)

--