ElasticSearch dropping data, and not joining after split-brain

Marcin · April 12, 2013, 3:20pm

Hi All,

We have a cluster of 2 ElasticSearch nodes running in production and
handling data for various public transport information systems. It has been
working for about a year now, but from time to time one of the nodes turns
yellow and is missing data, which causes serious issues for the clients and
passengers (requests that go to the yellow node get no results in
response). In such cases we restart the nodes and reindex, which fixes the
problem. Still, this happens on average once every 2 weeks, so I
wanted to ask for some help.

There also seems to be a problem with the logger: for some days the log
files are missing, which makes it more difficult to diagnose issues.

What we have found so far is that the nodes sometimes seem to be so busy
that pings between them time out. Also, from time to time one of the
nodes removes the other from its view (3 ping timeouts in a row), but
usually they rejoin shortly afterwards. However, sometimes the nodes can't
rejoin, one of them has state yellow, and is

Sample logs follow:

node 0, 08/04/2013 - https://gist.github.com/anonymous/5b1266482680a1f6d469
node 0, 09/04/2013 - missing
node 1, 08/04/2013 - https://gist.github.com/anonymous/5e45f693428f646af2d7
node 1, 09/04/2013 - https://gist.github.com/anonymous/1587b09e2cd94b369004

We'd be thankful for any help,

Thanks
Marcin

--
You received this message because you are subscribed to the Google Groups "elasticsearch" group.
To unsubscribe from this group and stop receiving emails from it, send an email to elasticsearch+unsubscribe@googlegroups.com.
For more options, visit https://groups.google.com/groups/opt_out.

Paul_Smith · April 16, 2013, 3:35am

With only 2 nodes, you have no way to defend against a split brain. By
setting the minimum_master_nodes property and running at least 3 nodes
(an odd number is required), you can defend against this case.

I still personally believe there is a problem with Zen and its master
election (see Issue #2117 and #2488), but with minimum_master_nodes you're
at least likely to be better covered than you are now.
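
To make the node-count argument concrete, here is a minimal sketch of the usual majority rule for this setting (floor(n / 2) + 1 master-eligible nodes). The function names are made up for illustration, and the sketch assumes every node is master-eligible; the computed value is what would go into discovery.zen.minimum_master_nodes in elasticsearch.yml.

```python
def minimum_master_nodes(master_eligible_nodes: int) -> int:
    """Return the majority (quorum) of master-eligible nodes: floor(n / 2) + 1."""
    if master_eligible_nodes < 1:
        raise ValueError("need at least one master-eligible node")
    return master_eligible_nodes // 2 + 1


def tolerates_losing_one_node(master_eligible_nodes: int) -> bool:
    """True if the remaining nodes can still form a quorum after one drops out."""
    quorum = minimum_master_nodes(master_eligible_nodes)
    return master_eligible_nodes - 1 >= quorum


if __name__ == "__main__":
    for n in (2, 3):
        # Columns: node count, quorum, whether losing one node keeps a quorum
        print(n, minimum_master_nodes(n), tolerates_losing_one_node(n))
```

With 2 nodes the quorum is 2, so a single isolated node can never elect itself (no split brain), but the cluster also cannot elect a master once either node is unreachable. With 3 nodes the quorum is still 2, so the majority side keeps working while the isolated node steps down.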

Paul


Igor_Motov · April 16, 2013, 6:28pm

Are you monitoring CPU and heap on the nodes? What happens to CPU and Java
heap just before the nodes turn yellow?
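
If nothing is recording heap yet, a rough sketch like the following could pull heap usage per node out of a nodes stats response that has already been fetched and parsed as JSON. The helper name is hypothetical, and the field layout (nodes, then jvm, mem, heap_used_in_bytes) is an assumption that should be checked against the output of your Elasticsearch version; the URL of the nodes stats API also differs between versions, so fetching is left out.

```python
def heap_used_mb(nodes_stats: dict) -> dict:
    """Map node name -> heap used in MB from a parsed nodes stats response.

    Assumes the response has the shape
    {"nodes": {"<id>": {"name": ..., "jvm": {"mem": {"heap_used_in_bytes": ...}}}}}.
    Nodes without that field are skipped rather than guessed at.
    """
    result = {}
    for node_id, node in nodes_stats.get("nodes", {}).items():
        name = node.get("name", node_id)
        used = node.get("jvm", {}).get("mem", {}).get("heap_used_in_bytes")
        if used is not None:
            result[name] = used / (1024 * 1024)
    return result


if __name__ == "__main__":
    # Hand-written sample data, not real output from the cluster in this thread.
    sample = {
        "nodes": {
            "abc": {"name": "node0", "jvm": {"mem": {"heap_used_in_bytes": 524288000}}},
            "def": {"name": "node1", "jvm": {"mem": {}}},
        }
    }
    print(heap_used_mb(sample))
```

Logging these values with a timestamp every few seconds would show whether heap climbs just before the ping timeouts start.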

