ElasticSearch dropping data, and not joining after split-brain

Hi All,

We have a cluster of 2 ElasticSearch nodes running in production and
handling data for various public transport information systems. It has been
working for about a year now, but from time to time one of the nodes turns
yellow and is missing data, which causes serious issues for the clients and
passengers (requests that go to the yellow node get no results in
response). When that happens we restart the nodes and reindex, which fixes
the problem. Still, this happens on average once every 2 weeks, so I
wanted to ask for some help.

There also seems to be a problem with the logger: the log files for some
days are missing, which makes issues harder to diagnose.

What we have found so far is that the nodes sometimes seem to be so busy
that pings between them time out. Also, from time to time one of the
nodes removes the other from its view (3 ping timeouts in a row), but
usually they rejoin shortly afterwards. However, sometimes the nodes can't
rejoin, one of them has state yellow, and is

Sample logs follow:

node 0, 08/04/2013 - https://gist.github.com/anonymous/5b1266482680a1f6d469
node 0, 09/04/2013 - missing
node 1, 08/04/2013 - https://gist.github.com/anonymous/5e45f693428f646af2d7
node 1, 09/04/2013 - https://gist.github.com/anonymous/1587b09e2cd94b369004

We'd be thankful for any help,

Thanks
Marcin

--
You received this message because you are subscribed to the Google Groups "elasticsearch" group.
To unsubscribe from this group and stop receiving emails from it, send an email to elasticsearch+unsubscribe@googlegroups.com.
For more options, visit https://groups.google.com/groups/opt_out.

With only 2 nodes, you have no way to defend against a split brain. Using
the minimum_master_nodes property and a minimum of 3 nodes (odd-numbered
required), you can defend against this case:

http://www.elasticsearch.org/guide/reference/modules/discovery/zen/

I still personally believe there is a problem with Zen and its master
election (see Issues #2117 and #2488), but with minimum_master_nodes you're
at least likely to be better covered than you are now.
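
To make the quorum arithmetic concrete, here is a minimal sketch in Python. It assumes the usual majority rule of floor(n / 2) + 1 master-eligible nodes for discovery.zen.minimum_master_nodes; the function names are made up for illustration and are not part of ElasticSearch.

```python
def minimum_master_nodes(master_eligible: int) -> int:
    """Strict majority of master-eligible nodes: floor(n / 2) + 1."""
    if master_eligible < 1:
        raise ValueError("need at least one master-eligible node")
    return master_eligible // 2 + 1


def tolerated_failures(master_eligible: int) -> int:
    """How many nodes can be lost while a majority can still elect a master."""
    return master_eligible - minimum_master_nodes(master_eligible)


# Print the quorum and the number of tolerated failures for small clusters.
for n in (2, 3, 4, 5):
    print(
        f"{n} nodes: discovery.zen.minimum_master_nodes: "
        f"{minimum_master_nodes(n)}, tolerates {tolerated_failures(n)} failure(s)"
    )
```

With 2 nodes the majority is 2, so the setting prevents a split brain only by leaving the cluster without a master as soon as either node drops out. With 3 nodes the majority is still 2 and one node can be lost. A 4th node raises the majority to 3 without tolerating any more failures than 3 nodes do, which is why odd cluster sizes are usually suggested.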

Paul

On 13 April 2013 01:20, Marcin marcinsulski@gmail.com wrote:

[quoted original message snipped]

Are you monitoring CPU and heap on the nodes? What happens to CPU and Java
heap just before the nodes turn yellow?
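
If nothing is being recorded yet, a rough polling sketch like the one below might be a starting point. It asks each node for its own view of the cluster through the cluster health API and derives a heap ratio from a node stats response. Everything here is an assumption to check against your version: the node stats path, the jvm.mem field names, and the node URLs in the usage note are illustrative only. CPU is left out; OS-level tools are probably the simpler way to record it.

```python
import json
from urllib.request import urlopen

HEALTH_PATH = "/_cluster/health"
# Assumption: the node stats path and its field names differ between versions.
STATS_PATH = "/_nodes/stats"


def fetch_json(url, timeout=5.0):
    """GET a URL and parse the response body as JSON."""
    with urlopen(url, timeout=timeout) as resp:
        return json.loads(resp.read().decode("utf-8"))


def heap_used_ratio(node_stats):
    """Used / committed heap for one node's stats entry, or None if absent."""
    mem = node_stats.get("jvm", {}).get("mem", {})
    used = mem.get("heap_used_in_bytes")
    committed = mem.get("heap_committed_in_bytes")
    if used is None or not committed:
        return None
    return used / committed


def heap_ratios(stats_response):
    """Map node id -> heap ratio for every node in a node stats response."""
    nodes = stats_response.get("nodes", {})
    return {node_id: heap_used_ratio(s) for node_id, s in nodes.items()}


def views_disagree(health_by_node):
    """True if the polled nodes report different cluster sizes.

    An unreachable node has no number_of_nodes and so counts as disagreeing.
    """
    counts = {h.get("number_of_nodes") for h in health_by_node.values()}
    return len(counts) > 1


def poll_once(base_urls):
    """Ask every node for its own cluster health; record failures, don't raise."""
    health = {}
    for url in base_urls:
        try:
            health[url] = fetch_json(url + HEALTH_PATH)
        except OSError as exc:  # connection errors and timeouts
            health[url] = {"status": "unreachable", "error": str(exc)}
    return health
```

Calling poll_once(["http://node0:9200", "http://node1:9200"]) (hypothetical addresses) every few seconds and logging the result together with heap_ratios(fetch_json(url + STATS_PATH)) would show whether heap climbs before a node drops out, and views_disagree would flag the moment the two nodes stop seeing each other.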

On Monday, April 15, 2013 11:35:10 PM UTC-4, tallpsmith wrote:

[quoted reply and original message snipped]