Sporadic node disconnected issues

Darshat_Shah · February 25, 2015, 5:45am

Hi,
I have an ES cluster with 27 nodes (3 master, 24 data). At times I see a
burst of nodes leaving and rejoining within a couple of minutes. Each node
has 16GB allocated for the JVM heap and is not close to touching that
limit. There are no memory issues, and there were no search/index operations
going on when this occurred. But quite a few NodeDisconnected
messages suddenly appear on the master. It doesn’t happen all
the time, only in bursts.

During this time, on the master, I see a NodeDisconnectedException for a
node. On that node, I see messages that say “master left (reason =
transport disconnected)”. I don't think it's split-brain, though with the
number of messages in the logs it's hard to tell. Also, the minimum
master nodes setting is 2. The outcome is that a whole lot of
shards get shifted around.

I'd like to involve our network specialists to troubleshoot
connectivity, but I'm not sure what to ask them to look for. In what scenarios
does Elasticsearch report a node as disconnected? Should they be looking at TCP
connectivity, running some ping tests, etc.?
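
As a starting point for the TCP connectivity question, here is a minimal sketch of a probe that could be run from the master host against each data node. It assumes the nodes listen on the default transport port 9300; the function names and the host names in the usage comment are hypothetical, and a successful connect only shows that the port is reachable at that moment, not that long-lived transport connections stay up.

```python
import socket
import time


def probe_tcp(host, port=9300, timeout=2.0):
    """Return the TCP connect latency in seconds, or None if the connect fails.

    Port 9300 is assumed to be the transport port; adjust if the cluster
    uses a different one.
    """
    start = time.monotonic()
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return time.monotonic() - start
    except OSError:
        # Covers refused connections, timeouts and DNS failures.
        return None


def probe_all(hosts, port=9300, timeout=2.0):
    """Probe every host once and map host -> latency (or None on failure)."""
    return {host: probe_tcp(host, port, timeout) for host in hosts}


# Hypothetical usage, run periodically and logged with timestamps so the
# results can be lined up with the disconnect bursts:
#   results = probe_all(["es-data-01.example.internal", "es-data-02.example.internal"])
#   failed = [h for h, latency in results.items() if latency is None]
```

Running this in a loop from several hosts may help show whether failures cluster around particular machines or network segments.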

Also, are there timeout values that can be configured so we can reduce false
positives for node disconnected events?

Thanks

Darshat

--
You received this message because you are subscribed to the Google Groups "elasticsearch" group.
To unsubscribe from this group and stop receiving emails from it, send an email to elasticsearch+unsubscribe@googlegroups.com.
To view this discussion on the web visit https://groups.google.com/d/msgid/elasticsearch/e7ad5de3-0e9b-4496-9c96-5162b784bac1%40googlegroups.com.
For more options, visit https://groups.google.com/d/optout.

warkolm · February 26, 2015, 12:57am

You may find it's GC related, so check your logs on the nodes.
Take a look at

for some timeout options around discovery.
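
To check the GC theory, a rough sketch like the one below can pull long GC pauses out of a node's log so their timestamps can be compared with the disconnect bursts. It assumes the GC warnings contain a fragment of the form `[gc][old]... duration [12.3s]`; that pattern is an assumption about the log format and should be checked against the actual logs. The function name and the 5-second threshold are hypothetical.

```python
import re

# Assumed shape of a GC log fragment: "[gc][<kind>]... duration [<number><unit>]".
GC_LINE = re.compile(
    r"\[gc\]\[(?P<kind>[^\]]+)\].*?duration \[(?P<value>[\d.]+)(?P<unit>ms|s|m)\]"
)

# Conversion factors from the logged unit to seconds.
UNIT_SECONDS = {"ms": 0.001, "s": 1.0, "m": 60.0}


def long_gc_pauses(lines, threshold_seconds=5.0):
    """Return (kind, seconds, line) for each GC line at or above the threshold."""
    hits = []
    for line in lines:
        match = GC_LINE.search(line)
        if match is None:
            continue
        seconds = float(match.group("value")) * UNIT_SECONDS[match.group("unit")]
        if seconds >= threshold_seconds:
            hits.append((match.group("kind"), seconds, line.rstrip("\n")))
    return hits


# Hypothetical usage against a node's log file:
#   with open("/var/log/elasticsearch/mycluster.log") as fh:
#       for kind, seconds, line in long_gc_pauses(fh):
#           print(kind, seconds, line)
```

If long pauses show up at the same times as the disconnects, that points toward GC rather than the network.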


Tejas_Vora · April 22, 2016, 9:41pm

Hi Darshat,

Thanks for the response. We will try the diagnosis you suggested. For the time being, we have removed all moving parts:

Removed NGINX with OpenSSL (the current OpenSSL has a bug)
Removed compression on indices
Connecting directly to client nodes using 2 URLs

Our DevOps team is investigating this further and, as part of that, will collect the logs/data you suggested. Once we have some concrete data, I will reply with more details.
