ES 1.4.2 random node disconnect

Revan007 · January 9, 2015, 5:55am

Hey,

I have been having trouble for a while now. I am getting random node disconnects
and I cannot explain why.
There is no increase in traffic (search or index) when this happens;
it feels completely random to me.
I first thought it could be the AWS cloud plugin, so I removed it, switched to
unicast and pointed directly to my nodes' IPs, but that did not seem to be the
problem.
I changed the instance type (now m3.2xlarge), added more instances, and made
many modifications to the ES yml config, and still nothing.
I changed Oracle Java from 1.7 to 1.8 and switched from the CMS collector to
G1GC, and still nothing.

I am out of ideas. How can I get more info on what is going on?

Here are the logs I can see from the master node and the data node:
http://pastebin.com/GhKfRkaa

Current config:

6 m3.2xlarge instances: 1 master, 5 data nodes
414 indices, one index per day
7372 shards; 9 shards and 1 replica per index
208 million documents, 430 GB
15 GB heap allocated per node
ES 1.4.2

Current yml config here :
http://pastebin.com/Nmdr7F6J

--
You received this message because you are subscribed to the Google Groups "elasticsearch" group.
To unsubscribe from this group and stop receiving emails from it, send an email to elasticsearch+unsubscribe@googlegroups.com.
To view this discussion on the web visit https://groups.google.com/d/msgid/elasticsearch/85cc2abe-da8e-4170-8e7d-a4e01f4a22c3%40googlegroups.com.
For more options, visit https://groups.google.com/d/optout.

jprante · January 10, 2015, 12:49pm

If you see

cluster:monitor/nodes/stats[n]] request_id [82300775] timed out after [15000ms]

in the logs, you have a monitor tool running that can not complete requests
because it takes longer than 15 seconds to traverse all the data folders on
all the nodes.

There are a number of methods to reduce disk traversal time in the data
folders:

- switch off monitoring (not really helpful) or reduce the monitor interval
  (maybe helpful, maybe not)
- increase the stats request timeout (if the monitor tool allows this, but
  this does not solve the cause of the problem)
- monitor only an index subset of your cluster (monitor tools usually do
  not have this option)
- reduce the number of segments per node, either by optimizing indices or
  by adding nodes
- wait for a fix in a future ES release

Have you counted the total number of segments? If the number is high, did
you run _optimize with max_num_segments on your indices to reduce the
number of segments?

Jörg
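
To answer the segment-count question above, one option is to fetch the indices segments API (`GET /_segments`) and sum the segments per index. The following is a minimal sketch in Python, not an official tool: it assumes the response has already been fetched and parsed into a dict, and the `sample` data and the index name `logs-2015.01.08` are made up for illustration. The helper `optimize_path` only builds the path for an `_optimize` request with `max_num_segments`; it does not send anything.

```python
from typing import Any, Dict


def count_segments(segments_response: Dict[str, Any]) -> Dict[str, int]:
    """Return the number of segments per index, summed over all shard copies."""
    totals: Dict[str, int] = {}
    for index_name, index_data in segments_response.get("indices", {}).items():
        count = 0
        # "shards" maps a shard number to a list of copies (primary and replicas).
        for shard_copies in index_data.get("shards", {}).values():
            for copy in shard_copies:
                count += len(copy.get("segments", {}))
        totals[index_name] = count
    return totals


def optimize_path(index_name: str, max_num_segments: int) -> str:
    """Build the path of an _optimize request (to be sent with POST)."""
    return f"/{index_name}/_optimize?max_num_segments={max_num_segments}"


# Hypothetical, hand-written response fragment; real responses carry more fields.
sample = {
    "indices": {
        "logs-2015.01.08": {
            "shards": {
                "0": [
                    {"segments": {"_0": {}, "_1": {}, "_2": {}}},  # one copy
                    {"segments": {"_0": {}, "_1": {}}},            # another copy
                ],
                "1": [
                    {"segments": {"_0": {}}},
                ],
            }
        }
    }
}

per_index = count_segments(sample)
total = sum(per_index.values())
print(per_index, total)
print(optimize_path("logs-2015.01.08", 1))
```

Indices with a high count are the candidates for `_optimize`; since optimizing is I/O heavy, it is usually applied only to indices that are no longer being written to.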

Revan007 · January 10, 2015, 12:56pm

Hey, thank you for answering. I am using the latest version of Marvel.

Here is more info about the problem:

ES 1.4.2 random node disconnect · Issue #9212 · elastic/elasticsearch · GitHub
(github.com/elastic/elasticsearch, opened 03:34AM - 09 Jan 15 UTC by dragosrosculete, closed 07:46PM - 03 Dec 15 UTC)

Revan007 · January 10, 2015, 1:18pm

The thing is, I don't think it is the monitoring plugin. When this happens, my
node gets disconnected and the cluster goes into yellow state until it
recovers. I am using curator optimize; it is set to 2 segments for
indices older than 2 days.
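
As a rough cross-check of that policy against the numbers in the first post, the arithmetic can be sketched in Python. This is an estimate under stated assumptions, not a measurement: it assumes the 7372 shards include replicas, that only the two most recent daily indices are still unoptimized, and that every optimized shard copy holds the maximum of 2 segments. The function name is made up for illustration.

```python
def max_segments_after_optimize(total_shard_copies: int,
                                recent_shard_copies: int,
                                max_num_segments: int) -> int:
    """Upper bound on segments held by shard copies that have been optimized."""
    optimized_copies = total_shard_copies - recent_shard_copies
    return optimized_copies * max_num_segments


total_shard_copies = 7372            # from the first post (assumed to include replicas)
copies_per_index = 9 * (1 + 1)       # 9 shards, 1 replica per index
recent_shard_copies = 2 * copies_per_index  # indices from the last 2 days, not yet optimized
data_nodes = 5

bound = max_segments_after_optimize(total_shard_copies, recent_shard_copies, 2)
print(bound)               # upper bound across the cluster
print(bound / data_nodes)  # rough share per data node
```

Under these assumptions the optimized indices alone can account for up to 14672 segments, or roughly 2900 per data node, before counting the recent unoptimized indices. The real total is whatever the segments API reports.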
