Finding the reason behind random node shutdowns

Donald_Piret_2 · August 29, 2013, 7:35am

Hello,

We're currently setting up a cluster of 3 ES nodes running on EC2 with auto
discovery.
Everything seems to be working fine except for regular, seemingly random node
restarts, which make the master move around quite a bit and sometimes make the
nodes unavailable for a few seconds (quite annoying, as we're using Tire,
which doesn't natively support falling back to alternate nodes when queries
fail).

The log files don't indicate anything specific and look like this:

[2013-08-29 06:54:41,310][INFO ][node ] [elasticsearch3] version[0.90.3], pid[7919], build[5c38d60/2013-08-06T13:18:31Z]
[2013-08-29 06:54:41,310][INFO ][node ] [elasticsearch3] initializing ...
[2013-08-29 06:54:41,401][INFO ][plugins ] [elasticsearch3] loaded [analysis-kuromoji, cloud-aws], sites [bigdesk, head, paramedic]
[2013-08-29 06:54:45,562][INFO ][node ] [elasticsearch3] initialized
[2013-08-29 06:54:45,563][INFO ][node ] [elasticsearch3] starting ...
[2013-08-29 06:54:45,790][INFO ][transport ] [elasticsearch3] bound_address {inet[/10.158.2.117:9300]}, publish_address {inet[/10.158.2.117:9300]}
[2013-08-29 06:54:49,826][INFO ][cluster.service ] [elasticsearch3] new_master [elasticsearch3][0-r3DpIaS4aehPk9BQcrAQ][inet[/10.158.2.117:9300]], reason: zen-disco-join (elected_as_master)
[2013-08-29 06:54:49,835][INFO ][discovery ] [elasticsearch3] elasticsearch/0-r3DpIaS4aehPk9BQcrAQ
[2013-08-29 06:54:49,861][INFO ][http ] [elasticsearch3] bound_address {inet[/10.158.2.117:9200]}, publish_address {inet[/10.158.2.117:9200]}
[2013-08-29 06:54:49,861][INFO ][node ] [elasticsearch3] started
[2013-08-29 06:54:51,089][INFO ][gateway ] [elasticsearch3] recovered [8] indices into cluster_state
[2013-08-29 06:54:55,446][INFO ][cluster.service ] [elasticsearch3] added {[elasticsearch2][6e4q3UuASI-m08ZvrUoGFw][inet[/10.215.41.203:9300]],}, reason: zen-disco-receive(join from node[[elasticsearch2][6e4q3UuASI-m08ZvrUoGFw][inet[/10.215.41.203:9300]]])
[2013-08-29 06:55:06,613][INFO ][cluster.service ] [elasticsearch3] added {[elasticsearch1][ZwxSqqehRXutdqtWz-THKw][inet[/10.31.146.12:9300]],}, reason: zen-disco-receive(join from node[[elasticsearch1][ZwxSqqehRXutdqtWz-THKw][inet[/10.31.146.12:9300]]])
[2013-08-29 07:10:57,523][INFO ][node ] [elasticsearch3] stopping ...
[2013-08-29 07:10:57,611][INFO ][node ] [elasticsearch3] stopped
[2013-08-29 07:10:57,615][INFO ][node ] [elasticsearch3] closing ...
[2013-08-29 07:10:57,623][INFO ][node ] [elasticsearch3] closed
[2013-08-29 07:10:59,383][INFO ][node ] [elasticsearch3] version[0.90.3], pid[10127], build[5c38d60/2013-08-06T13:18:31Z]
[2013-08-29 07:10:59,384][INFO ][node ] [elasticsearch3] initializing ...
[2013-08-29 07:10:59,418][INFO ][plugins ] [elasticsearch3] loaded [analysis-kuromoji, cloud-aws], sites [bigdesk, head, paramedic]
[2013-08-29 07:11:03,110][INFO ][node ] [elasticsearch3] initialized
[2013-08-29 07:11:03,111][INFO ][node ] [elasticsearch3] starting ...

As you can see, at 07:10:57 the node simply stops out of the blue, without
any apparent reason, and comes back up two seconds later with a new pid.

Why would this happen? How can I get more details about the reason behind
the shutdown, and how can I prevent it?
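
To see whether the restarts follow a pattern, one option is to measure how long the node stayed up before each stop. The following is only a minimal sketch: the function names `parse_line` and `uptimes` are made up for illustration, and the regular expression assumes that each entry sits on a single line in the `[timestamp][LEVEL][logger] [node] message` layout shown in the excerpt above.

```python
import re
from datetime import datetime

# Assumed layout of one log entry, e.g.:
# [2013-08-29 07:10:57,523][INFO ][node ] [elasticsearch3] stopping ...
LINE_RE = re.compile(
    r"^\[(?P<ts>\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2},\d{3})\]"
    r"\[(?P<level>\w+)\s*\]"
    r"\[(?P<logger>[^\]]*?)\s*\]\s*"
    r"\[(?P<node>[^\]]+)\]\s*(?P<msg>.*)$"
)


def parse_line(line):
    """Return (timestamp, logger, message), or None if the line does not match."""
    m = LINE_RE.match(line)
    if m is None:
        return None
    ts = datetime.strptime(m.group("ts"), "%Y-%m-%d %H:%M:%S,%f")
    return ts, m.group("logger"), m.group("msg").strip()


def uptimes(lines):
    """Return a list of (started, stopping, uptime) for each start/stop pair."""
    results = []
    started_at = None
    for line in lines:
        parsed = parse_line(line)
        if parsed is None:
            continue  # continuation lines, stack traces, etc.
        ts, logger, msg = parsed
        if logger != "node":
            continue  # only the node lifecycle messages matter here
        if msg == "started":
            started_at = ts
        elif msg.startswith("stopping") and started_at is not None:
            results.append((started_at, ts, ts - started_at))
            started_at = None
    return results
```

Applied to the excerpt above, this gives a single pair: started at 06:54:49,861 and stopping at 07:10:57,523, an uptime of a little over 16 minutes. Similar uptimes across many restarts would hint at something periodic outside Elasticsearch rather than a random failure.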

--
You received this message because you are subscribed to the Google Groups "elasticsearch" group.
To unsubscribe from this group and stop receiving emails from it, send an email to elasticsearch+unsubscribe@googlegroups.com.
For more options, visit https://groups.google.com/groups/opt_out.

simonw_2 · August 29, 2013, 9:36am

I have seen these in some tests as well, but I never figured out the cause.
Can you set your logging level to TRACE and send me a log?

simon

Donald_Piret_2 · August 30, 2013, 7:15am

Hey Simon,

Setting the rootLogger to TRACE generates massive amounts of output, and I'm
not sure where I'd even have to start looking to find the relevant entries.
Is there a specific logger I could set to TRACE that would provide more
targeted output, or will I just have to plow through it?
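
If it does come down to plowing through a full TRACE log, one way to narrow it is to keep only the lines that immediately precede each shutdown. The following is a rough sketch, not a definitive tool: the function name `context_before_stops`, the default window of 200 lines, and the `"stopping ..."` marker (taken from the INFO message in the log above) are all assumptions for illustration.

```python
from collections import deque


def context_before_stops(lines, window=200, marker="stopping ..."):
    """Yield, for each line containing `marker`, the up-to-`window`
    lines that preceded it, followed by the matching line itself."""
    recent = deque(maxlen=window)  # rolling buffer of the latest lines
    for line in lines:
        if marker in line:
            yield list(recent) + [line]
            recent.clear()  # start a fresh window after each stop
        else:
            recent.append(line)
```

Because the buffer is bounded, the log can be streamed from an open file object without loading it into memory. Each yielded chunk can then be written to its own file and compared across restarts.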

sazary · November 23, 2015, 6:22pm

Did you find the reason, or a solution?
