Joining node to cluster without restarting entire machine?

Tony_Su · February 4, 2014, 5:22pm

Unless I'm missing something in the docs or these forums,

I've surprisingly found that if a node fails to join the cluster, it's not
sufficient to simply restart ES on the machine. I would have thought that
restarting ES, and thereby re-reading its config files, would be sufficient
for the node to announce its intention to join the cluster.

But I haven't found that to be the case: every time, I've had to reboot the
entire machine to get the node to join the cluster.

Is there a config I'm missing?

Thx,
Tony

--
You received this message because you are subscribed to the Google Groups "elasticsearch" group.
To unsubscribe from this group and stop receiving emails from it, send an email to elasticsearch+unsubscribe@googlegroups.com.
To view this discussion on the web visit https://groups.google.com/d/msgid/elasticsearch/02c4b578-f430-44ba-a98c-7337b684125d%40googlegroups.com.
For more options, visit https://groups.google.com/groups/opt_out.

depahelix_2 · February 4, 2014, 5:26pm

Here is something:
http://blogs.nvidia.com/blog/2013/09/22/gpu-coming-to-java/


Tony_Su · February 4, 2014, 5:48pm

Hi,
I understand you probably meant to post this to one of my other threads
https://groups.google.com/forum/#!topic/elasticsearch/dC48AAeL544

Interesting late development.
It's too bad that what IBM is developing sounds like it will be available
only on IBM servers, but that's understandable.

Unless you want to pay for an IBM server, I guess it'll be a wait.

Tony

On Tuesday, February 4, 2014 9:26:38 AM UTC-8, depahelix wrote:

Here is something:
http://blogs.nvidia.com/blog/2013/09/22/gpu-coming-to-java/


warkolm · February 4, 2014, 10:10pm

If you give the service a restart, it's a stop and then a start (obviously).
This should re-read the config and attempt to rejoin the cluster named in
the config.

Can you try an explicit stop, then sleep for 5 seconds, then start? It could
be that the process isn't shutting down properly when requested.
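Mark's suggestion can be made less of a guess by confirming that the process has actually exited before starting it again. The Python sketch below assumes a systemd host and a unit named `elasticsearch`; both are assumptions, so adjust them for your install. The stop, start and "is it still running?" steps are passed in as callables, so the logic can be tried without touching a real service.

```python
import subprocess
import time
from typing import Callable


def restart_with_pause(
    stop: Callable[[], object],
    start: Callable[[], object],
    is_running: Callable[[], bool],
    pause: float = 5.0,
    stop_timeout: float = 60.0,
    poll: float = 0.5,
) -> None:
    """Stop the service, wait until it is really gone, pause, then start it."""
    stop()
    deadline = time.monotonic() + stop_timeout
    # Don't trust that stop() returning means the JVM exited; check explicitly.
    while is_running():
        if time.monotonic() > deadline:
            raise TimeoutError("process still running after stop")
        time.sleep(poll)
    time.sleep(pause)  # the "sleep for 5 seconds" from the suggestion above
    start()


def systemd_wiring(unit: str = "elasticsearch"):
    """Build (stop, start, is_running) callables; the unit name is hypothetical."""
    def run(*args: str) -> int:
        return subprocess.run(["systemctl", *args, unit]).returncode

    return (
        lambda: run("stop"),
        lambda: run("start"),
        # `systemctl is-active --quiet` exits 0 only while the unit is active
        lambda: run("is-active", "--quiet") == 0,
    )
```

Usage on a real host would be `restart_with_pause(*systemd_wiring())`. If the `TimeoutError` fires, that itself points to the process not closing cleanly.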

Regards,
Mark Walkom

Infrastructure Engineer
Campaign Monitor
email: markw@campaignmonitor.com
web: www.campaignmonitor.com


Tony_Su · February 5, 2014, 12:22am

Hi Mark,
I've done all that to no effect.

FYI, in case it makes a difference:
I'm running on a distro that uses systemd, so in theory, when the service is
started, systemd creates a cgroup in which the new process runs, and any
processes it spawns (including, but not limited to, new ES processes) are
tracked in that same cgroup. Compared to SysV init, this generally means that
when the service is stopped, all of its child processes are reliably shut
down, and no orphaned processes keep running.
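To confirm that nothing survived the stop, a quick Linux-only check is to scan /proc for leftover Elasticsearch JVMs. This is a sketch, not an ES tool. It assumes the Java command line contains `org.elasticsearch` (the package of the bootstrap class in the versions I know of; verify this on yours). Alternatively, `systemctl status <unit>` or `systemd-cgls` will show what is still inside the service's cgroup.

```python
import os
from typing import List, Tuple


def find_es_processes(
    proc_root: str = "/proc", marker: str = "org.elasticsearch"
) -> List[Tuple[int, str]]:
    """Return sorted (pid, cmdline) pairs whose command line contains marker."""
    found = []
    for entry in os.listdir(proc_root):
        if not entry.isdigit():
            continue  # skip non-PID entries such as /proc/meminfo or /proc/self
        path = os.path.join(proc_root, entry, "cmdline")
        try:
            with open(path, "rb") as f:
                raw = f.read()
        except OSError:
            continue  # process exited meanwhile, or permission denied
        # /proc/<pid>/cmdline separates arguments with NUL bytes
        cmdline = raw.replace(b"\0", b" ").decode("utf-8", "replace").strip()
        if marker in cmdline:
            found.append((int(entry), cmdline))
    return sorted(found)
```

An empty list right after `systemctl stop` supports the cgroup reasoning above. A non-empty list would mean a stray JVM is probably still holding the node's ports.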

So when I stop the ES service, it really should shut down.
But when I start it up again, I've waited over 5 minutes on a small but
active cluster that is accepting new data, and the node never joins.
After rebooting the orphaned node and starting the ES service, however, it
rarely takes more than about 15 seconds to join (according to ES-head).
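To time the join without watching ES-head, you could poll the cluster health API, whose JSON response includes `number_of_nodes`. This is a hedged sketch. The base URL `http://localhost:9200` is an assumption, and it is probably best pointed at a node that is already in the cluster: a node that hasn't joined yet may return an error or report only itself.

```python
import json
import time
import urllib.request
from typing import Callable, Dict


def fetch_health(base_url: str = "http://localhost:9200") -> Dict:
    """GET /_cluster/health and decode the JSON body."""
    with urllib.request.urlopen(base_url + "/_cluster/health", timeout=10) as resp:
        return json.load(resp)


def wait_for_join(
    expected_nodes: int,
    fetch: Callable[[], Dict] = fetch_health,
    timeout: float = 300.0,
    interval: float = 5.0,
) -> float:
    """Poll until the cluster reports expected_nodes; return seconds waited."""
    start = time.monotonic()
    while True:
        try:
            nodes = fetch().get("number_of_nodes", 0)
        except OSError:  # URLError/HTTPError are OSError subclasses
            nodes = 0    # node unreachable or no master yet; keep polling
        if nodes >= expected_nodes:
            return time.monotonic() - start
        if time.monotonic() - start > timeout:
            raise TimeoutError(f"only {nodes}/{expected_nodes} nodes after {timeout}s")
        time.sleep(interval)
```

For example, `wait_for_join(3)` on a three-node cluster returns roughly how long the rejoin took. This makes it easier to compare "restart only" against "reboot, then start".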

Tony


Tony_Su · February 5, 2014, 4:59pm

After more testing,
it seems that restarting the service should be sufficient; you just need to
be patient and wait. As long as the cluster (the master, specifically?) isn't
so busy that the join times out (the timeout is configurable in
elasticsearch.yml), the node will eventually join.

In light of this, I'm going to generally set the timeout to a much, much
longer value than the default, unless someone can describe a downside.

I'm also curious whether the new candidate node needs to connect
specifically to a master instead of just any node in the cluster... The
docs and descriptions I've read so far only describe contacting the cluster
in general.
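As I understand zen discovery, a joining node pings to discover the cluster and then sends its join request to the elected master. It is therefore worth knowing which node is master, so you can watch that node's log while the candidate tries to join. The sketch below reads `master_node` and `nodes` from the cluster state API. It assumes a node reachable at `http://localhost:9200` (hypothetical), and note that the full cluster state can be large on big clusters.

```python
import json
import urllib.request
from typing import Dict, Tuple


def fetch_state(base_url: str = "http://localhost:9200") -> Dict:
    """GET /_cluster/state from a node that is already in the cluster."""
    with urllib.request.urlopen(base_url + "/_cluster/state", timeout=10) as resp:
        return json.load(resp)


def current_master(state: Dict) -> Tuple[str, str]:
    """Return (node_id, node_name) of the elected master in a cluster-state dict."""
    master_id = state.get("master_node")
    if not master_id:
        raise LookupError("no master elected in this cluster state")
    name = state.get("nodes", {}).get(master_id, {}).get("name", "?")
    return master_id, name
```

`current_master(fetch_state())` gives the node whose log should show the join attempt, or the reason the join was rejected.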

I'm also curious (short of packet sniffing) whether, in the act of joining,
the candidate node repeatedly sends join requests, and if so, at what
interval. Is it close to a broadcast storm, very pedestrian, or maybe only
once?

Tony

