ES fails to restart after reboot

Russell_Fulton · July 15, 2023, 9:35pm

version 7.17.1 running on ubuntu -- started from systemctl

When ever the server is rebooted ES fails to restart properly. Subsequent manual restart works just fine.

[2023-07-16T01:37:33,109][INFO ][o.e.p.PluginsService     ] [secesprd06] no plugins loaded
[2023-07-16T01:37:33,223][INFO ][o.e.e.NodeEnvironment    ] [secesprd06] using [1] data paths, mounts [[/data (/dev/mapper/vgdata-lvdata)]], net usable_space [710.8gb], net total_space [1.4tb], types [xfs]
[2023-07-16T01:37:33,224][INFO ][o.e.e.NodeEnvironment    ] [secesprd06] heap size [30gb], compressed ordinary object pointers [true]
[2023-07-16T01:37:34,192][INFO ][o.e.n.Node               ] [secesprd06] node name [secesprd06], node ID [XcpMS9QmT06yxJ0WX9avlQ], cluster name [security], roles [data_cold]
[2023-07-16T01:37:44,551][INFO ][o.e.x.m.p.l.CppLogMessageHandler] [secesprd06] [controller/2881] [Main.cc@122] controller (64 bit): Version 7.17.1 (Build 6d8a28b39bf223) Copyright (c) 2022 Elasticsearch BV
[2023-07-16T01:37:45,240][INFO ][o.e.x.s.a.Realms         ] [secesprd06] license mode is [trial], currently licensed security realms are [reserved/reserved,file/default_file,native/default_native]
[2023-07-16T01:37:45,253][INFO ][o.e.x.s.a.s.FileRolesStore] [secesprd06] parsed [0] roles from file [/etc/elasticsearch/roles.yml]
[2023-07-16T01:37:47,513][INFO ][o.e.i.g.ConfigDatabases  ] [secesprd06] initialized default databases [[GeoLite2-Country.mmdb, GeoLite2-City.mmdb, GeoLite2-ASN.mmdb]], config databases [[]] and watching [/etc/elasticsearch/ingest-geoip] for changes
[2023-07-16T01:37:47,578][INFO ][o.e.i.g.DatabaseNodeService] [secesprd06] initialized database registry, using geoip-databases directory [/tmp/geoip-databases/XcpMS9QmT06yxJ0WX9avlQ]
[2023-07-16T01:37:49,492][INFO ][o.e.t.NettyAllocator     ] [secesprd06] creating NettyAllocator with the following configs: [name=elasticsearch_configured, chunk_size=1mb, suggested_max_allocation_size=1mb, factors={es.unsafe.use_netty_default_chunk_and_page_size=false, g1gc_enabled=true, g1gc_region_size=16mb}]
[2023-07-16T01:37:49,605][INFO ][o.e.i.r.RecoverySettings ] [secesprd06] using rate limit [125mb] with [default=125mb, read=0b, write=0b, max=0b]
[2023-07-16T01:37:49,703][INFO ][o.e.d.DiscoveryModule    ] [secesprd06] using discovery type [zen] and seed hosts providers [settings]
[2023-07-16T01:37:50,595][INFO ][o.e.g.DanglingIndicesState] [secesprd06] gateway.auto_import_dangling_indices is disabled, dangling indices will not be automatically detected or imported and must be managed manually
[2023-07-16T01:37:51,419][INFO ][o.e.n.Node               ] [secesprd06] initialized
[2023-07-16T01:37:51,420][INFO ][o.e.n.Node               ] [secesprd06] starting ...
[2023-07-16T01:37:51,528][INFO ][o.e.x.s.c.f.PersistentCache] [secesprd06] persistent cache index loaded
[2023-07-16T01:37:51,530][INFO ][o.e.x.d.l.DeprecationIndexingComponent] [secesprd06] deprecation component started
[2023-07-16T01:37:51,890][INFO ][o.e.t.TransportService   ] [secesprd06] publish_address {130.216.236.249:9300}, bound_addresses {[::]:9300}
[2023-07-16T01:37:56,050][INFO ][o.e.b.BootstrapChecks    ] [secesprd06] bound or publishing to a non-loopback address, enforcing bootstrap checks
[2023-07-16T01:37:56,053][INFO ][o.e.c.c.Coordinator      ] [secesprd06] cluster UUID [CjDhDttPTLafKvn-MYx3vA]
[2023-07-16T01:37:56,872][INFO ][o.e.n.Node               ] [secesprd06] stopping ...
[2023-07-16T01:37:56,893][INFO ][o.e.x.w.WatcherService   ] [secesprd06] stopping watch service, reason [shutdown initiated]
[2023-07-16T01:37:56,894][INFO ][o.e.x.w.WatcherLifeCycleService] [secesprd06] watcher has stopped and shutdown
[2023-07-16T01:37:56,909][INFO ][o.e.x.m.p.l.CppLogMessageHandler] [secesprd06] [controller/2881] [Main.cc@174] ML controller exiting
[2023-07-16T01:37:56,910][INFO ][o.e.x.m.p.NativeController] [secesprd06] Native controller process has stopped - no new native processes can be started
[2023-07-16T01:37:57,244][INFO ][o.e.h.AbstractHttpServerTransport] [secesprd06] publish_address {130.216.236.249:9200}, bound_addresses {[::]:9200}
[2023-07-16T01:37:57,245][INFO ][o.e.n.Node               ] [secesprd06] started
[2023-07-16T01:37:57,266][INFO ][o.e.n.Node               ] [secesprd06] stopped
[2023-07-16T01:37:57,266][INFO ][o.e.n.Node               ] [secesprd06] closing ...
[2023-07-16T01:37:57,332][INFO ][o.e.n.Node               ] [secesprd06] closed

Any idea what is causing this?

Other nodes in the cluster which have the same configuration do not have this problem.

DavidTurner · July 16, 2023, 6:31am

I suspect it's the situation described by "TIP: Startup timeouts with older systemd versions" in these docs, except that the startup timeout is only 75s in 7.17.1 (see this PR). If so, either upgrade your systemd, or Elasticsearch, or else adjust the startup timeout yourself.

Russell_Fulton · July 16, 2023, 11:09pm

Thanks David

As usual it seems that I did not give enough details : (

In this case I really want to know why ES is taking so long to start up after a reboot. When I manually restart ES it starts fine in under a minute. An identical VM running ES reboots and ES starts in the just fine.

I suspect some issue with VM but without some diagnostics ...

DavidTurner · July 17, 2023, 8:21am

I see. That's a trickier one to answer. Maybe all the nodes are taking just under 75s and this one is just marginally slower? In any case the default timeout is 900s now because we don't really consider minutes-long startup times to be particularly unusual.

Once the node is up does it have performance in line with other nodes? If not, that's worth investigating, and should be easier to investigate too because you're looking at something more like a steady-state situation. Capturing profiling data at startup is probably possible, but I'm not sure how I'd approach it.

system · August 14, 2023, 8:22am

This topic was automatically closed 28 days after the last reply. New replies are no longer allowed.

Topic		Replies	Views
ES fails to start on reboot with systemctl but starts fine manually Elasticsearch	1	858	November 22, 2019
ES 7.3 - restarting data node doesn't rejoin cluster Elasticsearch	6	2037	December 21, 2020
ES: 5.2.2 ES datanode restart fails after repository-s3 plugin installed Elasticsearch	2	719	May 17, 2017
Elasticsearch fails to restart after upgrade Elasticsearch	10	1930	January 19, 2018
Elasticsearch fails to start after reboot Elasticsearch runtime-fields	2	273	December 3, 2023

ES fails to restart after reboot

Related topics