Elasticsearch not starting

One of my Elasticsearch nodes stopped the service, and the master role moved to another node.
The logs do not show much information about why; they just say the master moved to a different node.

[2021-11-23T13:57:21,216][INFO ][o.e.n.Node ] [s32.mydomain.local] initialized
[2021-11-23T13:57:21,217][INFO ][o.e.n.Node ] [s32.mydomain.local] starting ...
[2021-11-23T13:57:21,378][INFO ][o.e.x.s.c.PersistentCache] [s32.mydomain.local] persistent cache index loaded
[2021-11-23T13:57:21,482][INFO ][o.e.t.TransportService ] [s32.mydomain.local] publish_address {10.1.9.132:9300}, bound_addresses {[::]:9300}
[2021-11-23T13:57:25,160][INFO ][o.e.b.BootstrapChecks ] [s32.mydomain.local] bound or publishing to a non-loopback address, enforcing bootstrap checks
[2021-11-23T13:57:25,180][INFO ][o.e.c.c.Coordinator ] [s32.mydomain.local] cluster UUID [OWJKtACaQKekQyQdMKZ0hw]
[2021-11-23T13:57:25,708][INFO ][o.e.c.s.ClusterApplierService] [s32.mydomain.local] master node changed {previous , current [{s34.mydomain.local}{J0a50UUERau7eS8RNye1EA}{yi4J1_P7QMSic0DkXPIvaQ}{10.1.9.134}{10.1.9.134:9300}{lmr}{ml.machine_memory=12428423168, ml.max_open_jobs=20, xpack.installed=true, ml.max_jvm_size=4294967296, transform.node=false}]}, added {{s34.mydomain.local}{J0a50UUERau7eS8RNye1EA}{yi4J1_P7QMSic0DkXPIvaQ}{10.1.9.134}{10.1.9.134:9300}{lmr}{ml.machine_memory=12428423168, ml.max_open_jobs=20, xpack.installed=true, ml.max_jvm_size=4294967296, transform.node=false},{s33.mydomain.local}{c9UZRp6fTJWh5gtD2jaglg}{GQow_82xTiWJtcxgGzP42g}{10.1.9.133}{10.1.9.133:9300}{cdfhilmrstw}{ml.machine_memory=16654835712, ml.max_open_jobs=20, xpack.installed=true, ml.max_jvm_size=10737418240, transform.node=true}}, term: 23, version: 68964, reason: ApplyCommitRequest{term=23, version=68964, sourceNode={s34.mydomain.local}{J0a50UUERau7eS8RNye1EA}{yi4J1_P7QMSic0DkXPIvaQ}{10.1.9.134}{10.1.9.134:9300}{lmr}{ml.machine_memory=12428423168, ml.max_open_jobs=20, xpack.installed=true, ml.max_jvm_size=4294967296, transform.node=false}}

Is there more to the log?

Hello Mark,

[2021-11-23T13:57:25,721][INFO ][o.e.c.s.ClusterSettings ] [s32.mydomain.local] updating [cluster.routing.allocation.disk.watermark.high] from [90%] to [95%]
[2021-11-23T13:57:25,721][INFO ][o.e.c.s.ClusterSettings ] [s32.mydomain.local] updating [cluster.routing.allocation.disk.watermark.flood_stage] from [95%] to [99%]
[2021-11-23T13:57:25,722][INFO ][o.e.c.s.ClusterSettings ] [s32.mydomain.local] updating [xpack.monitoring.collection.enabled] from [false] to [true]
[2021-11-23T13:57:25,723][INFO ][o.e.c.s.ClusterSettings ] [s32.mydomain.local] updating [indices.recovery.max_bytes_per_sec] from [40mb] to [250mb]
[2021-11-23T13:57:25,723][INFO ][o.e.c.s.ClusterSettings ] [s32.mydomain.local] updating [cluster.routing.allocation.disk.watermark.high] from [90%] to [95%]
[2021-11-23T13:57:25,723][INFO ][o.e.c.s.ClusterSettings ] [s32.mydomain.local] updating [cluster.routing.allocation.disk.watermark.flood_stage] from [95%] to [99%]
[2021-11-23T13:57:25,723][INFO ][o.e.c.s.ClusterSettings ] [s32.mydomain.local] updating [cluster.routing.allocation.disk.watermark.high] from [90%] to [95%]
[2021-11-23T13:57:25,723][INFO ][o.e.c.s.ClusterSettings ] [s32.mydomain.local] updating [cluster.routing.allocation.disk.watermark.flood_stage] from [95%] to [99%]
[2021-11-23T13:57:25,723][INFO ][o.e.c.s.ClusterSettings ] [s32.mydomain.local] updating [cluster.routing.allocation.disk.watermark.high] from [90%] to [95%]
[2021-11-23T13:57:25,724][INFO ][o.e.c.s.ClusterSettings ] [s32.mydomain.local] updating [cluster.routing.allocation.disk.watermark.flood_stage] from [95%] to [99%]
[2021-11-23T13:57:30,481][INFO ][o.e.x.s.a.TokenService ] [s32.mydomain.local] refresh keys
[2021-11-23T13:57:30,620][INFO ][o.e.x.s.a.TokenService ] [s32.mydomain.local] refreshed keys
[2021-11-23T13:57:32,397][INFO ][o.e.l.LicenseService ] [s32.mydomain.local] license [8b4dbf1a-b8ad-4f2c-b7b9-5fe0e0da7421] mode [basic] - valid
[2021-11-23T13:57:32,510][INFO ][o.e.x.s.s.SecurityStatusChangeListener] [s32.mydomain.local] Active license is now [BASIC]; Security is disabled

That's all there is, right up to the end of the log file.

It looks like you are running out of disk space.

Not really, that is what I thought too, but I deleted the old indices using curator.
Disk usage was at 90% initially and the Elasticsearch service kept stopping, so I had to keep restarting it with "systemctl restart elasticsearch". After a while it would not even start, so I deleted the indices older than 90 days.
After deleting them, the master role switched to another node (I have three nodes running Elasticsearch in a cluster). This node was the master; after running curator, the master switched to another node, and now the service will not start at all.
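
When it refuses to start, the systemd journal is another place to check for startup errors (this assumes the default unit name "elasticsearch"):

# Check the service state and the most recent unit messages
systemctl status elasticsearch
journalctl -u elasticsearch -n 100 --no-pager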

As you can see below, only 51% of the node's disk space is now used.
However, before running curator I had raised "cluster.routing.allocation.disk.watermark.high" to 95% and "cluster.routing.allocation.disk.watermark.flood_stage" to 99%, to stop Elasticsearch from shutting itself down.
It has now got to the point where the master has failed over, and even after freeing space the service will not start at all.
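
Judging from the "updating [...]" lines in the log above, those watermark changes were applied dynamically through the cluster settings API; a sketch of what such a request looks like (whether it was sent as persistent or transient is an assumption) is:

# Raise the disk watermarks cluster-wide (sketch; persistent vs. transient is assumed)
curl -X PUT "localhost:9200/_cluster/settings" -H 'Content-Type: application/json' -d'
{
  "persistent": {
    "cluster.routing.allocation.disk.watermark.high": "95%",
    "cluster.routing.allocation.disk.watermark.flood_stage": "99%"
  }
}'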

[root@s33 ~]# df -h
Filesystem Size Used Avail Use% Mounted on
devtmpfs 7.8G 0 7.8G 0% /dev
tmpfs 7.8G 0 7.8G 0% /dev/shm
tmpfs 7.8G 704M 7.1G 9% /run
tmpfs 7.8G 0 7.8G 0% /sys/fs/cgroup
/dev/mapper/centos_srvde433-root 50G 32G 19G 64% /
/dev/sda2 1014M 223M 792M 22% /boot
/dev/mapper/centos_srvde433-home 2.0T 1014G 975G 51% /home
tmpfs 1.6G 0 1.6G 0% /run/user/0
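
Elasticsearch's own view of per-node disk usage can also be compared against df using the cat allocation API, for example:

# Show per-node disk usage as Elasticsearch sees it (disk.used, disk.avail, disk.percent)
curl -s "localhost:9200/_cat/allocation?v"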

How did you "delete the files" exactly?

I used curator to delete them.
Basically, I ran the action.yml file with curator, using the command:
[root@s33 curator]# curator --config /etc/curator/config.yml /etc/curator/action.yml

These files contain the following settings (to delete the old indices):
[root@s33 curator]# cat action.yml
actions:
  1:
    action: delete_indices
    description: Delete_indices_older_90_days
    options:
      ignore_empty_list: True    # If True, an empty index list only logs INFO; if False it raises an ERROR and exits
      timeout_override:
      continue_if_exception: False
      disable_action: False
    filters:
    - filtertype: pattern
      kind: prefix
      value: (collectd|opt)-*
      exclude:
    - filtertype: age            # Filter indices by age
      source: name
      direction: older
      timestring: '%Y.%m.%d'
      unit: days
      unit_count: 90

[root@s33 curator]# cat config.yml
client:
  hosts:
    - 127.0.0.1
  port: 9200
  url_prefix:
  use_ssl: False
  certificate:
  client_cert:
  client_key:
  ssl_no_validate: False
  http_auth:
  timeout: 30
  master_only: False
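
In case it is useful: the same action file can be tested without deleting anything by adding curator's --dry-run flag, which only logs what would have been deleted:

# Preview the indices the action would match, without actually deleting them
curator --dry-run --config /etc/curator/config.yml /etc/curator/action.yml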

Also, on another node I can see the following log entries about this node being removed from the cluster, but I could not work out the reason.

[2021-11-23T13:54:31,440][INFO ][o.e.c.s.ClusterApplierService] [s33.mydomain.local] removed {{s32.mydomain.local}{GeNSVakDRaOMc7Qv0NdQIw}{DKgx7dyCRtm7M0wanjIMRQ}{10.1.8.132}{10.1.8.132:9300}{cdfhilmrstw}{ml.machine_memory=16654835712, ml.max_open_jobs=20, xpack.installed=true, ml.max_jvm_size=10737418240, transform.node=true}}, term: 23, version: 68960, reason: ApplyCommitRequest{term=23, version=68960, sourceNode={s34.mydomain.local}{J0a50UUERau7eS8RNye1EA}{yi4J1_P7QMSic0DkXPIvaQ}{10.1.8.134}{10.1.8.134:9300}{lmr}{ml.machine_memory=12428423168, ml.max_open_jobs=20, xpack.installed=true, ml.max_jvm_size=4294967296, transform.node=false}}
[2021-11-23T13:57:25,665][INFO ][o.e.c.s.ClusterApplierService] [s33.mydomain.local] added {{s32.mydomain.local}{GeNSVakDRaOMc7Qv0NdQIw}{ikDT5bu3RQiWkZaK8akGcQ}{10.1.8.132}{10.1.8.132:9300}{cdfhilmrstw}{ml.machine_memory=16654835712, ml.max_open_jobs=20, xpack.installed=true, ml.max_jvm_size=10737418240, transform.node=true}}, term: 23, version: 68964, reason: ApplyCommitRequest{term=23, version=68964, sourceNode={ss34.mydomain.local}{J0a50UUERau7eS8RNye1EA}{yi4J1_P7QMSic0DkXPIvaQ}{10.1.8.134}{10.1.8.134:9300}{lmr}{ml.machine_memory=12428423168, ml.max_open_jobs=20, xpack.installed=true, ml.max_jvm_size=4294967296, transform.node=false}}
[2021-11-23T13:57:36,433][INFO ][o.e.c.s.ClusterApplierService] [s33.mydomain.local] removed {{s32.mydomain.local}{GeNSVakDRaOMc7Qv0NdQIw}{ikDT5bu3RQiWkZaK8akGcQ}{10.1.8.132}{10.1.8.132:9300}{cdfhilmrstw}{ml.machine_memory=16654835712, ml.max_open_jobs=20, xpack.installed=true, ml.max_jvm_size=10737418240, transform.node=true}}, term: 23, version: 68966, reason: ApplyCommitRequest{term=23, version=68966, sourceNode={s34.mydomain.local}{J0a50UUERau7eS8RNye1EA}{yi4J1_P7QMSic0DkXPIvaQ}{10.1.8.134}{10.1.8.134:9300}{lmr}{ml.machine_memory=12428423168, ml.max_open_jobs=20, xpack.installed=true, ml.max_jvm_size=4294967296, transform.node=false}}
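
For completeness, the current cluster membership and which node holds the master role can be checked from any node that still responds, for example:

# List the nodes the cluster currently sees; the elected master is marked with "*" in the master column
curl -s "localhost:9200/_cat/nodes?v&h=name,ip,node.role,master"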

Hi Mark,

Thank you for looking into it. It was simply resolved after a reboot; I don't know how, but it looks more like a memory issue than a disk space issue.

Zanoob
