Elasticsearch not starting

One of my Elasticsearch nodes stopped the service, and the master role moved to another node.
The logs do not show much information about why; they just say the master moved to a different node.

[2021-11-23T13:57:21,216][INFO ][o.e.n.Node ] [s32.mydomain.local] initialized
[2021-11-23T13:57:21,217][INFO ][o.e.n.Node ] [s32.mydomain.local] starting ...
[2021-11-23T13:57:21,378][INFO ][o.e.x.s.c.PersistentCache] [s32.mydomain.local] persistent cache index loaded
[2021-11-23T13:57:21,482][INFO ][o.e.t.TransportService ] [s32.mydomain.local] publish_address {10.1.9.132:9300}, bound_addresses {[::]:9300}
[2021-11-23T13:57:25,160][INFO ][o.e.b.BootstrapChecks ] [s32.mydomain.local] bound or publishing to a non-loopback address, enforcing bootstrap checks
[2021-11-23T13:57:25,180][INFO ][o.e.c.c.Coordinator ] [s32.mydomain.local] cluster UUID [OWJKtACaQKekQyQdMKZ0hw]
[2021-11-23T13:57:25,708][INFO ][o.e.c.s.ClusterApplierService] [s32.mydomain.local] master node changed {previous , current [{s34.mydomain.local}{J0a50UUERau7eS8RNye1EA}{yi4J1_P7QMSic0DkXPIvaQ}{10.1.9.134}{10.1.9.134:9300}{lmr}{ml.machine_memory=12428423168, ml.max_open_jobs=20, xpack.installed=true, ml.max_jvm_size=4294967296, transform.node=false}]}, added {{s34.mydomain.local}{J0a50UUERau7eS8RNye1EA}{yi4J1_P7QMSic0DkXPIvaQ}{10.1.9.134}{10.1.9.134:9300}{lmr}{ml.machine_memory=12428423168, ml.max_open_jobs=20, xpack.installed=true, ml.max_jvm_size=4294967296, transform.node=false},{s33.mydomain.local}{c9UZRp6fTJWh5gtD2jaglg}{GQow_82xTiWJtcxgGzP42g}{10.1.9.133}{10.1.9.133:9300}{cdfhilmrstw}{ml.machine_memory=16654835712, ml.max_open_jobs=20, xpack.installed=true, ml.max_jvm_size=10737418240, transform.node=true}}, term: 23, version: 68964, reason: ApplyCommitRequest{term=23, version=68964, sourceNode={s34.mydomain.local}{J0a50UUERau7eS8RNye1EA}{yi4J1_P7QMSic0DkXPIvaQ}{10.1.9.134}{10.1.9.134:9300}{lmr}{ml.machine_memory=12428423168, ml.max_open_jobs=20, xpack.installed=true, ml.max_jvm_size=4294967296, transform.node=false}}

Is there more to the log?

Hello Mark,

[2021-11-23T13:57:25,721][INFO ][o.e.c.s.ClusterSettings ] [s32.mydomain.local] updating [cluster.routing.allocation.disk.watermark.high] from [90%] to [95%]
[2021-11-23T13:57:25,721][INFO ][o.e.c.s.ClusterSettings ] [s32.mydomain.local] updating [cluster.routing.allocation.disk.watermark.flood_stage] from [95%] to [99%]
[2021-11-23T13:57:25,722][INFO ][o.e.c.s.ClusterSettings ] [s32.mydomain.local] updating [xpack.monitoring.collection.enabled] from [false] to [true]
[2021-11-23T13:57:25,723][INFO ][o.e.c.s.ClusterSettings ] [s32.mydomain.local] updating [indices.recovery.max_bytes_per_sec] from [40mb] to [250mb]
[2021-11-23T13:57:25,723][INFO ][o.e.c.s.ClusterSettings ] [s32.mydomain.local] updating [cluster.routing.allocation.disk.watermark.high] from [90%] to [95%]
[2021-11-23T13:57:25,723][INFO ][o.e.c.s.ClusterSettings ] [s32.mydomain.local] updating [cluster.routing.allocation.disk.watermark.flood_stage] from [95%] to [99%]
[2021-11-23T13:57:25,723][INFO ][o.e.c.s.ClusterSettings ] [s32.mydomain.local] updating [cluster.routing.allocation.disk.watermark.high] from [90%] to [95%]
[2021-11-23T13:57:25,723][INFO ][o.e.c.s.ClusterSettings ] [s32.mydomain.local] updating [cluster.routing.allocation.disk.watermark.flood_stage] from [95%] to [99%]
[2021-11-23T13:57:25,723][INFO ][o.e.c.s.ClusterSettings ] [s32.mydomain.local] updating [cluster.routing.allocation.disk.watermark.high] from [90%] to [95%]
[2021-11-23T13:57:25,724][INFO ][o.e.c.s.ClusterSettings ] [s32.mydomain.local] updating [cluster.routing.allocation.disk.watermark.flood_stage] from [95%] to [99%]
[2021-11-23T13:57:30,481][INFO ][o.e.x.s.a.TokenService ] [s32.mydomain.local] refresh keys
[2021-11-23T13:57:30,620][INFO ][o.e.x.s.a.TokenService ] [s32.mydomain.local] refreshed keys
[2021-11-23T13:57:32,397][INFO ][o.e.l.LicenseService ] [s32.mydomain.local] license [8b4dbf1a-b8ad-4f2c-b7b9-5fe0e0da7421] mode [basic] - valid
[2021-11-23T13:57:32,510][INFO ][o.e.x.s.s.SecurityStatusChangeListener] [s32.mydomain.local] Active license is now [BASIC]; Security is disabled

That's all there is, right up to the end of the log file.

It looks like you are running out of disk space.

Not really, that is what I thought too, but I deleted the old indices using curator.
Disk usage was at 90% initially and the Elasticsearch service kept stopping, so I had to keep restarting it with "systemctl restart elasticsearch". After a while it would not even start, so I deleted the indices older than 90 days.
After deleting them, the master role switched to another node (I have three nodes running Elasticsearch in a cluster). This node was the master; after running curator, the master switched to another node, and now the service will not start at all.
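
When it refuses to start, the systemd journal is another place to check for startup errors (this assumes the default unit name "elasticsearch"):

# Check the service state and the most recent unit messages
systemctl status elasticsearch
journalctl -u elasticsearch -n 100 --no-pager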

As you can see below, only 51% of the node's disk space is now used.
However, before running curator I had raised "cluster.routing.allocation.disk.watermark.high" to 95% and "cluster.routing.allocation.disk.watermark.flood_stage" to 99%, to stop Elasticsearch from shutting itself down.
It has now got to the point where the master has failed over, and even after freeing space the service will not start at all.
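
Judging from the "updating [...]" lines in the log above, those watermark changes were applied dynamically through the cluster settings API; a sketch of what such a request looks like (whether it was sent as persistent or transient is an assumption) is:

# Raise the disk watermarks cluster-wide (sketch; persistent vs. transient is assumed)
curl -X PUT "localhost:9200/_cluster/settings" -H 'Content-Type: application/json' -d'
{
  "persistent": {
    "cluster.routing.allocation.disk.watermark.high": "95%",
    "cluster.routing.allocation.disk.watermark.flood_stage": "99%"
  }
}'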

[root@s33 ~]# df -h
Filesystem Size Used Avail Use% Mounted on
devtmpfs 7.8G 0 7.8G 0% /dev
tmpfs 7.8G 0 7.8G 0% /dev/shm
tmpfs 7.8G 704M 7.1G 9% /run
tmpfs 7.8G 0 7.8G 0% /sys/fs/cgroup
/dev/mapper/centos_srvde433-root 50G 32G 19G 64% /
/dev/sda2 1014M 223M 792M 22% /boot
/dev/mapper/centos_srvde433-home 2.0T 1014G 975G 51% /home
tmpfs 1.6G 0 1.6G 0% /run/user/0
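
Elasticsearch's own view of per-node disk usage can also be compared against df using the cat allocation API, for example:

# Show per-node disk usage as Elasticsearch sees it (disk.used, disk.avail, disk.percent)
curl -s "localhost:9200/_cat/allocation?v"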

How did you "delete the files" exactly?

I used curator to delete them.
Basically, I ran the action.yml file with curator, using the command:
[root@s33 curator]# curator --config /etc/curator/config.yml /etc/curator/action.yml

These files contain the following settings (to delete the old indices):
[root@s33 curator]# cat action.yml
actions:
  1:
    action: delete_indices
    description: Delete_indices_older_90_days
    options:
      ignore_empty_list: True    # If True, an empty index list only logs INFO; if False it raises an ERROR and exits
      timeout_override:
      continue_if_exception: False
      disable_action: False
    filters:
    - filtertype: pattern
      kind: prefix
      value: (collectd|opt)-*
      exclude:
    - filtertype: age            # Filter indices by age
      source: name
      direction: older
      timestring: '%Y.%m.%d'
      unit: days
      unit_count: 90

[root@s33 curator]# cat config.yml
client:
  hosts:
    - 127.0.0.1
  port: 9200
  url_prefix:
  use_ssl: False
  certificate:
  client_cert:
  client_key:
  ssl_no_validate: False
  http_auth:
  timeout: 30
  master_only: False
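
In case it is useful: the same action file can be tested without deleting anything by adding curator's --dry-run flag, which only logs what would have been deleted:

# Preview the indices the action would match, without actually deleting them
curator --dry-run --config /etc/curator/config.yml /etc/curator/action.yml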

Also, on another node I can see the following log entries about this node being removed from the cluster, but I could not work out the reason.

[2021-11-23T13:54:31,440][INFO ][o.e.c.s.ClusterApplierService] [s33.mydomain.local] removed {{s32.mydomain.local}{GeNSVakDRaOMc7Qv0NdQIw}{DKgx7dyCRtm7M0wanjIMRQ}{10.1.8.132}{10.1.8.132:9300}{cdfhilmrstw}{ml.machine_memory=16654835712, ml.max_open_jobs=20, xpack.installed=true, ml.max_jvm_size=10737418240, transform.node=true}}, term: 23, version: 68960, reason: ApplyCommitRequest{term=23, version=68960, sourceNode={s34.mydomain.local}{J0a50UUERau7eS8RNye1EA}{yi4J1_P7QMSic0DkXPIvaQ}{10.1.8.134}{10.1.8.134:9300}{lmr}{ml.machine_memory=12428423168, ml.max_open_jobs=20, xpack.installed=true, ml.max_jvm_size=4294967296, transform.node=false}}
[2021-11-23T13:57:25,665][INFO ][o.e.c.s.ClusterApplierService] [s33.mydomain.local] added {{s32.mydomain.local}{GeNSVakDRaOMc7Qv0NdQIw}{ikDT5bu3RQiWkZaK8akGcQ}{10.1.8.132}{10.1.8.132:9300}{cdfhilmrstw}{ml.machine_memory=16654835712, ml.max_open_jobs=20, xpack.installed=true, ml.max_jvm_size=10737418240, transform.node=true}}, term: 23, version: 68964, reason: ApplyCommitRequest{term=23, version=68964, sourceNode={ss34.mydomain.local}{J0a50UUERau7eS8RNye1EA}{yi4J1_P7QMSic0DkXPIvaQ}{10.1.8.134}{10.1.8.134:9300}{lmr}{ml.machine_memory=12428423168, ml.max_open_jobs=20, xpack.installed=true, ml.max_jvm_size=4294967296, transform.node=false}}
[2021-11-23T13:57:36,433][INFO ][o.e.c.s.ClusterApplierService] [s33.mydomain.local] removed {{s32.mydomain.local}{GeNSVakDRaOMc7Qv0NdQIw}{ikDT5bu3RQiWkZaK8akGcQ}{10.1.8.132}{10.1.8.132:9300}{cdfhilmrstw}{ml.machine_memory=16654835712, ml.max_open_jobs=20, xpack.installed=true, ml.max_jvm_size=10737418240, transform.node=true}}, term: 23, version: 68966, reason: ApplyCommitRequest{term=23, version=68966, sourceNode={s34.mydomain.local}{J0a50UUERau7eS8RNye1EA}{yi4J1_P7QMSic0DkXPIvaQ}{10.1.8.134}{10.1.8.134:9300}{lmr}{ml.machine_memory=12428423168, ml.max_open_jobs=20, xpack.installed=true, ml.max_jvm_size=4294967296, transform.node=false}}
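
For completeness, the current cluster membership and which node holds the master role can be checked from any node that still responds, for example:

# List the nodes the cluster currently sees; the elected master is marked with "*" in the master column
curl -s "localhost:9200/_cat/nodes?v&h=name,ip,node.role,master"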

Hi Mark,

Thank you for looking into it. It was simply resolved after a reboot; I don't know how, but it looks more like a memory issue than a disk space issue.

Zanoob
