Elasticsearch will not start after installing 7.9.2

Elasticsearch will not start after we installed 7.9.2. When started, it enters a failed state. I ran journalctl -xe, and it points to "Unregistered Authentication Agent". But when I check the logs I see the following:

    at org.elasticsearch.action.ActionListener$1.onResponse(ActionListener.java:61) ~[elasticsearch-6.8.7.jar:6.8.7]
        at org.elasticsearch.index.shard.IndexShardOperationPermits.acquire(IndexShardOperationPermits.java:273) ~[elasticsearch-6.8.7.jar:6.8.7]
        at org.elasticsearch.index.shard.IndexShardOperationPermits.acquire(IndexShardOperationPermits.java:240) ~[elasticsearch-6.8.7.jar:6.8.7]
        at org.elasticsearch.index.shard.IndexShard.acquirePrimaryOperationPermit(IndexShard.java:2563) ~[elasticsearch-6.8.7.jar:6.8.7]
        at org.elasticsearch.index.shard.IndexShard.runUnderPrimaryPermit(IndexShard.java:2639) ~[elasticsearch-6.8.7.jar:6.8.7]
        at org.elasticsearch.index.IndexService.sync(IndexService.java:799) ~[elasticsearch-6.8.7.jar:6.8.7]
        at org.elasticsearch.index.IndexService.syncRetentionLeases(IndexService.java:782) ~[elasticsearch-6.8.7.jar:6.8.7]
        at org.elasticsearch.index.IndexService.access$800(IndexService.java:100) ~[elasticsearch-6.8.7.jar:6.8.7]
        at org.elasticsearch.index.IndexService$AsyncRetentionLeaseSyncTask.runInternal(IndexService.java:960) ~[elasticsearch-6.8.7.jar:6.8.7]
        at org.elasticsearch.common.util.concurrent.AbstractAsyncTask.run(AbstractAsyncTask.java:144) ~[elasticsearch-6.8.7.jar:6.8.7]
        at org.elasticsearch.common.util.concurrent.ThreadContext$ContextPreservingRunnable.run(ThreadContext.java:681) ~[elasticsearch-6.8.7.jar:6.8.7]
        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149) [?:1.8.0_232]
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624) [?:1.8.0_232]
        at java.lang.Thread.run(Thread.java:748) [?:1.8.0_232]
Caused by: org.elasticsearch.transport.TransportException: TransportService is closed stopped can't send request
        at org.elasticsearch.transport.TransportService.sendRequestInternal(TransportService.java:626) ~[elasticsearch-6.8.7.jar:6.8.7]
        ... 55 more
[2021-02-04T16:54:46,536][INFO ][o.e.n.Node               ] [dev-es-01] stopped
[2021-02-04T16:54:46,537][INFO ][o.e.n.Node               ] [dev-es-01] closing ...
[2021-02-04T16:54:46,554][INFO ][o.e.n.Node               ] [dev-es-01] closed    

Hi @mike_wills Welcome to the community!

Something does not look right: you stated that you installed 7.9.2, but all the error messages reference 6.8.7.

I would check that 7.9.2 actually installed properly...

Thanks! I will re-install and see what happens.

Still no good.

Could you share your startup logs?

[2021-02-25T23:03:17.319+0000][5414][gc] Using G1
[2021-02-25T23:03:17.352+0000][5414][gc,init] Version: 15+36 (release)
[2021-02-25T23:03:17.353+0000][5414][gc,init] CPUs: 4 total, 4 available
[2021-02-25T23:03:17.353+0000][5414][gc,init] Memory: 15589M
[2021-02-25T23:03:17.353+0000][5414][gc,init] Large Page Support: Disabled
[2021-02-25T23:03:17.353+0000][5414][gc,init] NUMA Support: Disabled
[2021-02-25T23:03:17.353+0000][5414][gc,init] Compressed Oops: Enabled (Zero based)
[2021-02-25T23:03:17.353+0000][5414][gc,init] Heap Region Size: 8M
[2021-02-25T23:03:17.353+0000][5414][gc,init] Heap Min Capacity: 16G
[2021-02-25T23:03:17.353+0000][5414][gc,init] Heap Initial Capacity: 16G
[2021-02-25T23:03:17.353+0000][5414][gc,init] Heap Max Capacity: 16G
[2021-02-25T23:03:17.353+0000][5414][gc,init] Pre-touch: Disabled
[2021-02-25T23:03:17.353+0000][5414][gc,init] Parallel Workers: 4
[2021-02-25T23:03:17.353+0000][5414][gc,init] Concurrent Workers: 1
[2021-02-25T23:03:17.353+0000][5414][gc,init] Concurrent Refinement Workers: 4
[2021-02-25T23:03:17.353+0000][5414][gc,init] Periodic GC: Disabled
[2021-02-25T23:03:17.353+0000][5414][gc,metaspace] CDS disabled.
[2021-02-25T23:03:17.353+0000][5414][gc,metaspace] Compressed class space mapped at: 0x00000007c0000000-0x0000000800000000, size: 1073741824
[2021-02-25T23:03:17.353+0000][5414][gc,metaspace] Narrow klass base: 0x0000000000000000, Narrow klass shift: 3, Narrow klass range: 0x800000000
[2021-02-25T23:03:17.436+0000][5414][gc,heap,exit] Heap
[2021-02-25T23:03:17.436+0000][5414][gc,heap,exit] garbage-first heap total 16777216K, used 8192K [0x00000003c0000000, 0x00000007c0000000)
[2021-02-25T23:03:17.436+0000][5414][gc,heap,exit] region size 8192K, 2 young (16384K), 0 survivors (0K)
[2021-02-25T23:03:17.436+0000][5414][gc,heap,exit] Metaspace used 3303K, capacity 4480K, committed 4480K, reserved 1056768K
[2021-02-25T23:03:17.436+0000][5414][gc,heap,exit] class space used 266K, capacity 384K, committed 384K, reserved 1048576K

Those are the GC logs, not the regular logs like the ones above... We need a look at the regular Elasticsearch logs, either through the journal like above or in /var/log/elasticsearch/elasticsearch.log

There should be some messages near the end that indicate the issue... the output above shows nothing...

Regular logs should look something like this...

[2021-02-25T16:00:29,498][INFO ][o.e.p.PluginsService     ] [ceres] loaded module [aggs-matrix-stats]
[2021-02-25T16:00:29,498][INFO ][o.e.p.PluginsService     ] [ceres] loaded module [analysis-common]
[2021-02-25T16:00:29,498][INFO ][o.e.p.PluginsService     ] [ceres] loaded module [constant-keyword]
[2021-02-25T16:00:29,498][INFO ][o.e.p.PluginsService     ] [ceres] loaded module [flattened]
[2021-02-25T16:00:29,498][INFO ][o.e.p.PluginsService     ] [ceres] loaded module [frozen-indices]
[2021-02-25T16:00:43,881][INFO ][o.e.c.r.a.AllocationService] [ceres] Cluster health status changed from [RED] to [YELLOW] (reason: [shards started [[metricbeat-7.10.2-2021.02.11-000001][0]]]).
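
A note on the file name: Elasticsearch names its main log after the cluster, so the file is <path.logs>/<cluster.name>.log and only ends in elasticsearch.log when cluster.name is left at its default. Below is a minimal sketch for deriving the expected path from the config. The function name es_log_file is made up for illustration, and it assumes flat, unquoted "key: value" lines plus the package-install defaults (/etc/elasticsearch and /var/log/elasticsearch).

```shell
# Hypothetical helper: print the expected main log file for a node.
# Assumes flat, unquoted "key: value" lines in elasticsearch.yml.
es_log_file() (
  yml="${1:-/etc/elasticsearch/elasticsearch.yml}"
  logs=$(sed -n 's/^path\.logs:[[:space:]]*//p' "$yml" | head -n 1)
  name=$(sed -n 's/^cluster\.name:[[:space:]]*//p' "$yml" | head -n 1)
  # Fall back to the package-install defaults when a key is not set.
  echo "${logs:-/var/log/elasticsearch}/${name:-elasticsearch}.log"
)

# Usage (not run here): sudo tail -n 100 "$(es_log_file)"
```

If the printed file does not exist, it is worth checking that the directory exists and is writable by the elasticsearch user.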

I am tracking. In elasticsearch.yml, path.logs points to another directory called "elasticsearch/logs/", and there are no logs there.

You've got to find your logs; they may be going to syslog. What do you see when you run systemctl status?

  State: degraded
   Jobs: 0 queued
 Failed: 1 units
  Since: Fri 2021-02-05 15:14:16 UTC; 2 weeks 6 days ago
 CGroup: /
         ├─1 /usr/lib/systemd/systemd --switched-root --system --deserialize 21
         ├─user.slice
         │ ├─user-815602357.slice
         │ │ ├─session-33608.scope
         │ │ │ ├─14725 sshd: user [priv]
         │ │ │ ├─14753 sshd: user@notty
         │ │ │ └─14754 /usr/libexec/openssh/sftp-server
         │ │ └─session-33607.scope
         │ │   ├─14718 sshd: user [priv]
         │ │   ├─14729 sshd: user@pts/0
         │ │   ├─14730 -bash
         │ │   ├─14803 systemctl status
         │ │   └─14804 less
         │ └─user-0.slice
         │   └─session-33532.scope
         │     ├─9911 /usr/sbin/CROND -n
         │     ├─9915 /bin/bash /usr/share/clamav/freshclam-sleep
         │     └─9925 sleep 9681
         └─system.slice
           ├─crond.service
           │ └─1258 /usr/sbin/crond -n
           ├─rsyslog.service
           │ └─1253 /usr/sbin/rsyslogd -n
           ├─sshd.service
           │ └─1249 /usr/sbin/sshd -D
           ├─salt-minion.service
           │ ├─1063 /usr/bin/python3 -s /usr/bin/salt-minion
           │ ├─1263 /usr/bin/python3 -s /usr/bin/salt-minion
           │ └─1313 /usr/bin/python3 -s /usr/bin/salt-minion
           ├─postfix.service
           │ ├─1255 /usr/libexec/postfix/master -w
           │ ├─1262 qmgr -l -t unix -u
           │ └─7369 pickup -l -t unix -u
           ├─oddjobd.service
           │ └─1061 /usr/sbin/oddjobd -n -p /var/run/oddjobd.pid -t 300
           ├─amazon-cloudwatch-agent.service
           │ └─1060 /opt/aws/amazon-cloudwatch-agent/bin/amazon-cloudwatch-agent -config /opt/aws/amazon-cloudwatch-agent/etc/amazon-cloudwatch-agent.toml -pidfile /opt/aws/amazon-cloudwatch-agent/var/amazon-cloudwatch-agent.pid
           ├─rhsmcertd.service
           │ └─1064 /usr/bin/rhsmcertd
           ├─goferd.service
           │ └─1058 python /usr/bin/goferd --foreground
           ├─clam-freshclam.service
           │ └─1070 /usr/bin/freshclam -d -c 4
           ├─tuned.service
           │ └─1056 /usr/bin/python2 -Es /usr/sbin/tuned -l -P
           ├─NetworkManager.service
           │ ├─829 /usr/sbin/NetworkManager --no-daemon
           │ └─853 /sbin/dhclient -d -q -sf /usr/libexec/nm-dhcp-helper -pf /var/run/dhclient-eth0.pid -lf /var/lib/NetworkManager/dhclient-5fb06bd0-0bb0-7ffb-45f1-d6edd65f3e03-eth0.lease -cf /var/lib/NetworkManager/dhclient-eth0.c
           ├─systemd-logind.service
           │ └─795 /usr/lib/systemd/systemd-logind
           ├─sssd.service
           │ ├─756 /usr/sbin/sssd -i --logger=files
           │ ├─785 /usr/libexec/sssd/sssd_be --domain autumnal.local --uid 0 --gid 0 --logger=files
           │ ├─791 /usr/libexec/sssd/sssd_nss --uid 0 --gid 0 --logger=files
           │ └─792 /usr/libexec/sssd/sssd_pam --uid 0 --gid 0 --logger=files
           ├─dbus.service
           │ └─746 /usr/bin/dbus-daemon --system --address=systemd: --nofork --nopidfile --systemd-activation
           ├─polkit.service
           │ └─745 /usr/lib/polkit-1/polkitd --no-debug
           ├─irqbalance.service

So perhaps let's start from the beginning. I would like to help, but I really can't do anything without the logs and the startup details.

Exactly how did you install, and which distribution: which OS, DEB or RPM?

What command did you use to install?

What was the result of that command?

When I said systemctl status, apologies, I thought you would recognize that the command is typically

systemctl status elasticsearch

How are you starting Elasticsearch?

How are you logging?

sudo journalctl --unit elasticsearch

Can you share your elasticsearch.yml?

My mistake.
OS: CentOS 7
They use SaltStack to run the commands "yum install" and "systemctl start elasticsearch" to start up services.

#
# NOTE: Elasticsearch comes with reasonable defaults for most settings.
#       Before you set out to tweak and tune the configuration, make sure you
#       understand what are you trying to accomplish and the consequences.
#
# The primary way of configuring a node is via this file. This template lists
# the most important settings you may want to configure for a production cluster.
#
# Please consult the documentation for further information on configuration options:
# https://www.elastic.co/guide/en/elasticsearch/reference/index.html
#
# ---------------------------------- Cluster -----------------------------------
#
# Use a descriptive name for your cluster:
#
cluster.name: cluster
#
# ------------------------------------ Node ------------------------------------
#
# Use a descriptive name for the node:
#
node.name: node-00
#
# Add custom attributes to the node:
#
#node.attr.rack: r1
#
# ----------------------------------- Paths ------------------------------------
#
# Path to directory where to store the data (separate multiple locations by comma):
#
path.data: /elasticsearch/data
#
# Path to log files:
#
path.logs: /elasticsearch/log
#
# ----------------------------------- Memory -----------------------------------
#
# Lock the memory on startup:
#
bootstrap.memory_lock: true
#
# Make sure that the heap size is set to about half the memory available
# on the system and that the owner of the process is allowed to use this
# limit.
#
# Elasticsearch performs poorly when the system is swapping the memory.
#
# ---------------------------------- Network -----------------------------------
#
# Set the bind address to a specific IP (IPv4 or IPv6):
#
network.host: 0.0.0.0
#
# Set a custom port for HTTP:
#
#http.port: 9200
#
# For more information, consult the network module documentation.
#
# --------------------------------- Discovery ----------------------------------
#
# Pass an initial list of hosts to perform discovery when this node is started:
# The default list of hosts is ["127.0.0.1", "[::1]"]
#
discovery.seed_hosts:
  - 1xx.xx.x.xxx:9200
  - 1xx.xx.x.xxx:9200
  - 1xx.xx.x.xxx:9200
#
# Bootstrap the cluster using an initial set of master-eligible nodes:
#
cluster.initial_master_nodes:
   - domain-00
   - domain-01
   - domain-02
#
# For more information, consult the discovery and cluster formation module documentation.
#
# ---------------------------------- Gateway -----------------------------------
#
# Block initial recovery after a full cluster restart until N nodes are started:
#
#gateway.recover_after_nodes: 3
#
# For more information, consult the gateway module documentation.
#
# ---------------------------------- Various -----------------------------------
#
# Require explicit names when deleting indices:
#
action.destructive_requires_name: true

reindex.remote.whitelist: "Linuxvm:9202,Linuxvm2:443"
reindex.ssl.verification_mode : "none"

xpack.security.enabled: true
xpack.monitoring.enabled: true
xpack.graph.enabled: true
xpack.watcher.enabled: true

xpack.monitoring.exporters.local_default2:
  type: local

xpack.ssl.keystore.path: certs/elastic-certificates.p12
xpack.ssl.truststore.path: certs/elastic-certificates.p12
xpack.security.transport.ssl.enabled: true
xpack.security.transport.ssl.verification_mode: certificate
#xpack.security.http.ssl.enabled: true
xpack.security.audit.enabled: true

xpack:
    security:
        authc:
            realms:
                native1:
                    type: native
                    order: 0
                active_directory:
                    type: active_directory

                    order: 1
                    domain_name: 
                    url: 
                    bind_dn: 
                    load_balance:
                        type: "round_robin"
                    unmapped_groups_as_roles: true
                file1:
                    type: file
                    order: 0
path.repo: /net/snaps
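
One check that follows from the config comment above ("heap size is set to about half the memory available"): the GC log posted earlier reports Memory: 15589M and Heap Max Capacity: 16G, so the configured heap appears to be larger than physical RAM. Whether that is related to the failure is not established in this thread. Below is a minimal sketch for pulling the two numbers out of a GC log. The function name heap_vs_ram is made up, and the line format is assumed to match the JDK 15 unified-logging output shown above.

```shell
# Hypothetical helper: compare JVM-reported RAM with max heap from a gc.log.
# Assumes "[gc,init] Memory: <n>M" and "[gc,init] Heap Max Capacity: <n>G" lines.
heap_vs_ram() (
  awk '
    # Convert a value such as 16G or 15589M to megabytes.
    function to_mb(v,    n, u) {
      u = substr(v, length(v))
      n = substr(v, 1, length(v) - 1) + 0
      if (u == "G") return n * 1024
      if (u == "M") return n
      if (u == "K") return n / 1024
      return -1
    }
    /gc,init\] Memory:/            { ram  = to_mb($NF) }
    /gc,init\] Heap Max Capacity:/ { heap = to_mb($NF) }
    END {
      if (ram <= 0 || heap <= 0) { print "memory or heap line not found"; exit 1 }
      printf "heap=%dM ram=%dM heap/ram=%d%%\n", heap, ram, int(heap * 100 / ram)
    }
  ' "$1"
)

# Usage (not run here): heap_vs_ram /path/to/gc.log
```

On a package install the heap is normally set with -Xms and -Xmx in /etc/elasticsearch/jvm.options, so that is the place to compare against if the ratio comes out near or above 100%.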

What about the logs or the output of

systemctl status elasticsearch

Sometimes that gets cut off, so you can try

systemctl status elasticsearch >> status.log

Then post status.log

● elasticsearch.service - Elasticsearch
Loaded: loaded (/usr/lib/systemd/system/elasticsearch.service; enabled; vendor preset: disabled)
Drop-In: /etc/systemd/system/elasticsearch.service.d
└─override.conf
Active: failed (Result: signal) since Fri 2021-02-26 13:00:46 UTC; 17min ago
Docs: https://www.elastic.co
Process: 2916 ExecStart=/usr/share/elasticsearch/bin/systemd-entrypoint -p ${PID_DIR}/elasticsearch.pid --quiet (code=killed, signal=KILL)
Main PID: 2916 (code=killed, signal=KILL)

Feb 26 12:59:47 dev-es-00.autumnal.local systemd[1]: Starting Elasticsearch...
Feb 26 13:00:46 dev-es-00.autumnal.local systemd[1]: elasticsearch.service: main process exited, code=killed, status=9/KILL
Feb 26 13:00:46 dev-es-00.autumnal.local systemd[1]: Failed to start Elasticsearch.
Feb 26 13:00:46 dev-es-00.autumnal.local systemd[1]: Unit elasticsearch.service entered failed state.
Feb 26 13:00:46 dev-es-00.autumnal.local systemd[1]: elasticsearch.service failed.
Unit status.logs.service could not be found.

Yeah, it's failing for sure, but I can't tell why from that.

Can you run

journalctl -xe

while you run

systemctl start elasticsearch

And post the results. We really need to find the normal logs.
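
Since journalctl -xe shows every unit on the box, it can help to limit the output to the Elasticsearch unit and to the time of the start attempt. A minimal sketch, assuming the unit is named elasticsearch and that sudo is available; the function name start_and_show_journal is made up for illustration.

```shell
# Hypothetical helper: start the unit, then print only its journal entries
# written since just before the start attempt.
start_and_show_journal() (
  unit="${1:-elasticsearch}"
  # journalctl --since accepts "YYYY-MM-DD HH:MM:SS" in local time.
  since=$(date '+%Y-%m-%d %H:%M:%S')
  sudo systemctl start "$unit"
  sudo journalctl --unit "$unit" --since "$since" --no-pager
)

# Usage (not run here): start_and_show_journal elasticsearch > es-start.log 2>&1
```

The --no-pager flag keeps the output from being cut off at the terminal width, which makes it easier to paste in full.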

[root@node mwills@domain]# systemctl start elasticsearch || journalctl -xe
Job for elasticsearch.service failed because a fatal signal was delivered to the control process. See "systemctl status elasticsearch.service" and "journalctl -xe" for details.
Journal file /var/log/journal/2b2dca37c19647b09cf1f366fc7406ea/user-815601170.journal is truncated, ignoring file.
-- Subject: Session 34524 has been terminated
-- Defined-By: systemd
-- Support: http://lists.freedesktop.org/mailman/listinfo/systemd-devel
-- Documentation: http://www.freedesktop.org/wiki/Software/systemd/multiseat
--
-- A session with the ID 34524 has been terminated.
Feb 26 14:46:30 node dbus[746]: [system] Activating via systemd: service name='org.freedesktop.nm_dispatch
Feb 26 14:46:30 node dhclient[853]: bound to 172.29.1.179 -- renewal in 1610 seconds.
Feb 26 14:46:31 node systemd[1]: Starting Network Manager Script Dispatcher Service...
-- Subject: Unit NetworkManager-dispatcher.service has begun start-up
-- Defined-By: systemd
-- Support: http://lists.freedesktop.org/mailman/listinfo/systemd-devel
--
-- Unit NetworkManager-dispatcher.service has begun starting up.
Feb 26 14:46:31 node systemd[1]: Started Session 34536 of user root.
-- Subject: Unit session-34536.scope has finished start-up
-- Defined-By: systemd
-- Support: http://lists.freedesktop.org/mailman/listinfo/systemd-devel
--
-- Unit session-34536.scope has finished starting up.
--
-- The start-up result is done.
Feb 26 14:46:31 node CROND[11723]: (root) CMD (/usr/bin/aws-kinesis-agent-babysit > /dev/null 2>&1)
Feb 26 14:46:31 node dbus[746]: [system] Successfully activated service 'org.freedesktop.nm_dispatcher'
Feb 26 14:46:31 node systemd[1]: Started Network Manager Script Dispatcher Service.
-- Subject: Unit NetworkManager-dispatcher.service has finished start-up
-- Defined-By: systemd
-- Support: http://lists.freedesktop.org/mailman/listinfo/systemd-devel
--
-- Unit NetworkManager-dispatcher.service has finished starting up.
--
-- The start-up result is done.
Feb 26 14:46:31 node nm-dispatcher[11722]: req:1 'dhcp4-change' [eth0]: new request (4 scripts)
Feb 26 14:46:31 node nm-dispatcher[11722]: req:1 'dhcp4-change' [eth0]: start running ordered scripts...
lines 1980-2013/2013 (END)

To be blunt, I do not know what is going on, and none of these are the normal logs; without them I cannot really help.

I do see this: did you or someone else modify the startup script?

I am not a journalctl expert.

How did you get the logs in the very first post?

The other option is just to start Elasticsearch in the foreground and see what happens:

ES_PATH_CONF=/etc/elasticsearch /usr/share/elasticsearch/bin/elasticsearch

If you try to start it in the foreground we should see something helpful.
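
One caveat when trying this from a root shell: Elasticsearch refuses to run as root, so the foreground start has to happen as the service account. A minimal sketch, assuming the RPM created a user named elasticsearch and that sudo is available; the function name run_es_foreground is made up for illustration.

```shell
# Hypothetical helper: run Elasticsearch in the foreground as the service user.
# Paths are the RPM defaults used elsewhere in this thread.
run_es_foreground() (
  sudo -u elasticsearch env ES_PATH_CONF=/etc/elasticsearch \
    /usr/share/elasticsearch/bin/elasticsearch
)

# Usage (not run here): run_es_foreground    # stop it with Ctrl-C
```

Limits configured in the systemd unit or its override.conf (for example the memory-lock limit) do not apply to a shell session, so an error seen this way may differ from what the service hits.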

Okay, I'll give it a shot.

No, it wasn't me.