Elasticsearch Randomly Crashing

I'm not sure what the problem is here, and I've been searching for a solution for days, so maybe someone here can help me out.

I'm using Elasticsearch 6. It starts fine and works fine, except that every several days or so it just stops running out of nowhere. I have no idea what's causing it.

This is the last thing in the log file:

```
[INFO ][o.e.x.m.p.NativeController] Native controller process has stopped - no new native processes can be started
```

Any ideas on what I should be looking at/for?

Could you share the full logs?

Please format your code, logs, or configuration files using the </> icon as explained in this guide, and not the citation button. It will make your post more readable.

Or use markdown style like:

```
CODE
```

If you are not using markdown format, use the </> icon mentioned above.

There's a live preview panel for exactly this reason.

Lots of people read these forums, and many of them will simply skip over a post that is difficult to read, because it's just too large an investment of their time to try and follow a wall of badly formatted text.
If your goal is to get an answer to your questions, it's in your interest to make it as easy to read and understand as possible.

Here is the only thing in yesterday's log leading up to the crash.

```
[2020-02-10T01:01:00,000][INFO ][o.e.x.m.MlDailyMaintenanceService] [zzkCard] triggering scheduled [ML] maintenance tasks
[2020-02-10T01:01:00,008][INFO ][o.e.x.m.a.TransportDeleteExpiredDataAction] [zzkCard] Deleting expired data
[2020-02-10T01:01:00,021][INFO ][o.e.x.m.a.TransportDeleteExpiredDataAction] [zzkCard] Completed deletion of expired ML data
[2020-02-10T01:01:00,021][INFO ][o.e.x.m.MlDailyMaintenanceService] [zzkCard] Successfully completed [ML] maintenance tasks
[2020-02-10T13:50:59,447][INFO ][o.e.c.m.MetaDataIndexStateService] [zzkCard] closing indices [meateater/7mHTnuv0SOaACbdJwcoUew]
[2020-02-10T13:50:59,531][INFO ][o.e.c.m.MetaDataIndexStateService] [zzkCard] completed closing of indices [meateater]
[2020-02-10T13:50:59,630][INFO ][o.e.c.m.MetaDataIndexStateService] [zzkCard] opening indices [[meateater/7mHTnuv0SOaACbdJwcoUew]]
[2020-02-10T13:50:59,977][INFO ][o.e.c.r.a.AllocationService] [zzkCard] Cluster health status changed from [RED] to [YELLOW] (reason: [shards started [[meateater][4], [meateater][3]] ...]).
[2020-02-10T14:49:09,012][INFO ][o.e.x.m.p.NativeController] [zzkCard] Native controller process has stopped - no new native processes can be started
```

Then here is yesterday's log for everything after that, including when I restarted ES.

```
[2020-02-10T16:12:20,384][INFO ][o.e.e.NodeEnvironment    ] [zzkCard] using [1] data paths, mounts [[/ (rootfs)]], net usable_space [32.4gb], net total_space [49gb], types [rootfs]
[2020-02-10T16:12:20,388][INFO ][o.e.e.NodeEnvironment    ] [zzkCard] heap size [495.3mb], compressed ordinary object pointers [true]
[2020-02-10T16:12:20,408][INFO ][o.e.n.Node               ] [zzkCard] node name derived from node ID [zzkCardSTduaZ3z59PUQTA]; set [node.name] to override
[2020-02-10T16:12:20,408][INFO ][o.e.n.Node               ] [zzkCard] version[6.8.6], pid[30880], build[default/rpm/3d9f765/2019-12-13T17:11:52.013738Z], OS[Linux/3.10.0-957.12.2.vz7.96.21/amd64], JVM[Or$
[2020-02-10T16:12:20,409][INFO ][o.e.n.Node               ] [zzkCard] JVM arguments [-Xms512m, -Xmx512m, -XX:+UseConcMarkSweepGC, -XX:CMSInitiatingOccupancyFraction=75, -XX:+UseCMSInitiatingOccupancyOnly$
[2020-02-10T16:12:23,423][INFO ][o.e.p.PluginsService     ] [zzkCard] loaded module [aggs-matrix-stats]
[2020-02-10T16:12:23,424][INFO ][o.e.p.PluginsService     ] [zzkCard] loaded module [analysis-common]
[2020-02-10T16:12:23,424][INFO ][o.e.p.PluginsService     ] [zzkCard] loaded module [ingest-common]
[2020-02-10T16:12:23,424][INFO ][o.e.p.PluginsService     ] [zzkCard] loaded module [ingest-geoip]
[2020-02-10T16:12:23,424][INFO ][o.e.p.PluginsService     ] [zzkCard] loaded module [ingest-user-agent]
[2020-02-10T16:12:23,424][INFO ][o.e.p.PluginsService     ] [zzkCard] loaded module [lang-expression]
[2020-02-10T16:12:23,424][INFO ][o.e.p.PluginsService     ] [zzkCard] loaded module [lang-mustache]
[2020-02-10T16:12:23,424][INFO ][o.e.p.PluginsService     ] [zzkCard] loaded module [lang-painless]
[2020-02-10T16:12:23,424][INFO ][o.e.p.PluginsService     ] [zzkCard] loaded module [mapper-extras]
[2020-02-10T16:12:23,424][INFO ][o.e.p.PluginsService     ] [zzkCard] loaded module [parent-join]
[2020-02-10T16:12:23,424][INFO ][o.e.p.PluginsService     ] [zzkCard] loaded module [percolator]
[2020-02-10T16:12:23,425][INFO ][o.e.p.PluginsService     ] [zzkCard] loaded module [rank-eval]
[2020-02-10T16:12:23,425][INFO ][o.e.p.PluginsService     ] [zzkCard] loaded module [reindex]
[2020-02-10T16:12:23,425][INFO ][o.e.p.PluginsService     ] [zzkCard] loaded module [repository-url]
[2020-02-10T16:12:23,426][INFO ][o.e.p.PluginsService     ] [zzkCard] loaded module [transport-netty4]
[2020-02-10T16:12:23,426][INFO ][o.e.p.PluginsService     ] [zzkCard] loaded module [tribe]
[2020-02-10T16:12:23,426][INFO ][o.e.p.PluginsService     ] [zzkCard] loaded module [x-pack-ccr]
[2020-02-10T16:12:23,426][INFO ][o.e.p.PluginsService     ] [zzkCard] loaded module [x-pack-core]
[2020-02-10T16:12:23,427][INFO ][o.e.p.PluginsService     ] [zzkCard] loaded module [x-pack-deprecation]
[2020-02-10T16:12:23,427][INFO ][o.e.p.PluginsService     ] [zzkCard] loaded module [x-pack-graph]
[2020-02-10T16:12:23,427][INFO ][o.e.p.PluginsService     ] [zzkCard] loaded module [x-pack-ilm]
[2020-02-10T16:12:23,427][INFO ][o.e.p.PluginsService     ] [zzkCard] loaded module [x-pack-logstash]
[2020-02-10T16:12:23,427][INFO ][o.e.p.PluginsService     ] [zzkCard] loaded module [x-pack-ml]
[2020-02-10T16:12:23,427][INFO ][o.e.p.PluginsService     ] [zzkCard] loaded module [x-pack-monitoring]
[2020-02-10T16:12:23,428][INFO ][o.e.p.PluginsService     ] [zzkCard] loaded module [x-pack-rollup]
[2020-02-10T16:12:23,428][INFO ][o.e.p.PluginsService     ] [zzkCard] loaded module [x-pack-security]
[2020-02-10T16:12:23,428][INFO ][o.e.p.PluginsService     ] [zzkCard] loaded module [x-pack-sql]
[2020-02-10T16:12:23,428][INFO ][o.e.p.PluginsService     ] [zzkCard] loaded module [x-pack-upgrade]
[2020-02-10T16:12:23,428][INFO ][o.e.p.PluginsService     ] [zzkCard] loaded module [x-pack-watcher]
[2020-02-10T16:12:23,429][INFO ][o.e.p.PluginsService     ] [zzkCard] no plugins loaded
[2020-02-10T16:12:29,700][INFO ][o.e.x.s.a.s.FileRolesStore] [zzkCard] parsed [0] roles from file [/etc/elasticsearch/roles.yml]
[2020-02-10T16:12:31,477][INFO ][o.e.x.m.p.l.CppLogMessageHandler] [zzkCard] [controller/30946] [Main.cc@109] controller (64 bit): Version 6.8.6 (Build 73ed602c10c48e) Copyright (c) 2019 Elasticsearch BV
[2020-02-10T16:12:32,260][DEBUG][o.e.a.ActionModule       ] [zzkCard] Using REST wrapper from plugin org.elasticsearch.xpack.security.Security
[2020-02-10T16:12:32,772][INFO ][o.e.d.DiscoveryModule    ] [zzkCard] using discovery type [zen] and host providers [settings]
[2020-02-10T16:12:33,724][INFO ][o.e.n.Node               ] [zzkCard] initialized
[2020-02-10T16:12:33,725][INFO ][o.e.n.Node               ] [zzkCard] starting ...
[2020-02-10T16:12:33,841][WARN ][i.n.u.i.MacAddressUtil   ] [zzkCard] Failed to find a usable hardware address from the network interfaces; using random bytes: da:34:37:a6:cb:82:2a:c2
[2020-02-10T16:12:33,911][INFO ][o.e.t.TransportService   ] [zzkCard] publish_address {127.0.0.1:9300}, bound_addresses {127.0.0.1:9300}
[2020-02-10T16:12:33,984][WARN ][o.e.b.BootstrapChecks    ] [zzkCard] max virtual memory areas vm.max_map_count [65530] is too low, increase to at least [262144]
[2020-02-10T16:12:37,071][INFO ][o.e.c.s.MasterService    ] [zzkCard] zen-disco-elected-as-master ([0] nodes joined), reason: new_master {zzkCard}{zzkCardSTduaZ3z59PUQTA}{BrmQaoICRtqc_d0mhUn0kw}{localhos$
[2020-02-10T16:12:37,078][INFO ][o.e.c.s.ClusterApplierService] [zzkCard] new_master {zzkCard}{zzkCardSTduaZ3z59PUQTA}{BrmQaoICRtqc_d0mhUn0kw}{localhost}{127.0.0.1:9300}{ml.machine_memory=2147483648, xpa$
[2020-02-10T16:12:37,164][INFO ][o.e.h.n.Netty4HttpServerTransport] [zzkCard] publish_address {127.0.0.1:9200}, bound_addresses {127.0.0.1:9200}
[2020-02-10T16:12:37,164][INFO ][o.e.n.Node               ] [zzkCard] started
[2020-02-10T16:12:37,713][WARN ][o.e.x.s.a.s.m.NativeRoleMappingStore] [zzkCard] Failed to clear cache for realms [[]]
[2020-02-10T16:12:37,785][INFO ][o.e.l.LicenseService     ] [zzkCard] license [56c8d7eb-3773-4e5b-a0dc-2e885dff8926] mode [basic] - valid
[2020-02-10T16:12:37,798][INFO ][o.e.g.GatewayService     ] [zzkCard] recovered [1] indices into cluster_state
[2020-02-10T16:12:38,466][INFO ][o.e.c.r.a.AllocationService] [zzkCard] Cluster health status changed from [RED] to [YELLOW] (reason: [shards started [[meateater][3]] ...]).
```

As of now, ES is running and working just fine. As I said, it only crashes every several days, and every time it does, the only line that looks out of place is the one from my original post.

Here is the log from Feb 7, the last time ES crashed before yesterday. I didn't realize it had quit until the next day, so this is the entire day's log.

```
[2020-02-07T01:01:00,000][INFO ][o.e.x.m.MlDailyMaintenanceService] [zzkCard] triggering scheduled [ML] maintenance tasks
[2020-02-07T01:01:00,017][INFO ][o.e.x.m.a.TransportDeleteExpiredDataAction] [zzkCard] Deleting expired data
[2020-02-07T01:01:00,030][INFO ][o.e.x.m.a.TransportDeleteExpiredDataAction] [zzkCard] Completed deletion of expired ML data
[2020-02-07T01:01:00,030][INFO ][o.e.x.m.MlDailyMaintenanceService] [zzkCard] Successfully completed [ML] maintenance tasks
[2020-02-07T14:49:08,190][INFO ][o.e.x.m.p.NativeController] [zzkCard] Native controller process has stopped - no new native processes can be started
```

You have only 500MB of heap?
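The restart log above shows the default 512 MB heap (`-Xms512m, -Xmx512m`, reported as 495.3mb). If the machine has memory to spare, the heap is raised in jvm.options; here is a minimal sketch, assuming the default RPM install path `/etc/elasticsearch/jvm.options` and assuming 1 GB actually fits alongside everything else on the box:

```
# /etc/elasticsearch/jvm.options  (path assumed for an RPM install)
# Keep Xms and Xmx set to the same value; 512m is the default seen in the log above.
-Xms1g
-Xmx1g
```

Elasticsearch needs to be restarted for a new heap size to take effect.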

The usual reason for Elasticsearch stopping without emitting any log messages is that it was killed with extreme prejudice by an external force (on Linux this means SIGKILL, often from the kernel's OOM killer).

The fact that the ML controller process reportedly stopped just before Elasticsearch stopped is consistent with that behaviour: the OOM killer often picks the controller first and then turns on Elasticsearch when it discovers that killing the controller didn't free up enough memory.

I suggest you look in your kernel logs for more information (either dmesg or other system logs).
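If it helps, a couple of ways to check for that, assuming a fairly standard Linux setup with journald available (the exact wording of the OOM killer messages varies between kernel versions):

```
# Search the kernel ring buffer for OOM killer activity (human-readable timestamps)
dmesg -T | grep -i -E 'out of memory|killed process'

# Or search the kernel messages retained by journald, which survive longer than the ring buffer
journalctl -k | grep -i -E 'out of memory|killed process'
```

If Elasticsearch or the ML controller shows up in those messages around 14:49 on Feb 10, the OOM killer is the culprit.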


This topic was automatically closed 28 days after the last reply. New replies are no longer allowed.