Elasticsearch crashes Ubuntu VM

We have two VMware VMs running Ubuntu 16.04 x64. Each one has 2 GB of memory and 1 CPU core.
We installed Elasticsearch (5.1.2) on each VM as a systemd service. All settings are default except Xmx and Xms (both set to 1g).
Every time Elasticsearch starts on either of these VMs, it hangs, uses 100% CPU, and does not respond on any network protocol.

Can anyone help us with this?

What do the Elasticsearch logs say?

You might be hitting GC cycles here. Maybe there are too many shards on your instance?
Or maybe the OOM killer is killing your JVM?

Can you provide the output of the following requests?

GET /_cat/indices?v
GET /_cat/shards?v
GET /_nodes/stats
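
The three requests above can be collected in one go. This is only a minimal sketch: the base URL is an assumption (a node reachable on 127.0.0.1:9200), and `build_urls` and `fetch_all` are hypothetical helper names, not part of Elasticsearch. A short timeout is used because the node in question may stop responding.

```python
import urllib.request

# Assumption: the node's HTTP port is reachable on 127.0.0.1:9200.
BASE_URL = "http://127.0.0.1:9200"

# The three diagnostic endpoints requested in the thread.
ENDPOINTS = ["/_cat/indices?v", "/_cat/shards?v", "/_nodes/stats"]


def build_urls(base_url=BASE_URL):
    """Return the full URL for each diagnostic endpoint."""
    return [base_url.rstrip("/") + path for path in ENDPOINTS]


def fetch_all(base_url=BASE_URL, timeout=10):
    """Fetch each endpoint and return a dict mapping URL to response text.

    Raises urllib.error.URLError (or a timeout error) if the node
    does not answer within `timeout` seconds.
    """
    results = {}
    for url in build_urls(base_url):
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            results[url] = resp.read().decode("utf-8")
    return results
```

Calling `fetch_all()` on the affected VM and printing each value gives the output to paste into the thread.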

I've tried different GCs and different memory settings (1g, 768m, 512m), but nothing helped.
There is nothing interesting in the Elasticsearch logs. They look similar to those of every other environment where we use Elasticsearch.
Our Elasticsearch is completely empty; there are no indices or shards.

There are no OOM killer entries in journalctl, and the VM just gets stuck at 100% CPU utilization, so I doubt that is the cause.
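
For reference, a check like this can be scripted. The sketch below scans kernel log lines for the messages the Linux OOM killer typically writes. The pattern covers the common wordings ("invoked oom-killer", "Out of memory: Kill process", "Out of memory: Killed process"), but the exact text varies between kernel versions, so treat it as an assumption; `find_oom_lines` is a hypothetical helper name.

```python
import re

# Common OOM killer message fragments; exact wording depends on the kernel version.
OOM_PATTERN = re.compile(
    r"invoked oom-killer|Out of memory: Kill(?:ed)? process",
    re.IGNORECASE,
)


def find_oom_lines(lines):
    """Return the log lines that look like OOM killer activity."""
    return [line.rstrip("\n") for line in lines if OOM_PATTERN.search(line)]
```

One way to use it is to pipe the output of `journalctl -k` into a script that passes `sys.stdin` to `find_oom_lines`. An empty result supports the view that the OOM killer is not involved.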

Elastic is completely empty, so /_cat/indices and /_cat/shards do not show anything.

/_nodes/stats
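
The GC suspicion raised earlier can be checked against a /_nodes/stats response. The sketch below assumes the response body has already been parsed from JSON into a dict and that it follows the usual layout (`nodes.<id>.jvm.mem.heap_used_percent` and `nodes.<id>.jvm.gc.collectors`). `summarize_gc` is a hypothetical helper name.

```python
def summarize_gc(nodes_stats):
    """Summarize heap usage and GC totals per node.

    `nodes_stats` is the parsed JSON body of GET /_nodes/stats.
    Returns {node_name: {"heap_used_percent": ..., "gc": {collector: (count, millis)}}}.
    Missing fields are reported as None instead of raising.
    """
    summary = {}
    for node_id, node in nodes_stats.get("nodes", {}).items():
        jvm = node.get("jvm", {})
        collectors = jvm.get("gc", {}).get("collectors", {})
        summary[node.get("name", node_id)] = {
            "heap_used_percent": jvm.get("mem", {}).get("heap_used_percent"),
            "gc": {
                name: (
                    stats.get("collection_count"),
                    stats.get("collection_time_in_millis"),
                )
                for name, stats in collectors.items()
            },
        }
    return summary
```

A heap that stays close to 100% together with a rapidly growing old-generation collection count would point to GC pressure; low values on an empty cluster would point elsewhere.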

Can you share the full logs of both nodes please?

Does this also happen when only the first node is started?

It seems to happen almost always when both instances are online, but it has also happened with a single node online (two or three times).

Log of one node, from starting Elasticsearch to killing the machine:
[2018-01-23T11:42:35,945][INFO ][o.e.n.Node               ] [ilex-qa-1] initializing ...
[2018-01-23T11:42:36,173][INFO ][o.e.e.NodeEnvironment    ] [ilex-qa-1] using [1] data paths, mounts [[/ (/dev/sda1)]], net usable_space [5.7gb], net total_space [7.9gb], spins? [possibly], types [btrfs]
[2018-01-23T11:42:36,175][INFO ][o.e.e.NodeEnvironment    ] [ilex-qa-1] heap size [1015.6mb], compressed ordinary object pointers [true]
[2018-01-23T11:42:36,178][INFO ][o.e.n.Node               ] [ilex-qa-1] node name [ilex-qa-1], node ID [vL1RAmYfSyuknVs7ck-vrg]
[2018-01-23T11:42:36,182][INFO ][o.e.n.Node               ] [ilex-qa-1] version[5.1.2], pid[2826], build[c8c4c16/2017-01-11T20:18:39.146Z], OS[Linux/4.13.0-25-generic/amd64], JVM[Oracle Corporation/Java HotSpot(TM) 64-Bit Server VM/1.8.0_161/25.161-b12]
[2018-01-23T11:42:40,221][INFO ][o.e.p.PluginsService     ] [ilex-qa-1] loaded module [aggs-matrix-stats]
[2018-01-23T11:42:40,222][INFO ][o.e.p.PluginsService     ] [ilex-qa-1] loaded module [ingest-common]
[2018-01-23T11:42:40,223][INFO ][o.e.p.PluginsService     ] [ilex-qa-1] loaded module [lang-expression]
[2018-01-23T11:42:40,224][INFO ][o.e.p.PluginsService     ] [ilex-qa-1] loaded module [lang-groovy]
[2018-01-23T11:42:40,224][INFO ][o.e.p.PluginsService     ] [ilex-qa-1] loaded module [lang-mustache]
[2018-01-23T11:42:40,225][INFO ][o.e.p.PluginsService     ] [ilex-qa-1] loaded module [lang-painless]
[2018-01-23T11:42:40,226][INFO ][o.e.p.PluginsService     ] [ilex-qa-1] loaded module [percolator]
[2018-01-23T11:42:40,227][INFO ][o.e.p.PluginsService     ] [ilex-qa-1] loaded module [reindex]
[2018-01-23T11:42:40,227][INFO ][o.e.p.PluginsService     ] [ilex-qa-1] loaded module [transport-netty3]
[2018-01-23T11:42:40,228][INFO ][o.e.p.PluginsService     ] [ilex-qa-1] loaded module [transport-netty4]
[2018-01-23T11:42:40,230][INFO ][o.e.p.PluginsService     ] [ilex-qa-1] loaded plugin [analysis-morphology]
[2018-01-23T11:42:47,649][INFO ][o.e.n.Node               ] [ilex-qa-1] initialized
[2018-01-23T11:42:47,652][INFO ][o.e.n.Node               ] [ilex-qa-1] starting ...
[2018-01-23T11:42:48,058][INFO ][o.e.t.TransportService   ] [ilex-qa-1] publish_address {172.20.69.20:9300}, bound_addresses {172.20.69.20:9300}, {127.0.0.1:9300}
[2018-01-23T11:42:48,068][INFO ][o.e.b.BootstrapCheck     ] [ilex-qa-1] bound or publishing to a non-loopback or non-link-local address, enforcing bootstrap checks
[2018-01-23T11:42:51,245][INFO ][o.e.c.s.ClusterService   ] [ilex-qa-1] new_master {ilex-qa-1}{vL1RAmYfSyuknVs7ck-vrg}{cLIafcgUSuifkXtpms_tDg}{172.20.69.20}{172.20.69.20:9300}, reason: zen-disco-elected-as-master ([0] nodes joined)
[2018-01-23T11:42:51,320][INFO ][o.e.h.HttpServer         ] [ilex-qa-1] publish_address {172.20.69.20:9200}, bound_addresses {172.20.69.20:9200}, {127.0.0.1:9200}
[2018-01-23T11:42:51,321][INFO ][o.e.n.Node               ] [ilex-qa-1] started
[2018-01-23T11:42:51,400][INFO ][o.e.g.GatewayService     ] [ilex-qa-1] recovered [0] indices into cluster_state

As you don't have any data in your cluster yet, could you first remove the analysis-morphology plugin, restart, and see whether you still get 100% CPU usage?
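
A minimal sketch of that step, under stated assumptions: the install path `/usr/share/elasticsearch` (the usual location for the Debian package) and the systemd unit name `elasticsearch` are not confirmed by the thread, and `build_commands` and `run_commands` are hypothetical helper names. The `elasticsearch-plugin remove <name>` command itself is the standard plugin tool shipped with 5.x.

```python
import subprocess

# Assumption: Elasticsearch was installed from the Debian package.
ES_HOME = "/usr/share/elasticsearch"


def build_commands(plugin="analysis-morphology", es_home=ES_HOME):
    """Return the commands that remove a plugin and restart the service."""
    return [
        [es_home + "/bin/elasticsearch-plugin", "remove", plugin],
        # Assumption: the systemd unit is named "elasticsearch".
        ["systemctl", "restart", "elasticsearch"],
    ]


def run_commands(commands):
    """Run each command in order, stopping at the first failure."""
    for cmd in commands:
        subprocess.run(cmd, check=True)
```

Running `run_commands(build_commands())` needs root privileges. Printing the result of `build_commands()` first shows exactly what would be executed.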

BTW, could you upgrade to 5.6? 5.1.2 is old now, and many things have been fixed in the meantime.
