Elasticsearch service failing

Hi all, ELK n00b here, with a problem I need help from someone more knowledgeable to solve...

I have a single-node v5.4.1 ELK cluster running on Ubuntu 16.04.2. I went to use Kibana after a long time, and Kibana was showing "Red" status.

Ran a GET on health status, saw this:

root@logstash01:/var/log# curl -XGET 'http://localhost:9200/_cluster/health?pretty'
{
  "cluster_name" : "logstash",
  "status" : "red",
  "timed_out" : false,
  "number_of_nodes" : 1,
  "number_of_data_nodes" : 1,
  "active_primary_shards" : 0,
  "active_shards" : 0,
  "relocating_shards" : 0,
  "initializing_shards" : 1,
  "unassigned_shards" : 2921,
  "delayed_unassigned_shards" : 0,
  "number_of_pending_tasks" : 2,
  "number_of_in_flight_fetch" : 0,
  "task_max_waiting_in_queue_millis" : 1923,
  "active_shards_percent_as_number" : 0.0
}

In a few minutes, it went to this:

root@logstash01:/var/log# curl -XGET 'http://localhost:9200/_cluster/health?pretty'
curl: (7) Failed to connect to localhost port 9200: Connection refused

So I checked the elasticsearch service via systemctl, and saw:

root@logstash01:/var/log# systemctl status elasticsearch
* elasticsearch.service - Elasticsearch
   Loaded: loaded (/usr/lib/systemd/system/elasticsearch.service; enabled; vendor preset: enabled)
   Active: failed (Result: exit-code) since Mon 2017-07-17 13:18:12 EDT; 54min ago
     Docs: http://www.elastic.co
  Process: 5308 ExecStart=/usr/share/elasticsearch/bin/elasticsearch -p ${PID_DIR}/elasticsearch.pid --quiet -E
  Process: 5304 ExecStartPre=/usr/share/elasticsearch/bin/elasticsearch-systemd-pre-exec (code=exited, status=0
 Main PID: 5308 (code=exited, status=127)

Jul 17 13:07:13 logstash01 systemd[1]: Starting Elasticsearch...
Jul 17 13:07:13 logstash01 systemd[1]: Started Elasticsearch.
Jul 17 13:18:12 logstash01 systemd[1]: elasticsearch.service: Main process exited, code=exited, status=127/n/a
Jul 17 13:18:12 logstash01 systemd[1]: elasticsearch.service: Unit entered failed state.
Jul 17 13:18:12 logstash01 systemd[1]: elasticsearch.service: Failed with result 'exit-code'.

And this in /var/log/elasticsearch/elasticsearch.log:

root@logstash01:/var/log# tail -n24 /var/log/elasticsearch/elasticsearch.log
[2017-01-20T16:19:30,600][INFO ][o.e.n.Node               ] node name [Mu6cXVq] derived from node ID [Mu6cXVqFQFOjm1rUhVRYBw]; set [node.name] to override
[2017-01-20T16:19:30,603][INFO ][o.e.n.Node               ] version[5.1.2], pid[9797], build[c8c4c16/2017-01-11T20:18:39.146Z], OS[Linux/4.4.0-59-generic/amd64], JVM[Oracle Corporation/OpenJDK 64-Bit Server VM/1.8.0_111/25.111-b14]
[2017-01-20T16:19:31,796][INFO ][o.e.p.PluginsService     ] [Mu6cXVq] loaded module [aggs-matrix-stats]
[2017-01-20T16:19:31,796][INFO ][o.e.p.PluginsService     ] [Mu6cXVq] loaded module [ingest-common]
[2017-01-20T16:19:31,796][INFO ][o.e.p.PluginsService     ] [Mu6cXVq] loaded module [lang-expression]
[2017-01-20T16:19:31,796][INFO ][o.e.p.PluginsService     ] [Mu6cXVq] loaded module [lang-groovy]
[2017-01-20T16:19:31,796][INFO ][o.e.p.PluginsService     ] [Mu6cXVq] loaded module [lang-mustache]
[2017-01-20T16:19:31,797][INFO ][o.e.p.PluginsService     ] [Mu6cXVq] loaded module [lang-painless]
[2017-01-20T16:19:31,797][INFO ][o.e.p.PluginsService     ] [Mu6cXVq] loaded module [percolator]
[2017-01-20T16:19:31,797][INFO ][o.e.p.PluginsService     ] [Mu6cXVq] loaded module [reindex]
[2017-01-20T16:19:31,797][INFO ][o.e.p.PluginsService     ] [Mu6cXVq] loaded module [transport-netty3]
[2017-01-20T16:19:31,797][INFO ][o.e.p.PluginsService     ] [Mu6cXVq] loaded module [transport-netty4]
[2017-01-20T16:19:31,798][INFO ][o.e.p.PluginsService     ] [Mu6cXVq] no plugins loaded
[2017-01-20T16:19:34,200][INFO ][o.e.n.Node               ] initialized
[2017-01-20T16:19:34,201][INFO ][o.e.n.Node               ] [Mu6cXVq] starting ...
[2017-01-20T16:19:34,425][INFO ][o.e.t.TransportService   ] [Mu6cXVq] publish_address {127.0.0.1:9300}, bound_addresses {[::1]:9300}, {127.0.0.1:9300}
[2017-01-20T16:19:37,555][INFO ][o.e.c.s.ClusterService   ] [Mu6cXVq] new_master {Mu6cXVq}{Mu6cXVqFQFOjm1rUhVRYBw}{7hQFAgJ5SlCcC7nUGkyLTg}{127.0.0.1}{127.0.0.1:9300}, reason: zen-disco-elected-as-master ([0] nodes joined)
[2017-01-20T16:19:37,594][INFO ][o.e.h.HttpServer         ] [Mu6cXVq] publish_address {127.0.0.1:9200}, bound_addresses {[::1]:9200}, {127.0.0.1:9200}
[2017-01-20T16:19:37,595][INFO ][o.e.n.Node               ] [Mu6cXVq] started
[2017-01-20T16:19:37,596][INFO ][o.e.g.GatewayService     ] [Mu6cXVq] recovered [0] indices into cluster_state
[2017-01-20T16:20:51,861][INFO ][o.e.n.Node               ] [Mu6cXVq] stopping ...
[2017-01-20T16:20:51,904][INFO ][o.e.n.Node               ] [Mu6cXVq] stopped
[2017-01-20T16:20:51,904][INFO ][o.e.n.Node               ] [Mu6cXVq] closing ...
[2017-01-20T16:20:51,918][INFO ][o.e.n.Node               ] [Mu6cXVq] closed

Where do I start with diagnosing the issue causing ES to stop running?

Kept trawling through the logs, and found this in /var/log/elasticsearch/logstash.log:

java.lang.OutOfMemoryError: Java heap space
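
A note on why the error turned up in logstash.log rather than elasticsearch.log: Elasticsearch names its main log file after the cluster name, which is "logstash" on this node, so the elasticsearch.log excerpt above (dated January, version 5.1.2) was a stale file. A minimal sketch for finding which log files mention the error, assuming the default log directory; the function name find_oom_logs is made up for illustration:

```shell
#!/bin/sh
# List the log files in a directory that contain a Java OutOfMemoryError.
# find_oom_logs is a hypothetical helper, not part of Elasticsearch.
find_oom_logs() {
    log_dir="${1:-/var/log/elasticsearch}"
    # -l: print only names of matching files
    # -s: stay quiet about missing or unreadable files
    # -F: treat the pattern as a fixed string, not a regex
    grep -lsF 'java.lang.OutOfMemoryError' "$log_dir"/*.log
}
```

Calling `find_oom_logs /var/log/elasticsearch` prints one matching file path per line, and prints nothing if no log contains the error.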

Read up on the topic of heap space at: https://www.elastic.co/guide/en/elasticsearch/reference/5.4/heap-size.html
and then modified /etc/elasticsearch/jvm.options to increase the heap size thusly (I'm running on a 32GB RAM system):

-Xms12g
-Xmx12g

Restarted the elasticsearch service, and after a bit, Kibana started working again :)
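
For anyone sizing the heap on a different machine, the linked heap-size guide boils down to: set -Xms and -Xmx to the same value, use no more than half of physical RAM, and stay below the compressed-oops cutoff. A rough sketch of that rule, where suggest_heap_gb is a made-up helper and the 26 GB cap is an assumption (the conservative figure that guide describes as safe on most systems):

```shell
#!/bin/sh
# Suggest an upper bound for the heap, in whole GB, given physical RAM in GB.
# suggest_heap_gb is a hypothetical helper; cap_gb=26 is an assumed
# conservative limit for staying under the compressed-oops cutoff.
suggest_heap_gb() {
    ram_gb="$1"
    cap_gb=26
    half=$((ram_gb / 2))
    if [ "$half" -gt "$cap_gb" ]; then
        echo "$cap_gb"
    else
        echo "$half"
    fi
}

# Print jvm.options lines with matching minimum and maximum heap sizes.
heap_gb=$(suggest_heap_gb 32)
printf '%s\n' "-Xms${heap_gb}g" "-Xmx${heap_gb}g"
```

On the 32GB box in this thread the upper bound works out to 16 GB, so the 12g setting above sits inside it and leaves more room for the filesystem cache.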

Let the journey of learning continue...

You need to reduce your shard count. That many shards will cause issues because they take up so many resources.

What sort of data is this in your cluster?

Syslog data
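
To act on the shard-count advice, it helps to first see how many shards exist and what state they are in. A small sketch that tallies shard states from `_cat/shards` output; count_shard_states is a hypothetical helper name, and the usage line assumes Elasticsearch is listening on localhost:9200 as in the original post:

```shell
#!/bin/sh
# Tally shard states from `_cat/shards` output read on stdin.
# Expects whitespace-separated lines of the form: index shard prirep state
# count_shard_states is a hypothetical helper, not an Elasticsearch tool.
count_shard_states() {
    awk 'NF >= 4 { count[$4]++ }
         END { for (s in count) print s, count[s] }' | sort
}

# Usage against a live node (assumes the default HTTP port on localhost):
#   curl -s 'http://localhost:9200/_cat/shards?h=index,shard,prirep,state' | count_shard_states
```

The output is one line per state followed by its count, which makes it easy to compare before and after deleting or consolidating old indices.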
