Hi all, ELK n00b here, with a problem I need someone more knowledgeable to help me solve...
I have a single-node v5.4.1 ELK cluster running on Ubuntu 16.04.2. I went to use Kibana for the first time in a while, and it was showing "Red" status.
I ran a GET against the cluster health endpoint and saw this:
root@logstash01:/var/log# curl -XGET 'http://localhost:9200/_cluster/health?pretty'
{
"cluster_name" : "logstash",
"status" : "red",
"timed_out" : false,
"number_of_nodes" : 1,
"number_of_data_nodes" : 1,
"active_primary_shards" : 0,
"active_shards" : 0,
"relocating_shards" : 0,
"initializing_shards" : 1,
"unassigned_shards" : 2921,
"delayed_unassigned_shards" : 0,
"number_of_pending_tasks" : 2,
"number_of_in_flight_fetch" : 0,
"task_max_waiting_in_queue_millis" : 1923,
"active_shards_percent_as_number" : 0.0
}
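Side note, in case it matters: if I'm reading those fields right, every single shard on the node is out of action. A quick sanity check on the numbers, plus the 5.x endpoints I'd have queried for the "why" while it was still answering (host assumed to be localhost:9200):

```shell
# 0 active, 1 initializing, 2921 unassigned -- so nothing is serving,
# which matches active_shards_percent_as_number being 0.0.
total=$((0 + 1 + 2921))
echo "shards not active: $total"   # 2922

# While ES was still up, these would have shown why shards were
# unassigned (both endpoints exist in 5.x; host is an assumption):
#   curl 'localhost:9200/_cat/shards?v' | grep UNASSIGNED
#   curl 'localhost:9200/_cluster/allocation/explain?pretty'
```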
A few minutes later, it went to this:
root@logstash01:/var/log# curl -XGET 'http://localhost:9200/_cluster/health?pretty'
curl: (7) Failed to connect to localhost port 9200: Connection refused
So I checked the elasticsearch service via systemctl, and saw:
root@logstash01:/var/log# systemctl status elasticsearch
* elasticsearch.service - Elasticsearch
Loaded: loaded (/usr/lib/systemd/system/elasticsearch.service; enabled; vendor preset: enabled)
Active: failed (Result: exit-code) since Mon 2017-07-17 13:18:12 EDT; 54min ago
Docs: http://www.elastic.co
Process: 5308 ExecStart=/usr/share/elasticsearch/bin/elasticsearch -p ${PID_DIR}/elasticsearch.pid --quiet -E
Process: 5304 ExecStartPre=/usr/share/elasticsearch/bin/elasticsearch-systemd-pre-exec (code=exited, status=0/SUCCESS)
Main PID: 5308 (code=exited, status=127)
Jul 17 13:07:13 logstash01 systemd[1]: Starting Elasticsearch...
Jul 17 13:07:13 logstash01 systemd[1]: Started Elasticsearch.
Jul 17 13:18:12 logstash01 systemd[1]: elasticsearch.service: Main process exited, code=exited, status=127/n/a
Jul 17 13:18:12 logstash01 systemd[1]: elasticsearch.service: Unit entered failed state.
Jul 17 13:18:12 logstash01 systemd[1]: elasticsearch.service: Failed with result 'exit-code'.
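One clue I think I see in there: the main process exited with status 127. A tiny repro of what that status usually means, assuming I've got this right:

```shell
# My understanding (please correct me): exit status 127 from a shell or
# from systemd means "command not found" -- for ES, often a java binary
# that moved or vanished after a JVM upgrade. Reproducing the status:
/bin/sh -c 'definitely_not_a_real_command' >/dev/null 2>&1
echo "exit status: $?"    # 127 = command not found

# So on the box itself I'd verify the JVM first:
#   which java && java -version
```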
And I found this in /var/log/elasticsearch/elasticsearch.log:
root@logstash01:/var/log# tail -n24 /var/log/elasticsearch/elasticsearch.log
[2017-01-20T16:19:30,600][INFO ][o.e.n.Node ] node name [Mu6cXVq] derived from node ID [Mu6cXVqFQFOjm1rUhVRYBw]; set [node.name] to override
[2017-01-20T16:19:30,603][INFO ][o.e.n.Node ] version[5.1.2], pid[9797], build[c8c4c16/2017-01-11T20:18:39.146Z], OS[Linux/4.4.0-59-generic/amd64], JVM[Oracle Corporation/OpenJDK 64-Bit Server VM/1.8.0_111/25.111-b14]
[2017-01-20T16:19:31,796][INFO ][o.e.p.PluginsService ] [Mu6cXVq] loaded module [aggs-matrix-stats]
[2017-01-20T16:19:31,796][INFO ][o.e.p.PluginsService ] [Mu6cXVq] loaded module [ingest-common]
[2017-01-20T16:19:31,796][INFO ][o.e.p.PluginsService ] [Mu6cXVq] loaded module [lang-expression]
[2017-01-20T16:19:31,796][INFO ][o.e.p.PluginsService ] [Mu6cXVq] loaded module [lang-groovy]
[2017-01-20T16:19:31,796][INFO ][o.e.p.PluginsService ] [Mu6cXVq] loaded module [lang-mustache]
[2017-01-20T16:19:31,797][INFO ][o.e.p.PluginsService ] [Mu6cXVq] loaded module [lang-painless]
[2017-01-20T16:19:31,797][INFO ][o.e.p.PluginsService ] [Mu6cXVq] loaded module [percolator]
[2017-01-20T16:19:31,797][INFO ][o.e.p.PluginsService ] [Mu6cXVq] loaded module [reindex]
[2017-01-20T16:19:31,797][INFO ][o.e.p.PluginsService ] [Mu6cXVq] loaded module [transport-netty3]
[2017-01-20T16:19:31,797][INFO ][o.e.p.PluginsService ] [Mu6cXVq] loaded module [transport-netty4]
[2017-01-20T16:19:31,798][INFO ][o.e.p.PluginsService ] [Mu6cXVq] no plugins loaded
[2017-01-20T16:19:34,200][INFO ][o.e.n.Node ] initialized
[2017-01-20T16:19:34,201][INFO ][o.e.n.Node ] [Mu6cXVq] starting ...
[2017-01-20T16:19:34,425][INFO ][o.e.t.TransportService ] [Mu6cXVq] publish_address {127.0.0.1:9300}, bound_addresses {[::1]:9300}, {127.0.0.1:9300}
[2017-01-20T16:19:37,555][INFO ][o.e.c.s.ClusterService ] [Mu6cXVq] new_master {Mu6cXVq}{Mu6cXVqFQFOjm1rUhVRYBw}{7hQFAgJ5SlCcC7nUGkyLTg}{127.0.0.1}{127.0.0.1:9300}, reason: zen-disco-elected-as-master ([0] nodes joined)
[2017-01-20T16:19:37,594][INFO ][o.e.h.HttpServer ] [Mu6cXVq] publish_address {127.0.0.1:9200}, bound_addresses {[::1]:9200}, {127.0.0.1:9200}
[2017-01-20T16:19:37,595][INFO ][o.e.n.Node ] [Mu6cXVq] started
[2017-01-20T16:19:37,596][INFO ][o.e.g.GatewayService ] [Mu6cXVq] recovered [0] indices into cluster_state
[2017-01-20T16:20:51,861][INFO ][o.e.n.Node ] [Mu6cXVq] stopping ...
[2017-01-20T16:20:51,904][INFO ][o.e.n.Node ] [Mu6cXVq] stopped
[2017-01-20T16:20:51,904][INFO ][o.e.n.Node ] [Mu6cXVq] closing ...
[2017-01-20T16:20:51,918][INFO ][o.e.n.Node ] [Mu6cXVq] closed
Where do I start with diagnosing whatever is causing ES to stop running? One thing I notice: the tail of elasticsearch.log is from January and reports version 5.1.2, while the crash happened in July on what I thought was 5.4.1, so the log file itself may have stopped being written to long ago.
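For what it's worth, here's the checklist I was planning to work through next; not sure it's the right order, and the paths assume the stock Debian/Ubuntu package layout:

```shell
# 1. Pull the full systemd journal for the failed start:
#      journalctl -u elasticsearch --no-pager --since "2017-07-17"
# 2. Run ES in the foreground as its own user, so the real error hits
#    the terminal instead of a possibly dead log file:
#      sudo -u elasticsearch /usr/share/elasticsearch/bin/elasticsearch
# 3. Rule out a full disk, which would also explain the log tail
#    stopping months ago:
df -h | head -n 5
```

Does that sound like a reasonable starting point, or is there something more obvious I'm missing?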