Elasticsearch cluster corrupted

I deleted my indices because they filled up my hard drive and I couldn't log in to Graylog. Now my Elasticsearch cluster is corrupted. Is there a way to repair Elasticsearch and restore functionality to Graylog without a reinstall?

I posted the issue with the details and logs on the Graylog forums and was told to ask here for assistance.

Cheers

You will need to delete everything in the data directory. I don't know where that is, since I don't know how Graylog installs Elasticsearch; it looks like it would be /var/lib/elasticsearch, but you may want to check.

Next time, it's best to use the APIs instead of removing files from the filesystem :slight_smile:
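For example, assuming Elasticsearch answers on localhost:9200 and that an index is named graylog_0 (both are assumptions; list your real indices first), the same cleanup via the REST API looks roughly like this:

```shell
ES=localhost:9200   # adjust if your node listens elsewhere

# List indices with their on-disk sizes, biggest first
curl -s "$ES/_cat/indices?v&s=store.size:desc" || echo "cluster not reachable"

# Delete one index cleanly through the API (graylog_0 is an example name)
curl -s -XDELETE "$ES/graylog_0" || echo "cluster not reachable"
```

Deleting through the API lets Elasticsearch update its cluster state, instead of leaving it pointing at files that no longer exist.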

Howdy,

I went ahead and deleted the files in /var/lib/elasticsearch/ and rebooted the machine, but I'm still in the same situation. I definitely would have deleted the indices via the API, but storage was maxed out and Elasticsearch refused to start; a catch-22, so to speak. Do you have any other advice for getting this thing to boot? Also, once it's back up and running, whether from a rebuild or because I'm able to save this server, what's the best practice for recycling the logs so the server doesn't crash again?

What's the actual error?

Does Graylog not provide retention tools? If not, check out Elasticsearch Curator.
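Graylog does have its own rotation/retention settings (in Graylog 2.x, under System → Indices), so check there first. If you go the Curator route instead, a minimal action file might look like this sketch; the 14-day threshold and the graylog_ prefix are assumptions to adapt:

```yaml
# curator-delete.yml - example Curator action file
actions:
  1:
    action: delete_indices
    description: Delete graylog_* indices older than 14 days (example threshold)
    options:
      ignore_empty_list: True
    filters:
      - filtertype: pattern
        kind: prefix
        value: graylog_
      - filtertype: age
        source: creation_date
        direction: older
        unit: days
        unit_count: 14
```

Run it on a schedule (e.g. cron) with `curator --config curator.yml curator-delete.yml` so the disk never reaches the watermark again.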

The other errors are on the graylog forums: https://community.graylog.org/t/elasticsearch-refuses-to-start/3430

The error for checking the status of Elasticsearch:

systemctl status elasticsearch.service
● elasticsearch.service - Elasticsearch
Loaded: loaded (/usr/lib/systemd/system/elasticsearch.service; enabled; vendor preset: disabled)
Active: failed (Result: exit-code) since Fri 2017-12-08 12:54:23 PST; 24s ago
Docs: http://www.elastic.co
Process: 921 ExecStart=/usr/share/elasticsearch/bin/elasticsearch -p ${PID_DIR}/elasticsearch.pid --quiet -Edefault.path.logs=${LOG_DIR} -Edefault.path.data=${DATA_DIR} -Edefault.path.conf=${CONF_DIR} (code=exited, status=1/FAILURE)
Process: 914 ExecStartPre=/usr/share/elasticsearch/bin/elasticsearch-systemd-pre-exec (code=exited, status=0/SUCCESS)
Main PID: 921 (code=exited, status=1/FAILURE)
Dec 08 12:54:23 SY-DMZ-Graylog-CPEM elasticsearch[921]: at org.elasticsearch.node.Interna...)
Dec 08 12:54:23 SY-DMZ-Graylog-CPEM elasticsearch[921]: at org.elasticsearch.cli.Environm...)
Dec 08 12:54:23 SY-DMZ-Graylog-CPEM elasticsearch[921]: at org.elasticsearch.cli.Environm...)
Dec 08 12:54:23 SY-DMZ-Graylog-CPEM elasticsearch[921]: at org.elasticsearch.cli.Command....)
Dec 08 12:54:23 SY-DMZ-Graylog-CPEM elasticsearch[921]: at org.elasticsearch.cli.Command....)
Dec 08 12:54:23 SY-DMZ-Graylog-CPEM elasticsearch[921]: at org.elasticsearch.bootstrap.El...)
Dec 08 12:54:23 SY-DMZ-Graylog-CPEM elasticsearch[921]: at org.elasticsearch.bootstrap.El...)
Dec 08 12:54:23 SY-DMZ-Graylog-CPEM systemd[1]: elasticsearch.service: main process exit...RE
Dec 08 12:54:23 SY-DMZ-Graylog-CPEM systemd[1]: Unit elasticsearch.service entered faile...e.
Dec 08 12:54:23 SY-DMZ-Graylog-CPEM systemd[1]: elasticsearch.service failed.
Hint: Some lines were ellipsized, use -l to show in full.

Can you show the logs from Elasticsearch?

This is from the other day when the storage filled up on 12/6/2017, in /var/log/elasticsearch/

graylog-2017-12-06.log

[2017-12-06T10:42:26,712][INFO ][o.e.n.Node ] [] initializing ...
[2017-12-06T10:42:26,883][INFO ][o.e.e.NodeEnvironment ] [qYyUHAp] using [1] data paths, mounts [[/ (rootfs)]], n$
[2017-12-06T10:42:26,883][INFO ][o.e.e.NodeEnvironment ] [qYyUHAp] heap size [1.9gb], compressed ordinary object $
[2017-12-06T10:42:27,367][INFO ][o.e.n.Node ] node name [qYyUHAp] derived from node ID [qYyUHApAREySM8$
[2017-12-06T10:42:27,367][INFO ][o.e.n.Node ] version[5.6.3], pid[928], build[1a2f265/2017-10-06T20:33$
[2017-12-06T10:42:27,368][INFO ][o.e.n.Node ] JVM arguments [-Xms2g, -Xmx2g, -XX:+UseConcMarkSweepGC, $
[2017-12-06T10:42:29,853][INFO ][o.e.p.PluginsService ] [qYyUHAp] loaded module [aggs-matrix-stats]
[2017-12-06T10:42:29,853][INFO ][o.e.p.PluginsService ] [qYyUHAp] loaded module [ingest-common]
[2017-12-06T10:42:29,853][INFO ][o.e.p.PluginsService ] [qYyUHAp] loaded module [lang-expression]
[2017-12-06T10:42:29,853][INFO ][o.e.p.PluginsService ] [qYyUHAp] loaded module [lang-groovy]
[2017-12-06T10:42:29,853][INFO ][o.e.p.PluginsService ] [qYyUHAp] loaded module [lang-mustache]
[2017-12-06T10:42:29,854][INFO ][o.e.p.PluginsService ] [qYyUHAp] loaded module [lang-painless]
[2017-12-06T10:42:29,854][INFO ][o.e.p.PluginsService ] [qYyUHAp] loaded module [parent-join]
[2017-12-06T10:42:29,854][INFO ][o.e.p.PluginsService ] [qYyUHAp] loaded module [percolator]
[2017-12-06T10:42:29,854][INFO ][o.e.p.PluginsService ] [qYyUHAp] loaded module [reindex]
[2017-12-06T10:42:29,854][INFO ][o.e.p.PluginsService ] [qYyUHAp] loaded module [transport-netty3]
[2017-12-06T10:42:29,854][INFO ][o.e.p.PluginsService ] [qYyUHAp] loaded module [transport-netty4]
[2017-12-06T10:42:29,855][INFO ][o.e.p.PluginsService ] [qYyUHAp] no plugins loaded
[2017-12-06T10:42:31,396][INFO ][o.e.d.DiscoveryModule ] [qYyUHAp] using discovery type [zen]
[2017-12-06T10:42:32,034][INFO ][o.e.n.Node ] initialized
[2017-12-06T10:42:32,034][INFO ][o.e.n.Node ] [qYyUHAp] starting ...
[2017-12-06T10:42:52,656][INFO ][o.e.t.TransportService ] [qYyUHAp] publish_address {127.0.0.1:9300}, bound_addres$
[2017-12-06T10:42:55,843][INFO ][o.e.c.s.ClusterService ] [qYyUHAp] new_master {qYyUHAp}{qYyUHApAREySM8TuBTwSzg}{q$
[2017-12-06T10:42:55,873][INFO ][o.e.h.n.Netty4HttpServerTransport] [qYyUHAp] publish_address {127.0.0.1:9200}, boun$
[2017-12-06T10:42:55,874][INFO ][o.e.n.Node ] [qYyUHAp] started
[2017-12-06T10:42:56,246][INFO ][o.e.g.GatewayService ] [qYyUHAp] recovered [4] indices into cluster_state
[2017-12-06T10:43:26,037][WARN ][o.e.c.r.a.DiskThresholdMonitor] [qYyUHAp] high disk watermark [90%] exceeded on [qY$
[2017-12-06T10:43:26,066][INFO ][o.e.c.r.a.DiskThresholdMonitor] [qYyUHAp] rerouting shards: [high disk watermark ex$
[2017-12-06T10:43:34,563][INFO ][o.e.m.j.JvmGcMonitorService] [qYyUHAp] [gc][62] overhead, spent [377ms] collecting $
[2017-12-06T10:43:48,695][INFO ][o.e.m.j.JvmGcMonitorService] [qYyUHAp] [gc][76] overhead, spent [327ms] collecting $
[2017-12-06T10:43:56,237][WARN ][o.e.c.r.a.DiskThresholdMonitor] [qYyUHAp] high disk watermark [90%] exceeded on [qY$

Graylog.log on 12/7/2017

[2017-12-07T00:00:07,911][WARN ][o.e.c.r.a.DiskThresholdMonitor] [qYyUHAp] high disk watermark [90%] exceeded on [qY$
[2017-12-07T00:00:07,958][INFO ][o.e.c.r.a.DiskThresholdMonitor] [qYyUHAp] rerouting shards: [high disk watermark ex$
[2017-12-07T00:00:37,983][WARN ][o.e.c.r.a.DiskThresholdMonitor] [qYyUHAp] high disk watermark [90%] exceeded on [qY$
[2017-12-07T00:01:08,007][WARN ][o.e.c.r.a.DiskThresholdMonitor] [qYyUHAp] high disk watermark [90%] exceeded on [qY$

Maybe try increasing those watermark settings - Disk-based Shard Allocation | Elasticsearch Reference [6.0] | Elastic
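Concretely, on 5.x that would be a transient cluster-settings update; the percentages below are examples, and freeing disk space is still the real fix:

```shell
ES=localhost:9200   # adjust if your node listens elsewhere

# Temporarily raise the disk watermarks so shards can allocate again.
# Transient settings reset on full cluster restart.
curl -s -XPUT "$ES/_cluster/settings" -H 'Content-Type: application/json' -d '
{
  "transient": {
    "cluster.routing.allocation.disk.watermark.low": "93%",
    "cluster.routing.allocation.disk.watermark.high": "95%"
  }
}' || echo "cluster not reachable"
```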

To backtrack a little: I only have one server, running as a VM with 30 GB of storage. After two weeks the disk was full. I ran du -h --max-depth=1 to track down the culprit, which led me to the indices I deleted, because I couldn't even get the API to launch. After I deleted the files inside the directory suggested above (/var/lib/elasticsearch/), I rebooted all the services and even the server, and Elasticsearch still won't start.
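For reference, that check can be scripted from both sides once the node answers (paths here are examples):

```shell
# From the OS: which directories are eating the disk (largest last)
du -h --max-depth=1 /var/lib 2>/dev/null | sort -h | tail -n 5

# From Elasticsearch, once it's up: disk use and shard counts per node
curl -s 'localhost:9200/_cat/allocation?v' || echo "cluster not reachable"
```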

When I run curl -X GET http://localhost:9200, it spits out:

curl -XPOST 'localhost:9200/_cluster/reroute' -d '{"commands":[{"allocate":{"index":"graylog2_6","shard":1,"node":"gl-es01-esgl2","allow_primary":true}}]}'
Access Denied
Access Denied (authentication_failed)
Your credentials could not be authenticated: "Credentials are missing.". You will not be permitted access until your credentials can be verified.
This is typically caused by an incorrect username and/or password, but could also be caused by network problems.
Client Name:
Client IP Address: 192.168.1.10
Server Name: Host Name
For assistance, please open a Ticket incident at <A href='Company-Tech-Support

Did you rm -rf /var/lib/elasticsearch/, or something else?


Yes sir, it was rm -rf /var/lib/elasticsearch/.

This topic was automatically closed 28 days after the last reply. New replies are no longer allowed.