Elasticsearch cluster corrupted

I deleted my indices because they filled up my hard drive and I couldn't log in to Graylog. Now my Elasticsearch cluster is corrupted. Is there a way to repair Elasticsearch and restore functionality to Graylog without a reinstall?

I posted the issue with the details and logs on the Graylog forums and was told to ask here for assistance.

Cheers

You will need to delete everything in the data directory. I don't know where that is, since I don't know how Graylog installs Elasticsearch; it looks like it would be /var/lib/elasticsearch, but you may want to check.

Next time, it's best to use the APIs instead of removing files from the filesystem :slight_smile:
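For example, assuming Elasticsearch answers on localhost:9200 and that an index is named graylog_0 (both are assumptions; list your real indices first), the same cleanup via the REST API looks roughly like this:

```shell
ES=localhost:9200   # adjust if your node listens elsewhere

# List indices with their on-disk sizes, biggest first
curl -s "$ES/_cat/indices?v&s=store.size:desc" || echo "cluster not reachable"

# Delete one index cleanly through the API (graylog_0 is an example name)
curl -s -XDELETE "$ES/graylog_0" || echo "cluster not reachable"
```

Deleting through the API lets Elasticsearch update its cluster state, instead of leaving it pointing at files that no longer exist.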

Howdy,

I went ahead and deleted the files in /var/lib/elasticsearch/ and rebooted the machine, but I'm still in the same situation. I definitely would have deleted the indices via the API, but storage was maxed out and Elasticsearch refused to start; a catch-22, so to speak. Do you have any other advice for getting this thing to boot? Also, once it's back up and running, whether from a rebuild or because I'm able to save this server, what's the best practice for recycling the logs so the server doesn't crash again?

What's the actual error?

Does Graylog not provide retention tools? If not, check out Elasticsearch Curator.
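Graylog does have its own rotation/retention settings (in Graylog 2.x, under System → Indices), so check there first. If you go the Curator route instead, a minimal action file might look like this sketch; the 14-day threshold and the graylog_ prefix are assumptions to adapt:

```yaml
# curator-delete.yml - example Curator action file
actions:
  1:
    action: delete_indices
    description: Delete graylog_* indices older than 14 days (example threshold)
    options:
      ignore_empty_list: True
    filters:
      - filtertype: pattern
        kind: prefix
        value: graylog_
      - filtertype: age
        source: creation_date
        direction: older
        unit: days
        unit_count: 14
```

Run it on a schedule (e.g. cron) with `curator --config curator.yml curator-delete.yml` so the disk never reaches the watermark again.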

The other errors are on the graylog forums: https://community.graylog.org/t/elasticsearch-refuses-to-start/3430

The error for checking the status of Elasticsearch:

systemctl status elasticsearch.service
● elasticsearch.service - Elasticsearch
Loaded: loaded (/usr/lib/systemd/system/elasticsearch.service; enabled; vendor preset: disabled)
Active: failed (Result: exit-code) since Fri 2017-12-08 12:54:23 PST; 24s ago
Docs: http://www.elastic.co
Process: 921 ExecStart=/usr/share/elasticsearch/bin/elasticsearch -p ${PID_DIR}/elasticsearch.pid --quiet -Edefault.path.logs=${LOG_DIR} -Edefault.path.data=${DATA_DIR} -Edefault.path.conf=${CONF_DIR} (code=exited, status=1/FAILURE)
Process: 914 ExecStartPre=/usr/share/elasticsearch/bin/elasticsearch-systemd-pre-exec (code=exited, status=0/SUCCESS)
Main PID: 921 (code=exited, status=1/FAILURE)
Dec 08 12:54:23 SY-DMZ-Graylog-CPEM elasticsearch[921]: at org.elasticsearch.node.Interna...)
Dec 08 12:54:23 SY-DMZ-Graylog-CPEM elasticsearch[921]: at org.elasticsearch.cli.Environm...)
Dec 08 12:54:23 SY-DMZ-Graylog-CPEM elasticsearch[921]: at org.elasticsearch.cli.Environm...)
Dec 08 12:54:23 SY-DMZ-Graylog-CPEM elasticsearch[921]: at org.elasticsearch.cli.Command....)
Dec 08 12:54:23 SY-DMZ-Graylog-CPEM elasticsearch[921]: at org.elasticsearch.cli.Command....)
Dec 08 12:54:23 SY-DMZ-Graylog-CPEM elasticsearch[921]: at org.elasticsearch.bootstrap.El...)
Dec 08 12:54:23 SY-DMZ-Graylog-CPEM elasticsearch[921]: at org.elasticsearch.bootstrap.El...)
Dec 08 12:54:23 SY-DMZ-Graylog-CPEM systemd[1]: elasticsearch.service: main process exit...RE
Dec 08 12:54:23 SY-DMZ-Graylog-CPEM systemd[1]: Unit elasticsearch.service entered faile...e.
Dec 08 12:54:23 SY-DMZ-Graylog-CPEM systemd[1]: elasticsearch.service failed.
Hint: Some lines were ellipsized, use -l to show in full.

Can you show the logs from Elasticsearch?

This is from the other day when the storage filled up on 12/6/2017, in /var/log/elasticsearch/

graylog-2017-12-06.log

[2017-12-06T10:42:26,712][INFO ][o.e.n.Node ] [] initializing ...
[2017-12-06T10:42:26,883][INFO ][o.e.e.NodeEnvironment ] [qYyUHAp] using [1] data paths, mounts [[/ (rootfs)]], n$
[2017-12-06T10:42:26,883][INFO ][o.e.e.NodeEnvironment ] [qYyUHAp] heap size [1.9gb], compressed ordinary object $
[2017-12-06T10:42:27,367][INFO ][o.e.n.Node ] node name [qYyUHAp] derived from node ID [qYyUHApAREySM8$
[2017-12-06T10:42:27,367][INFO ][o.e.n.Node ] version[5.6.3], pid[928], build[1a2f265/2017-10-06T20:33$
[2017-12-06T10:42:27,368][INFO ][o.e.n.Node ] JVM arguments [-Xms2g, -Xmx2g, -XX:+UseConcMarkSweepGC, $
[2017-12-06T10:42:29,853][INFO ][o.e.p.PluginsService ] [qYyUHAp] loaded module [aggs-matrix-stats]
[2017-12-06T10:42:29,853][INFO ][o.e.p.PluginsService ] [qYyUHAp] loaded module [ingest-common]
[2017-12-06T10:42:29,853][INFO ][o.e.p.PluginsService ] [qYyUHAp] loaded module [lang-expression]
[2017-12-06T10:42:29,853][INFO ][o.e.p.PluginsService ] [qYyUHAp] loaded module [lang-groovy]
[2017-12-06T10:42:29,853][INFO ][o.e.p.PluginsService ] [qYyUHAp] loaded module [lang-mustache]
[2017-12-06T10:42:29,854][INFO ][o.e.p.PluginsService ] [qYyUHAp] loaded module [lang-painless]
[2017-12-06T10:42:29,854][INFO ][o.e.p.PluginsService ] [qYyUHAp] loaded module [parent-join]
[2017-12-06T10:42:29,854][INFO ][o.e.p.PluginsService ] [qYyUHAp] loaded module [percolator]
[2017-12-06T10:42:29,854][INFO ][o.e.p.PluginsService ] [qYyUHAp] loaded module [reindex]
[2017-12-06T10:42:29,854][INFO ][o.e.p.PluginsService ] [qYyUHAp] loaded module [transport-netty3]
[2017-12-06T10:42:29,854][INFO ][o.e.p.PluginsService ] [qYyUHAp] loaded module [transport-netty4]
[2017-12-06T10:42:29,855][INFO ][o.e.p.PluginsService ] [qYyUHAp] no plugins loaded
[2017-12-06T10:42:31,396][INFO ][o.e.d.DiscoveryModule ] [qYyUHAp] using discovery type [zen]
[2017-12-06T10:42:32,034][INFO ][o.e.n.Node ] initialized
[2017-12-06T10:42:32,034][INFO ][o.e.n.Node ] [qYyUHAp] starting ...
[2017-12-06T10:42:52,656][INFO ][o.e.t.TransportService ] [qYyUHAp] publish_address {127.0.0.1:9300}, bound_addres$
[2017-12-06T10:42:55,843][INFO ][o.e.c.s.ClusterService ] [qYyUHAp] new_master {qYyUHAp}{qYyUHApAREySM8TuBTwSzg}{q$
[2017-12-06T10:42:55,873][INFO ][o.e.h.n.Netty4HttpServerTransport] [qYyUHAp] publish_address {127.0.0.1:9200}, boun$
[2017-12-06T10:42:55,874][INFO ][o.e.n.Node ] [qYyUHAp] started
[2017-12-06T10:42:56,246][INFO ][o.e.g.GatewayService ] [qYyUHAp] recovered [4] indices into cluster_state
[2017-12-06T10:43:26,037][WARN ][o.e.c.r.a.DiskThresholdMonitor] [qYyUHAp] high disk watermark [90%] exceeded on [qY$
[2017-12-06T10:43:26,066][INFO ][o.e.c.r.a.DiskThresholdMonitor] [qYyUHAp] rerouting shards: [high disk watermark ex$
[2017-12-06T10:43:34,563][INFO ][o.e.m.j.JvmGcMonitorService] [qYyUHAp] [gc][62] overhead, spent [377ms] collecting $
[2017-12-06T10:43:48,695][INFO ][o.e.m.j.JvmGcMonitorService] [qYyUHAp] [gc][76] overhead, spent [327ms] collecting $
[2017-12-06T10:43:56,237][WARN ][o.e.c.r.a.DiskThresholdMonitor] [qYyUHAp] high disk watermark [90%] exceeded on [qY$

Graylog.log on 12/7/2017

[2017-12-07T00:00:07,911][WARN ][o.e.c.r.a.DiskThresholdMonitor] [qYyUHAp] high disk watermark [90%] exceeded on [qY$
[2017-12-07T00:00:07,958][INFO ][o.e.c.r.a.DiskThresholdMonitor] [qYyUHAp] rerouting shards: [high disk watermark ex$
[2017-12-07T00:00:37,983][WARN ][o.e.c.r.a.DiskThresholdMonitor] [qYyUHAp] high disk watermark [90%] exceeded on [qY$
[2017-12-07T00:01:08,007][WARN ][o.e.c.r.a.DiskThresholdMonitor] [qYyUHAp] high disk watermark [90%] exceeded on [qY$

Maybe try increasing those watermark settings - Disk-based Shard Allocation | Elasticsearch Reference [6.0] | Elastic
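Concretely, on 5.x that would be a transient cluster-settings update; the percentages below are examples, and freeing disk space is still the real fix:

```shell
ES=localhost:9200   # adjust if your node listens elsewhere

# Temporarily raise the disk watermarks so shards can allocate again.
# Transient settings reset on full cluster restart.
curl -s -XPUT "$ES/_cluster/settings" -H 'Content-Type: application/json' -d '
{
  "transient": {
    "cluster.routing.allocation.disk.watermark.low": "93%",
    "cluster.routing.allocation.disk.watermark.high": "95%"
  }
}' || echo "cluster not reachable"
```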

To backtrack a little: I only have one server, running as a VM with 30 GB of storage. After two weeks the disk was full. I ran du -h --max-depth=1 to track down the culprit, which led me to the indices I deleted, because I couldn't even get the API to launch. After I deleted the files inside the directory suggested above (/var/lib/elasticsearch/), I rebooted all the services and even the server, and Elasticsearch still won't start.
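For reference, that check can be scripted from both sides once the node answers (paths here are examples):

```shell
# From the OS: which directories are eating the disk (largest last)
du -h --max-depth=1 /var/lib 2>/dev/null | sort -h | tail -n 5

# From Elasticsearch, once it's up: disk use and shard counts per node
curl -s 'localhost:9200/_cat/allocation?v' || echo "cluster not reachable"
```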

When I run curl -X GET http://localhost:9200, it spits out:

curl -XPOST 'localhost:9200/_cluster/reroute' -d '{"commands":[{"allocate":{"index":"graylog2_6","shard":1,"node":"gl-es01-esgl2","allow_primary":true}}]}'
Access Denied
Access Denied (authentication_failed)
Your credentials could not be authenticated: "Credentials are missing.". You will not be permitted access until your credentials can be verified.
This is typically caused by an incorrect username and/or password, but could also be caused by network problems.
Client Name:
Client IP Address: 192.168.1.10
Server Name: Host Name
For assistance, please open a Ticket incident at <A href='Company-Tech-Support

Did you rm -rf /var/lib/elasticsearch/, or something else?


Yes sir, it was rm -rf /var/lib/elasticsearch/.

This topic was automatically closed 28 days after the last reply. New replies are no longer allowed.