The other day, my servers all ran out of disk space at roughly the same time, which corrupted three indexes. I was able to recover one of them; according to

curl 'n7-z01-0a2a29a5.iaas.starwave.com:9200/_cat/indices?v'

it is green. The other two indexes are red, however, and I can't seem to get rid of them.
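For completeness, this is roughly how I'm reading the cluster state (the sample JSON below is a hypothetical stand-in for the live response of curl 'localhost:9200/_cluster/health', since I can't paste the full output, and the cluster name is my guess from the data path):

```shell
# Sketch: extract the "status" field from a _cluster/health response.
# The sample JSON stands in for the live cluster's output.
health='{"cluster_name":"semfs_fnd_es","status":"red","timed_out":false,"number_of_nodes":4}'
echo "$health" | grep -o '"status":"[a-z]*"' | cut -d'"' -f4
# prints: red
```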
I tried:
[MGMTPROD\silvj170@n7mmadm02 ~]$ curl -XDELETE http://n7-z01-0a2a2723.iaas.starwave.com:9200/fnd-logstash-2015.06.01
{"error":"MasterNotDiscoveredException[waited for [30s]]","status":503}
I also tried:
curl -XPOST 'http://localhost:9200/_optimize?only_expunge_deletes=true'
{"_shards":{"total":4760,"successful":4678,"failed":2,"failures":[{"index":"fnd-logstash-2015.05.27","shard":8,"reason":"BroadcastShardOperationFailedException[[fnd-logstash-2015.05.27][8] ]; nested: RemoteTransportException[[Solarr][inet[/10.42.41.167:9300]][indices:admin/optimize[s]]]; nested: FlushNotAllowedEngineException[[fnd-logstash-2015.05.27][8] recovery is in progress, flush [COMMIT_TRANSLOG] is not allowed]; "},{"index":"fnd-logstash-2015.05.27","shard":9,"reason":"BroadcastShardOperationFailedException[[fnd-logstash-2015.05.27][9] ]; nested: RemoteTransportException[[Turner D. Century][inet[/10.42.41.164:9300]][indices:admin/optimize[s]]]; nested: FlushNotAllowedEngineException[[fnd-logstash-2015.05.27][9] recovery is in progress, flush [COMMIT_TRANSLOG] is not allowed]; "}]}
I tried stopping all of the daemons at the same time and deleting the index files. Then I restarted the daemons, and the directories and some (but not all) of the files came back into existence!
[root@n7-z01-0a2a29a5 ~]# ls -l /data/apps/prod/elasticsearch/data/semfs_fnd_es/nodes/0/indices/fnd-logstash-2015.05.29/
total 4
drwxr-xr-x 2 elastic elastic 4096 Jun 3 14:53 _state
[root@n7-z01-0a2a29a5 ~]# ls -l /data/apps/prod/elasticsearch/data/semfs_fnd_es/nodes/0/indices/fnd-logstash-2015.05.29/_state
total 4
-rw-r--r-- 1 elastic elastic 1044 Jun 3 14:53 state-6
[root@n7-z01-0a2a29a5 ~]#
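In case the exact sequence matters, this is a sketch of the wipe step I performed on each node. The paths here are simulated with a temp directory rather than the real /data/apps/prod/elasticsearch path, and the service commands are assumptions about our init setup:

```shell
# Simulated stand-in for the per-node data directory (not the real path).
DATA=$(mktemp -d)
mkdir -p "$DATA/indices/fnd-logstash-2015.05.29/_state"

# 1. Stop elasticsearch on every node first (e.g. `service elasticsearch stop`).
# 2. Delete the red index's directory:
rm -rf "$DATA/indices/fnd-logstash-2015.05.29"
# 3. Restart elasticsearch (e.g. `service elasticsearch start`).

ls "$DATA/indices" | wc -l
# prints: 0  (the directory is gone at this point -- yet it reappears after restart)
```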
Because these two indexes are red, the whole cluster is red. I ran tcpdump on all of the nodes in the cluster, and I see traffic moving very fast on port 9300, as well as lots of ESTABLISHED connections on port 9300.
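The connection check was along these lines (the sample lines below are hypothetical stand-ins for real `netstat -an` output on the nodes, with made-up peer ports):

```shell
# Count ESTABLISHED connections on the transport port 9300.
# Sample lines stand in for live `netstat -an` output.
netstat_sample='tcp 0 0 10.42.41.164:9300 10.42.41.167:48472 ESTABLISHED
tcp 0 0 10.42.41.164:9300 10.42.41.166:51210 ESTABLISHED
tcp 0 0 10.42.41.164:9200 10.42.41.1:60222 TIME_WAIT'
echo "$netstat_sample" | grep -c ':9300 .*ESTABLISHED'
# prints: 2
```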
All of this worked before my indexes got corrupted, so I think my configuration is okay. What should I do next?
Many thanks,
Jeff