A single-host ECE installation is running out of disk well before I would expect, given the amount of data stored in its clusters. However, I am struggling to work out which touch points to use to diagnose this in ECE, or eventually to control it.
GET /_cluster/allocation/explain reports "can_allocate": "no" with the explanation "cannot allocate because allocation is not permitted to any of the nodes".
GET _cat/shards shows only a handful of indices, the largest in the low hundreds of MB, totalling a couple of GB at most.
GET _cat/allocation shows 18.4 GB of disk used and only 1.5 GB available.
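For anyone repeating this diagnosis, a small filter over the _cat/allocation output makes the low-disk node obvious. A sketch only: the threshold and the sample heredoc are placeholders, and in a live cluster you would pipe in `curl -s '.../_cat/allocation?h=disk.avail,node&bytes=gb'` instead.

```shell
# Flag nodes whose reported free disk (first column, in GB when the
# request uses bytes=gb) is below a threshold. The 2 GB cut-off and the
# sample data below are placeholders, not values from a real cluster.
check_allocation() {
  awk -v min=2 '$1+0 < min { print "LOW DISK:", $2, "(" $1 "GB free)" }'
}

# Demo with captured sample output; replace with a curl pipe on a live host.
check_allocation <<'EOF'
1.5 instance-0000000000
45.2 instance-0000000001
EOF
```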
Using df and du on the underlying ECE host on AWS confirms the disk really is filling up.
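For anyone else chasing this, ranking the per-service directories by size points straight at the culprit. A sketch under assumptions: the default install root of /mnt/data/elastic and a runner-id/services/service layout are guesses at the ECE on-disk structure, so adjust the path for your install.

```shell
# Rank ECE service directories (allocator, proxyv2, ...) under a given
# root by disk usage, largest first. The layout glob is an assumption --
# verify it against your own install before relying on the output.
rank_service_usage() {
  du -sk "$1"/*/services/* 2>/dev/null | sort -rn \
    | awk '{ printf "%8.1f MB  %s\n", $1/1024, $2 }' | head -20
}

# Example (the root path is an assumption, not a documented constant):
# rank_service_usage /mnt/data/elastic
```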
Surprisingly, I found the proxyv2 directory was comparable in size to the allocator directory, which looked suspicious.
Drilling down, most of the proxyv2 usage turned out to be huge uncompressed logs.
So, the tactical question: how can I change the logging policy for proxy nodes so that these relatively low-value logs are not stored, or are at least compressed?
And the strategic question: what is best practice for storage management within ECE?
I will delete some of these logs to get ECE off its knees, but would like a better solution.
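As a stop-gap before a proper policy fix, old proxy logs can be compressed in place rather than deleted. A sketch only: the example path is a guess based on where du reported the usage, so verify it on your host, and make sure nothing is still writing to or tailing the files before gzipping them.

```shell
# Compress *.log files untouched for more than a day, in place.
# Pass the log directory explicitly; the path in the usage line is an
# assumption about the ECE layout, not a documented location.
compress_old_logs() {
  find "$1" -name '*.log' -mtime +1 -exec gzip -9 {} +
}

# Example (path is an assumption -- check your own du output first):
# compress_old_logs /mnt/data/elastic/*/services/proxyv2/logs
```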