Sample installation runs out of disk, indices readonly

A single-host installation is running out of disk well before I would expect, given the amount of data stored in its clusters. However, I am struggling to work out where to look to diagnose this in ECE, or eventually to control it.

GET /_cluster/allocation/explain reports "can_allocate" : "no", "allocate_explanation" : "cannot allocate because allocation is not permitted to any of the nodes"
GET _cat/shards shows only some indices running to the low hundreds of MB, totalling probably a couple of GB at most
GET _cat/allocation shows disk used 18.4GB, avail 1.5GB
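
To put the _cat/allocation figures in context, here is a minimal Python sketch that computes the used-disk percentage and compares it with Elasticsearch's default disk watermarks (85% low, 90% high, 95% flood stage). The function names are illustrative only, and the sketch assumes the defaults have not been overridden; an ECE deployment may be configured differently.

```python
# Default Elasticsearch disk watermarks, as percent of disk used.
# Assumption: the cluster has not overridden these settings.
WATERMARKS = [("flood_stage", 95.0), ("high", 90.0), ("low", 85.0)]


def used_percent(used_gb: float, avail_gb: float) -> float:
    """Percentage of disk in use, from the used and available figures."""
    return 100.0 * used_gb / (used_gb + avail_gb)


def exceeded_watermark(percent: float) -> str:
    """Name of the highest default watermark reached, or 'none'."""
    for name, threshold in WATERMARKS:
        if percent >= threshold:
            return name
    return "none"


pct = used_percent(18.4, 1.5)  # figures quoted from _cat/allocation
print(f"{pct:.1f}% used, watermark exceeded: {exceeded_watermark(pct)}")
```

With the figures quoted this comes out at roughly 92% used, which is past the default low and high watermarks. That would be consistent with the allocation explain output, although the readings may have been taken at different moments.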

Using df and du on the underlying ECE node on AWS suggests that the disk is filling.
Surprisingly I found the proxyv2 directory was comparable in size to the allocator one - this looked suspicious.
Drilling down deeper I found that the majority of the proxyv2 usage was uncompressed logs which were huge.
So tactical question - how can I change the logging policy for proxy nodes to avoid storing these relatively useless logs or at the very least, compress them?
And strategic question, what is best practice for storage management within ECE?
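
For anyone wanting to repeat the drill-down described above without du, this is a rough Python sketch that ranks the immediate subdirectories of a directory by total file size. The function names are made up for illustration, and the directory to pass in depends on where ECE was installed on the host. It sums apparent file sizes, so the numbers can differ somewhat from du's block-based figures.

```python
import os


def dir_size_bytes(path: str) -> int:
    """Total apparent size of regular files under path (symlinks skipped)."""
    total = 0
    for root, _dirs, files in os.walk(path):
        for name in files:
            full = os.path.join(root, name)
            try:
                if not os.path.islink(full):
                    total += os.path.getsize(full)
            except OSError:
                pass  # file vanished or is unreadable; skip it
    return total


def largest_subdirs(parent: str, top: int = 5) -> list[tuple[str, int]]:
    """Immediate subdirectories of parent with their sizes, largest first."""
    sizes = [
        (entry.name, dir_size_bytes(entry.path))
        for entry in os.scandir(parent)
        if entry.is_dir(follow_symlinks=False)
    ]
    return sorted(sizes, key=lambda item: item[1], reverse=True)[:top]
```

Calling largest_subdirs on the directory that holds the per-service folders, and then again on whichever folder comes out on top, reproduces the "drill down" step by step.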

I will delete some of these logs to get the ECE off its knees but would like a better solution.

Is this running 2.4? Can you give any more details on the logs you were seeing in proxyv2?

The short answer is that we hardwire log rotation to what we believe is a sufficiently aggressive amount (I think each log file should be 100MB and there are at most 10 files per service).

But it doesn't sound like it's working in your case - did I read correctly that you saw 8GB of proxyv2 logs?! (allocator and proxy about the same, using up 18GB in total?)



Yes, these are 2.4.1, very recently installed. Sorry, I didn't keep the logs or even a snapshot of the sizes, but they were huge, certainly bigger than 100MB each. They were numbered, all the same size, and roughly one per day.

I've been keeping an eye on the rebuilt platform and it isn't showing the same problem.

Apparently the proxy v2 logging has the following maximum sizes:

by default 14 files of 500MB [...]
That's for requests; there are 4 more 500MB files (transport, errors, etc.) by default

So you can indeed get up to 9GB of data. We're looking into either reducing that or making it configurable, but until then it will be necessary to ensure there is enough disk space to be safe. Sorry for the inconvenience.
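
The 9GB figure follows directly from the defaults quoted above. As a quick sanity check, here is a small Python sketch of the arithmetic; the function name is illustrative, and the 100MB x 10 files comparison uses the rotation limits guessed at earlier in the thread rather than confirmed values.

```python
def max_log_footprint_mb(file_size_mb: int, file_counts: list[int]) -> int:
    """Worst-case disk use in MB when every rotated file reaches its cap."""
    return file_size_mb * sum(file_counts)


# Proxy v2 defaults as quoted: 14 request files plus 4 others, 500MB each.
proxy_mb = max_log_footprint_mb(500, [14, 4])

# Limits assumed earlier in the thread: 10 files of 100MB per service.
assumed_mb = max_log_footprint_mb(100, [10])

print(f"proxy v2 worst case: {proxy_mb / 1000:.0f}GB")
print(f"assumed per-service limit: {assumed_mb / 1000:.0f}GB")
```

So when sizing the disk for a host, it seems prudent to budget around 9GB for proxy logs alone, on top of whatever the allocator and the cluster data need.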