A single-host ECE installation is running out of disk well before I would expect, given the amount of data stored in its clusters. However, I am struggling to work out which touch points to use to diagnose this in ECE, or eventually to control it.
GET /_cluster/allocation/explain reports "can_allocate": "no" with the explanation "cannot allocate because allocation is not permitted to any of the nodes".
GET _cat/shards shows only a handful of indices, the largest in the low hundreds of MB, totalling a couple of GB at most.
GET _cat/allocation shows 18.4 GB of disk used and only 1.5 GB available.
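For anyone repeating this diagnosis, a small filter over the _cat/allocation output makes the low-disk node obvious. A sketch only: the threshold and the sample heredoc are placeholders, and in a live cluster you would pipe in `curl -s '.../_cat/allocation?h=disk.avail,node&bytes=gb'` instead.

```shell
# Flag nodes whose reported free disk (first column, in GB when the
# request uses bytes=gb) is below a threshold. The 2 GB cut-off and the
# sample data below are placeholders, not values from a real cluster.
check_allocation() {
  awk -v min=2 '$1+0 < min { print "LOW DISK:", $2, "(" $1 "GB free)" }'
}

# Demo with captured sample output; replace with a curl pipe on a live host.
check_allocation <<'EOF'
1.5 instance-0000000000
45.2 instance-0000000001
EOF
```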
Using df and du on the underlying ECE host on AWS confirms the disk really is filling up.
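For anyone else chasing this, ranking the per-service directories by size points straight at the culprit. A sketch under assumptions: the default install root of /mnt/data/elastic and a runner-id/services/service layout are guesses at the ECE on-disk structure, so adjust the path for your install.

```shell
# Rank ECE service directories (allocator, proxyv2, ...) under a given
# root by disk usage, largest first. The layout glob is an assumption --
# verify it against your own install before relying on the output.
rank_service_usage() {
  du -sk "$1"/*/services/* 2>/dev/null | sort -rn \
    | awk '{ printf "%8.1f MB  %s\n", $1/1024, $2 }' | head -20
}

# Example (the root path is an assumption, not a documented constant):
# rank_service_usage /mnt/data/elastic
```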
Surprisingly, I found the proxyv2 directory was comparable in size to the allocator directory, which looked suspicious.
Drilling down, most of the proxyv2 usage turned out to be huge uncompressed logs.
So, the tactical question: how can I change the logging policy for proxy nodes so that these relatively low-value logs are not stored, or are at least compressed?
And the strategic question: what is best practice for storage management within ECE?
I will delete some of these logs to get ECE off its knees, but would like a better solution.
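As a stop-gap before a proper policy fix, old proxy logs can be compressed in place rather than deleted. A sketch only: the example path is a guess based on where du reported the usage, so verify it on your host, and make sure nothing is still writing to or tailing the files before gzipping them.

```shell
# Compress *.log files untouched for more than a day, in place.
# Pass the log directory explicitly; the path in the usage line is an
# assumption about the ECE layout, not a documented location.
compress_old_logs() {
  find "$1" -name '*.log' -mtime +1 -exec gzip -9 {} +
}

# Example (path is an assumption -- check your own du output first):
# compress_old_logs /mnt/data/elastic/*/services/proxyv2/logs
```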