Why do Elasticsearch's default log4j2 logging settings retain such large amounts of logs?

I am curious about the logging settings for Elasticsearch (I am running 7.17.3), as they appear much larger than what I typically expect for application logging. I have run into issues where the logs filled my logging partition because they are allowed to grow so large.

I get that it duplicates the logs to have them in both JSON and human-readable form. But why retain 2GB of compressed logs in each case? Is that not a little excessive? There are a number of other logs (deprecation, and the slowlogs, which I understand are effectively off until you set their thresholds) that combined can take up to 30GB of space at capacity, according to the log4j2 config for Elasticsearch. And then there are the gc.logs (configured in the jvm.options file) that can retain another 2GB.

It looks like the default settings could potentially retain 36GB of logs, including the gc logs. I am wondering why this is the case and whether it is necessary.
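For reference, the retention I'm describing comes from the size-based delete condition in the default `log4j2.properties`. Here is a trimmed sketch of the rolling server-log appender, paraphrased from what I see in my 7.17 install (property names and paths may differ slightly in yours):

```properties
appender.rolling.type = RollingFile
appender.rolling.name = rolling
appender.rolling.fileName = ${sys:es.logs.base_path}${sys:file.separator}${sys:es.logs.cluster_name}_server.json
# roll over daily, or once the active file reaches 128MB
appender.rolling.policies.time.type = TimeBasedTriggeringPolicy
appender.rolling.policies.size.type = SizeBasedTriggeringPolicy
appender.rolling.policies.size.size = 128MB
appender.rolling.strategy.type = DefaultRolloverStrategy
# delete old rollovers only once they accumulate past 2GB
appender.rolling.strategy.action.type = Delete
appender.rolling.strategy.action.condition.type = IfFileName
appender.rolling.strategy.action.condition.glob = ${sys:es.logs.cluster_name}-*
appender.rolling.strategy.action.condition.nested_condition.type = IfAccumulatedFileSize
appender.rolling.strategy.action.condition.nested_condition.exceeds = 2GB
```

So the rollover itself is frequent, but the `IfAccumulatedFileSize` condition is what lets 2GB of compressed history pile up per log before anything is deleted.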

From experience I can say that some incidents may require looking back over several weeks of logs to analyse. Nodes in production clusters usually have TBs of storage, so spending a few GB on possibly-vital logs seems worth the investment.

Thanks for your reply.
I am still pretty new to Elasticsearch and am learning what the cluster in production will require: things like the quantity of logs it will produce daily and which logs will be most useful in my case. I suppose that 2GB each for the .log, _server.json, and gc.logs (6GB in total) could be reasonable depending on the amount of logging taking place, which I am still trying to understand.
Most other log settings I've come across are only a few MB and retain weeks' worth of information. Typically I've seen them rotate based on time/date, so they end up storing only a few MB before the time condition triggers. When something is very wrong, though, the log gets packed and grows quite large, which is itself an indication of a problem. That is why I am asking about the ES defaults: they are quite large compared to other log settings I've seen. Also, there are some cases I need to deal with that have space restrictions (which I know is not ideal).
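For what it's worth, the gc retention I mentioned comes from the `-Xlog` line in jvm.options. In my 7.x install it looks roughly like this (I've simplified the file path, which the real file sets via a variable):

```
# unified JVM logging: rotate across 32 files of 64MB each, i.e. the ~2GB I mentioned
-Xlog:gc*,gc+age=trace,safepoint:file=logs/gc.log:utctime,pid,tags:filecount=32,filesize=64m
```

The `filecount=32,filesize=64m` pair is where that extra 2GB cap comes from.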

I still have questions about the defaults for the deprecation and slowlog logs. At 5GB for each log, with the JSON duplication, they amount to 30GB, taking up most of the space Elasticsearch allocates for logging. Given what I gather to be the purpose of these particular logs (and I may be mistaken), I do not understand why their settings allow them to grow so large. If you are trying to catch deprecated commands or bottlenecks, which I would assume you would be actively looking for when examining these logs (and by setting thresholds for the slowlogs), do they really need to be that large? Would 100MB or less not suffice? Why were 1GB of active logs and 4GB retained chosen as the values here?
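(By "setting thresholds for the slowlogs" I mean the per-index slowlog settings, which are unset by default so nothing is logged. The index name and threshold values below are just illustrative:)

```json
PUT /my-index/_settings
{
  "index.search.slowlog.threshold.query.warn": "10s",
  "index.search.slowlog.threshold.query.info": "5s"
}
```

Until thresholds like these are set, the slowlog files stay empty even though the config reserves so much space for them.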

I'm not sure a huge amount of science went into the defaults here; a few GB just doesn't seem like very much data. Most users are not so diligent about watching for problems in their logs, so it tends to be worth erring on the side of retention by default. Feel free to reduce the retained amount if you need.
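For example, you could edit the relevant appender in `log4j2.properties` to lower the accumulated-size condition (adjust the `appender.rolling` prefix to whichever appender you want to trim; `100MB` here is just an illustrative value):

```properties
# keep at most ~100MB of old rollovers for this appender instead of the default 2GB
appender.rolling.strategy.action.condition.nested_condition.type = IfAccumulatedFileSize
appender.rolling.strategy.action.condition.nested_condition.exceeds = 100MB
```

The delete action runs at rollover time, so the cap takes effect the next time the log rolls.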

Thank you for your help and the discussion, it has been very helpful.

