I have a problem: I run Elasticsearch 6.2 on a CentOS 7 machine, and I see that my disk is filling up (by running the "df" command).
When I check with the "du" command I don't see anything that could account for the disk usage.
When I run:
lsof | grep '(deleted)'
I see a lot of gzipped Elasticsearch logs that Elasticsearch has deleted, but since Elasticsearch is still running they are not actually removed from the file system, and they fill up the disk.
When I stop and restart Elasticsearch, the files are removed.
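To put a rough number on it, the space still held by deleted-but-open files can be totalled like this (the awk field assumes the stock lsof column layout, and lsof may report the same descriptor once per thread, which would inflate the sum):

# +L1 selects open files whose link count is below 1, i.e. already unlinked; SIZE/OFF is field 7
sudo lsof +L1 | awk 'NR > 1 { sum += $7 } END { print sum }'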
My question is:
Considering I need Elasticsearch running 24/7 in production, how can I make sure these files are actually removed from the file system?
I don't think this should happen with the default logging configuration, although it is possible with some configurations. Can you share your log4j2.properties?
Can you show us the output from that lsof command? I tried to reproduce what you describe by configuring an Elasticsearch node to roll over its logs every second and delete them shortly after, and I do not see anything leaking.
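A rollover-every-second setup looks roughly like this in log4j2.properties (the paths, file pattern and the 10s age threshold below are illustrative, not an exact copy of the config I used):

appender.rolling.type = RollingFile
appender.rolling.name = rolling
appender.rolling.fileName = ${sys:es.logs.base_path}${sys:file.separator}${sys:es.logs.cluster_name}.log
appender.rolling.layout.type = PatternLayout
appender.rolling.layout.pattern = [%d{ISO8601}][%-5p][%-25c{1.}] %marker%m%n
# roll to a new gzipped file every second instead of every day
appender.rolling.filePattern = ${sys:es.logs.base_path}${sys:file.separator}${sys:es.logs.cluster_name}-%d{yyyy-MM-dd-HH-mm-ss}.log.gz
appender.rolling.policies.type = Policies
appender.rolling.policies.time.type = TimeBasedTriggeringPolicy
appender.rolling.policies.time.interval = 1
# let Log4j itself delete rolled files shortly after they were last written
appender.rolling.strategy.type = DefaultRolloverStrategy
appender.rolling.strategy.action.type = Delete
appender.rolling.strategy.action.basepath = ${sys:es.logs.base_path}
appender.rolling.strategy.action.condition.type = IfFileName
appender.rolling.strategy.action.condition.glob = ${sys:es.logs.cluster_name}-*
appender.rolling.strategy.action.condition.nested_condition.type = IfLastModified
appender.rolling.strategy.action.condition.nested_condition.age = 10s

With something like this in place the rolled .gz files are removed by the Delete action itself, and nothing stays pinned to a deleted inode.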
The output you've shown is really just two file descriptors (1w and 2w) repeated once for each thread in Elasticsearch.
Are you using anything outside of Elasticsearch to manage your log rotation? I don't think Elasticsearch would ever open a file with the name this one has, so I suspect it was renamed out from underneath it.
I will check, but does that matter for the issue of too many open files clogging up my disk?
And isn't the setting I shared responsible for the log name format?
No, the "Too many open files" error is about running out of file descriptors, but as I said there are only two here.
The line you shared is part of the rotation config for the slowlog, but I don't think that's relevant here. What I think has happened is that something outside Elasticsearch renamed the log file and then deleted it while Elasticsearch was still writing to it. That sounds like something an external log rotation process would do (e.g. logrotate).
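A quick way to check is to look for a rotation rule that touches the Elasticsearch log directory (the path below is the RPM default on CentOS; adjust it if yours differs):

# any hit here means an external tool is renaming/deleting the logs behind Elasticsearch's back
grep -ril '/var/log/elasticsearch' /etc/logrotate.conf /etc/logrotate.d/ 2>/dev/null

If such a rule exists, either drop it and let the Log4j configuration handle rotation, or add logrotate's copytruncate option so the live file is truncated in place rather than renamed and deleted while Elasticsearch still has it open.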
I couldn't find any other program that messes with the Elasticsearch log configuration.
The problem persists: a lot of deleted log files, and they fill my disk.
My problem is OOM and too many open files, and I can't delete all the files without stopping Elasticsearch entirely.
Is there a better way to handle this issue?
Your original post was about excessive disk usage; this is the first time you've mentioned an OOM. Are you also seeing a "Too many open files" error? Can you share the full error message and stack trace? Can you use lsof to identify all the open files? So far we have only seen two open file descriptors.
Can you also share the output of GET _cluster/health?
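For example, assuming the node is listening on the default localhost:9200:

curl -s 'http://localhost:9200/_cluster/health?pretty'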
Sorry, let me clarify:
I'm having memory trouble on my machine: programs are crashing due to memory shortage and "too many open files" errors.
The only thing I can think of that could be responsible is Elasticsearch, which holds a lot of open files at any given moment (~30-50K), and the disk is filling with deleted logs that don't get removed unless I stop Elasticsearch.
I want to use Elasticsearch in production, but how can I if I don't understand how to stop it from filling my disk and opening so many files?
I do not think that open files are a cause of OOMs, and 30-50k open files isn't very many, particularly if you're looking at lsof which double-counts file descriptors as we've seen above. Let's see the output from GET _cluster/health as I asked above.
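To see the real number, compare lsof's row count with the kernel's descriptor table for the process (the pgrep pattern below assumes the standard main class name; substitute your node's PID if it differs):

ES_PID=$(pgrep -f org.elasticsearch.bootstrap.Elasticsearch)
sudo ls /proc/$ES_PID/fd | wc -l                  # descriptors actually open
sudo lsof -p $ES_PID | wc -l                      # rows lsof prints, which can repeat one descriptor per thread
sudo grep 'Max open files' /proc/$ES_PID/limits   # the limit that actually matters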
I will do that, but I'm troubled by the fact that Elasticsearch holds a lot of memory by holding on to deleted files, and the space these files occupy doesn't get released until Elasticsearch is shut down.
Is there a way to combat it?
You haven't shown any evidence supporting this claim yet. So far you've shown Elasticsearch holding onto a single deleted file (repeated many times because that's how lsof works). From the filename, it looks like something other than Elasticsearch deleted it, so it's expected that it would remain open (because that's how Unix filesystems work). We also haven't seen anything demonstrating that this is related to the OOMs that you're reporting. I think you are jumping to conclusions at this point.
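That said, if you need the space back without restarting the node, one workaround is to truncate the deleted file through its /proc file descriptor entry. This is a sketch, not something Elasticsearch provides: the pgrep line is an assumption, <N> is a placeholder for the descriptor number, and it should only ever be done to log files, never to a live data file:

ES_PID=$(pgrep -f org.elasticsearch.bootstrap.Elasticsearch)
sudo ls -l /proc/$ES_PID/fd | grep deleted     # find descriptors still pointing at deleted files
sudo sh -c ": > /proc/$ES_PID/fd/<N>"          # truncate descriptor <N>, releasing its disk space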
There is insufficient memory for the Java Runtime Environment to continue.
Cannot create GC thread. Out of system resources.
An error report file with more information is saved as:
/home/---/hs_err_pid31784.log
Error occurred during initialization of VM
java.lang.OutOfMemoryError: unable to create new native thread
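For what it's worth, "unable to create new native thread" and "Cannot create GC thread" usually point at a thread/process limit or a shortage of native memory outside the heap rather than at open files. A few quick checks, assuming the service runs as the default elasticsearch user:

sudo -u elasticsearch sh -c 'ulimit -u'   # max processes/threads allowed for that user
cat /proc/sys/kernel/pid_max              # system-wide thread/PID ceiling
ps -eLf | wc -l                           # threads currently running system-wide
free -m                                   # memory left for native allocations outside the JVM heap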