Hi,
In one of our production clusters we saw a big difference between the disk usage reported by df and by du: df shows 100% while du shows 70%. This stays the same while the cluster is running. Do you have any idea what the reason for this could be?
df shows the disk space used on the whole volume; du shows the space used by the files it can actually see. The remaining 30% is in files that du is not seeing. Assuming there are no obvious mistakes, this usually comes from big files that have been deleted from the file system but are still held open (or mapped) by some process. For instance, maybe you deleted a big log file but forgot to restart / reload the server that created it.
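If you want to see the effect for yourself, here is a quick sketch (the path and size are only examples; use a scratch directory on the affected volume):

    dd if=/dev/zero of=/tmp/bigfile bs=1M count=1024   # create a ~1 GiB scratch file
    tail -f /tmp/bigfile > /dev/null &                 # a background process keeps it open
    rm /tmp/bigfile                                    # delete it; the space is NOT freed yet
    df -h /tmp                                         # still counts the 1 GiB, the inode is held open
    du -sh /tmp                                        # no longer sees the file
    kill %1                                            # close the handle; df drops back in line with du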
That is exactly why I am reporting this: I am trying to work out whether it may be caused by ES or not. We have run into this situation twice in our production ES cluster. ES previously threw an exception about disk space, and when we checked the file system we saw that df and du reported different values. We restarted the servers and our data got corrupted because of this, so we repopulated all the data from scratch. Nearly two months later the same situation appeared again. We use ES version 0.15.2. When I searched for this problem I saw that other Lucene-based products hit the same issue when an IndexReader is not closed. Could this be a problem in ES too?
You can use a command like "lsof | grep deleted" to get a listing of
the deleted files which have open file handles along with the pid that
is holding them. At least you'll know the names of the files and can
validate that they are indeed being held open by elasticsearch. If
you provide the filenames it may at least help narrow down the cause
of the issue.
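For example (the PID, path and sizes below are made up, just to show the shape of the output):

    lsof | grep deleted
    # or, if you know the PID of the elasticsearch JVM, restrict it to that process:
    lsof -p <es_pid> | grep -i deleted
    # the default columns are COMMAND PID USER FD TYPE DEVICE SIZE/OFF NODE NAME, e.g.:
    # java  1234  elastic  45r  REG  8,1  1073741824  5678  /var/data/es/.../index/_0.cfs (deleted)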
The space held by files that processes still have open is counted by df but not by du once those files have been deleted. You can run lsof, add up the sizes of those open-but-deleted files, and add that to the output of du; you should end up with roughly the figure that df reports. For example, if df shows the file system mounted on /tmp at 100%, run lsof | grep /tmp and add up the sizes of the deleted files that are still open there; df and du should then agree.
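A rough way to add those sizes up in one go (assuming the default lsof column layout, where SIZE/OFF is the 7th field; this can vary between lsof versions, so check the header first):

    lsof | grep deleted | awk '{sum += $7} END {printf "%.1f MB held by deleted-but-open files\n", sum/1024/1024}'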