I have an index that keeps growing until the node crashes with a full-disk error.
For now, I delete the index directly from the VM, because Elasticsearch is unhealthy by the time it happens.
That helps for a short while, but after some time the index grows again and runs out of disk space.
The details:
When I run the count command, the response is 3,571,753 documents:
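In case it helps, this is roughly how I run the count (a minimal sketch using the Python elasticsearch client; the URL and the index name `my-index` stand in for our real ones):

```python
from elasticsearch import Elasticsearch

# Connect to the cluster (URL is a placeholder).
es = Elasticsearch("http://localhost:9200")

# _count only reports live documents; it does not include
# documents that updates have marked as deleted.
resp = es.count(index="my-index")
print(resp["count"])  # e.g. 3571753
```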
Those two things would explain the index growing and the number of deleted documents.
When you update a document, Elasticsearch does not change it in place: it creates a new version of the document and marks the old one as deleted in the segment where it is stored. The documents marked as deleted keep taking up disk space for some time.
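You can observe this with the index stats API, which reports live and deleted documents separately. A minimal sketch with the Python client, again with a placeholder index name:

```python
from elasticsearch import Elasticsearch

es = Elasticsearch("http://localhost:9200")

# Index stats report live docs and docs marked as deleted separately.
stats = es.indices.stats(index="my-index", metric="docs")
docs = stats["indices"]["my-index"]["primaries"]["docs"]
print("live docs:   ", docs["count"])
print("deleted docs:", docs["deleted"])  # space is only freed when segments merge
```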
Elasticsearch periodically merges smaller segments into larger ones; when a merge happens, the documents marked as deleted are removed and the disk space they used is freed.
You have no control over when Elasticsearch merges segments. There is a force merge API that forces Elasticsearch to merge the segments of an index, but it is not recommended to use it on indices that are still being written to.
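If you ever run it during a quiet window when nothing is writing to the index, the call would look roughly like this (again a sketch with the Python client and a placeholder index name):

```python
from elasticsearch import Elasticsearch

es = Elasticsearch("http://localhost:9200")

# Force merge is expensive and should not run while the index is
# still receiving writes; it is meant for indices you are done writing to.
es.indices.forcemerge(
    index="my-index",
    only_expunge_deletes=True,  # only rewrite segments that contain deleted docs
)
```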
I'm not sure that there is much you can do besides revising your update strategy.
Hello @leandrojmp, thank you.
We copy some data from an RDBMS to Elasticsearch every 20 minutes and update some data whenever a user is modified in the frontend app. And worst of all, we have a heavy Logstash pipeline that updates some nested documents.
So, as I understand it, there is nothing we can do?