Disc space is not automatically freed when you delete documents from an index. The reason for this is that each shard is a Lucene index made up of segments, and those segments are immutable. This means that a document stored in a segment is never physically updated or erased. A delete only marks the document as deleted, and an update marks the old version as deleted and indexes the new version into a new segment.
Because of this, Elasticsearch performs segment merges in an index from time to time, typically when there are many small segments in the index or when the documents marked as deleted make up a large percentage of the total number of stored documents ("large" may be 20-30%). When a merge takes place, Elasticsearch reads two or more smaller segments and writes them to a new, larger one. In the process it skips all documents marked as deleted, so once the new segment is complete and the smaller originals are removed, you have reclaimed the disc space that the deleted documents occupied.
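To get a feel for how close an index is to that range, you can compare its deleted and live document counters. The snippet below is only a sketch: `deleted_pct` and `show_deleted_pct` are helper names I made up, it assumes a single index name (not a pattern), and it reads the `docs.count` and `docs.deleted` columns from the `_cat/indices` API.

```shell
# Percentage of deleted documents among everything still stored in the segments.
# $1 = docs.count (live documents), $2 = docs.deleted
deleted_pct() {
  awk -v live="$1" -v del="$2" 'BEGIN {
    total = live + del
    if (total == 0) { print "0.0"; exit }
    printf "%.1f\n", 100 * del / total
  }'
}

# Hypothetical wrapper: fetch both counters for one index and print the ratio.
# $1 = host:port (e.g. esnode01:9200), $2 = index name (e.g. my_large_index)
show_deleted_pct() {
  set -- $(curl -fsS "http://$1/_cat/indices/$2?h=docs.count,docs.deleted")
  deleted_pct "$1" "$2"
}
```

For example, `deleted_pct 800 200` prints `20.0`, meaning a fifth of the stored documents are only marked as deleted.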
If you need to reclaim disc space and don't want to wait for Elasticsearch to do a merge, you have two options:
- Reindex to a new index and simply delete the old index with all the deleted documents.
- Run a forcemerge on the index.
The first option may not always be feasible, since reindexing a large index can take a long time, but it guarantees that all deleted documents are removed. The second option forces Elasticsearch to perform a merge on a specific index. Adding only_expunge_deletes=true as a parameter limits that merge to segments that contain deleted documents. Like this:
curl -XPOST "http://esnode01:9200/my_large_index/_forcemerge?only_expunge_deletes=true"
{"_shards":{"total":12,"successful":12,"failed":0}}
Here I've done a forcemerge on "my_large_index", which has 6 primary and 6 replica shards, hence the 12 shards reported in the response.
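If you want to see what the merge left behind, the `_cat/segments` API lists the live and deleted document counts per segment. This is just a sketch: `segments_url` is a helper name I made up to build the request URL, and the column selection is my own choice.

```shell
# Build the _cat/segments URL for one index, with per-segment counters and size.
# $1 = host:port, $2 = index name
segments_url() {
  printf 'http://%s/_cat/segments/%s?v&h=shard,prirep,segment,docs.count,docs.deleted,size\n' "$1" "$2"
}

# Example call (not executed here):
# curl -s "$(segments_url esnode01:9200 my_large_index)"
```

Segments that still show a non-zero docs.deleted after the forcemerge are the ones that were not rewritten.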
Note that while reindexing removes all deleted documents from an index, a forcemerge removes most but usually not all of them, because some segments may be so large that Elasticsearch won't merge them (merging is always done on the smaller segments first).
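For completeness, the reindex route could look roughly like the sketch below. It uses the `_reindex` API followed by a delete of the old index. The helper names and the destination index name `my_large_index_v2` are made up for illustration, and I'm assuming the destination index has already been created with the mappings and settings you want, since a reindex copies only the documents.

```shell
# Build the _reindex request body for a source and a destination index.
reindex_body() {
  printf '{"source":{"index":"%s"},"dest":{"index":"%s"}}\n' "$1" "$2"
}

# Hypothetical helper: copy the live documents to a new index, then drop the old one.
# $1 = host:port, $2 = source index, $3 = destination index
# The delete only runs if the reindex request returns a success status;
# check the reindex response for failures before relying on this.
reindex_and_drop() {
  curl -fsS -XPOST "http://$1/_reindex" \
    -H 'Content-Type: application/json' \
    -d "$(reindex_body "$2" "$3")" || return 1
  curl -fsS -XDELETE "http://$1/$2"
}

# Example (not executed here):
# reindex_and_drop esnode01:9200 my_large_index my_large_index_v2
```

In practice I would run the two steps separately and compare document counts before deleting anything, but the sketch shows the order of operations.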