Free disk space monitoring after deleting records

I deleted a very large number of records via delete_by_query but I did not see an increase in free space.

Is there any operation to make space available?

Disk space is not automatically freed when you delete documents from an index. The reason is that each shard is a Lucene index made up of segments, and segments are immutable. A document stored in a segment is therefore never physically updated or erased; when you update or delete it in Elasticsearch, it is only marked as deleted.

Because of this, Elasticsearch merges segments in the background from time to time, typically when an index has many small segments or when the documents marked as deleted make up a large share of the stored documents ("large" may be 20-30%). During a merge, Elasticsearch reads two or more smaller segments and writes their contents to a new, larger one, skipping all documents marked as deleted. Once the new segment is complete and the smaller originals have been removed, you will have reclaimed disk space corresponding to the size of the deleted documents.
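To see how much of an index currently consists of documents marked as deleted, the _cat/indices API can list the live and deleted document counts next to the store size. The following is only a minimal sketch: it assumes the cluster is reachable at esnode01:9200 (the host from the forcemerge example in this thread), and the function names are made up for illustration.

```shell
#!/bin/sh
# Assumption: the node address; override it by exporting ES_URL.
ES_URL="${ES_URL:-http://esnode01:9200}"

# Build the _cat/indices URL showing live docs, deleted docs and store size.
# $1 = index name
deleted_docs_url() {
  printf '%s/_cat/indices/%s?v&h=index,docs.count,docs.deleted,store.size\n' "$ES_URL" "$1"
}

# Fetch and print the table for one index.
show_deleted_docs() {
  curl -s -XGET "$(deleted_docs_url "$1")"
}

# Usage:
#   show_deleted_docs my_large_index
```

Running it before and after a merge should show docs.deleted and store.size going down if space was actually reclaimed.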

If you need to reclaim disk space and don't want to wait for Elasticsearch to do a merge, you have two options:

  1. Reindex to a new index and simply delete the old index with all the deleted documents.

  2. Run a forcemerge on the index.

The first option may not always be feasible, since reindexing a large index can take a long time, but it guarantees that all deleted documents are removed. The second option forces Elasticsearch to merge a specific index right away; adding only_expunge_deletes=true as a parameter restricts the merge to segments that hold deleted documents. Like this:

curl -XPOST "http://esnode01:9200/my_large_index/_forcemerge?only_expunge_deletes=true"
{"_shards":{"total":12,"successful":12,"failed":0}}

Here I've done a forcemerge on "my_large_index" (which contains 6 primary and 6 replica shards).

Note that while reindexing removes all deleted documents from an index, a forcemerge removes most but usually not all of them, because some segments may be too large for it to merge (merging is always done on the smaller segments first).
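For completeness, the reindex route (option 1) could look roughly like the sketch below. It uses the _reindex API and then deletes the old index. The host, the target index name my_large_index_v2 and the function names are assumptions for illustration, not something from my own setup.

```shell
#!/bin/sh
# Assumption: the node address; override it by exporting ES_URL.
ES_URL="${ES_URL:-http://esnode01:9200}"

# JSON body for _reindex: copy all live documents from $1 into $2.
reindex_body() {
  printf '{"source":{"index":"%s"},"dest":{"index":"%s"}}\n' "$1" "$2"
}

# Copy the live documents into the new index; --fail makes curl
# return a non-zero exit code on an HTTP error.
reindex_to_new() {
  curl -s --fail -XPOST "$ES_URL/_reindex" \
       -H 'Content-Type: application/json' \
       -d "$(reindex_body "$1" "$2")"
}

# Delete the old index, which frees the space held by its deleted documents.
drop_old_index() {
  curl -s --fail -XDELETE "$ES_URL/$1"
}

# Usage (check the document count in the new index before deleting the old one):
#   reindex_to_new my_large_index my_large_index_v2
#   drop_old_index my_large_index
```

Keep in mind that _reindex does not copy mappings or settings from the source, so the new index should be created with the desired mappings first. You also need enough free disk space to hold both indices until the old one is deleted.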

Thanks @Bernt_Rostad for the explanation.
In my case the second option would be better, because reindexing would probably break my dashboard. Can you recommend a way to quickly recreate the dashboard against the new index?
Anyway, I tried to run the command you wrote but I get:

{
   "statusCode": 504,
   "error": "Gateway Time-out",
   "message": "Client request timeout"
}

Should we increase the timeout?

I assume that index aliases would solve that for you. Instead of setting up the dashboard directly against the indices, create an alias for each index and point the dashboard at the alias. Then, when you need a new index, you only have to change the alias to point to the new index once it is ready to replace the old one.
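Switching an alias from the old index to the new one can be done in a single _aliases request, so the dashboard never sees a moment without an index behind the alias. A minimal sketch, where the host, the alias name dashboard_data, the index names and the function names are all hypothetical:

```shell
#!/bin/sh
# Assumption: the node address; override it by exporting ES_URL.
ES_URL="${ES_URL:-http://esnode01:9200}"

# JSON body for _aliases: remove the alias from the old index and add it
# to the new one in the same request.
# $1 = alias, $2 = old index, $3 = new index
alias_swap_body() {
  printf '{"actions":[{"remove":{"index":"%s","alias":"%s"}},{"add":{"index":"%s","alias":"%s"}}]}\n' \
         "$2" "$1" "$3" "$1"
}

# Send the swap to the cluster.
swap_alias() {
  curl -s --fail -XPOST "$ES_URL/_aliases" \
       -H 'Content-Type: application/json' \
       -d "$(alias_swap_body "$1" "$2" "$3")"
}

# Usage:
#   swap_alias dashboard_data my_large_index my_large_index_v2
```

The dashboard's index pattern would then reference dashboard_data instead of a concrete index name.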

Is that from a client application? I'm using curl directly and have run forcemerge operations lasting 2-3 hours without timeouts, so this seems odd to me.

I ran the command from Kibana's console, so perhaps that is the cause.
I will try using curl directly.

Ok

That means you have a timeout issue either in the browser or in Kibana. I'm not sure which, or how to avoid it; I only have experience with running forcemerge as a curl command directly in a terminal window, and such calls have run for hours without timing out.
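As far as I know, a client-side timeout does not cancel the merge itself: the Elasticsearch documentation states that a force merge continues in the background if the client connection is lost. One way to check whether it is still running would be the task management API, filtered on the forcemerge action. This is only a sketch; the host and function names are assumptions, and the output format may differ between versions.

```shell
#!/bin/sh
# Assumption: the node address; override it by exporting ES_URL.
ES_URL="${ES_URL:-http://esnode01:9200}"

# Build the _tasks URL that lists only running forcemerge tasks.
forcemerge_tasks_url() {
  printf '%s/_tasks?detailed=true&actions=*forcemerge*\n' "$ES_URL"
}

# Print the currently running forcemerge tasks, if any.
show_forcemerge_tasks() {
  curl -s -XGET "$(forcemerge_tasks_url)"
}

# Usage:
#   show_forcemerge_tasks
```

If no forcemerge task is listed in the response, the merge has most likely finished.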
