Remove docs.deleted

joao · October 2, 2017, 7:01pm

Hi guys, good afternoon.

So, I've been trying to remove the deleted documents (docs.deleted) from some indices, but without success.

Could you help me?

These are my indices:
health status index uuid pri rep docs.count docs.deleted store.size pri.store.size
yellow open logstash-2017.09.21 L-iREuGaRXqWfxTBIit2ag 5 1 40995549 857 18.8gb 18.8gb
yellow open logstash-2017.09.22 dnHs2S7rSs6rpqmUJbe6sw 5 1 39950568 380268 17.4gb 17.4gb
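To put the numbers above in perspective, here is a small Python sketch that parses _cat/indices?v style text and computes what fraction of each index's stored documents are flagged as deleted. It assumes the whitespace-separated layout shown above; the function name deleted_ratios is hypothetical.

```python
def deleted_ratios(cat_output: str) -> dict:
    """Map each index to docs.deleted / (docs.count + docs.deleted)."""
    rows = [line.split() for line in cat_output.strip().splitlines()]
    header, data = rows[0], rows[1:]
    # Look columns up by header name so column order does not matter.
    col = {name: i for i, name in enumerate(header)}
    ratios = {}
    for row in data:
        live = int(row[col["docs.count"]])
        deleted = int(row[col["docs.deleted"]])
        total = live + deleted
        ratios[row[col["index"]]] = deleted / total if total else 0.0
    return ratios


sample = """\
health status index uuid pri rep docs.count docs.deleted store.size pri.store.size
yellow open logstash-2017.09.21 L-iREuGaRXqWfxTBIit2ag 5 1 40995549 857 18.8gb 18.8gb
yellow open logstash-2017.09.22 dnHs2S7rSs6rpqmUJbe6sw 5 1 39950568 380268 17.4gb 17.4gb
"""

for index, ratio in deleted_ratios(sample).items():
    print(f"{index}: {ratio:.3%} of documents flagged as deleted")
```

For these two indices the deleted share is under 1%. That may be relevant to the only_expunge_deletes command: Lucene's merge policy only expunges segments whose deleted share exceeds index.merge.policy.expunge_deletes_allowed (10% by default), so a low ratio can leave docs.deleted unchanged.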

I followed this link:
https://www.elastic.co/guide/en/elasticsearch/reference/current/indices-forcemerge.html

I used this command:
POST /logstash-2017.09.21/_forcemerge?only_expunge_deletes=true

theuntergeek · October 2, 2017, 7:14pm

A force merge takes time. Since you did not tell it to keep the client open until completion, it's running the merge in the background, and there's no real way to tell when it's done.

A force merge like that on a 17 GB index could take hours to complete, especially if merges are throttled in any way.
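Per the Elasticsearch force merge documentation, the request blocks until the merge completes, and if the HTTP connection drops the merge carries on in the background, which matches the situation described above. A minimal standard-library Python sketch that keeps the client waiting, assuming a node at the hypothetical address http://localhost:9200; forcemerge_request and run_forcemerge are illustrative names:

```python
import urllib.request

# Hypothetical node address; adjust for your cluster.
ES_URL = "http://localhost:9200"


def forcemerge_request(index: str, base_url: str = ES_URL) -> urllib.request.Request:
    """Build POST /<index>/_forcemerge?only_expunge_deletes=true."""
    url = f"{base_url}/{index}/_forcemerge?only_expunge_deletes=true"
    return urllib.request.Request(url, method="POST")


def run_forcemerge(index: str, timeout_s: float = 6 * 3600) -> bytes:
    # The call blocks until the merge finishes, so a long socket timeout
    # keeps the connection open instead of abandoning the request early.
    req = forcemerge_request(index)
    with urllib.request.urlopen(req, timeout=timeout_s) as resp:
        return resp.read()
```

Calling run_forcemerge("logstash-2017.09.21") would then return only once the merge finishes or the six-hour timeout (an arbitrary choice) expires.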

theuntergeek · October 2, 2017, 7:17pm

Since you're using time-series indices from Logstash, you should be deleting indices, rather than deleting data from them. If you need to retain some data longer than others, then you should send the data with a longer retention period to a different index name, and then delete the ones with a shorter retention period.
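Logstash's elasticsearch output writes to logstash-%{+YYYY.MM.dd} by default, so the split described above amounts to picking a different prefix per retention class. A hypothetical Python sketch of that naming rule; the prefixes logstash-short and logstash-long are made up for illustration, and in practice the choice would live in the Logstash output block rather than in Python:

```python
from datetime import date

# Hypothetical retention classes mapped to index prefixes (illustrative only).
PREFIX_BY_RETENTION = {
    "short": "logstash-short",
    "long": "logstash-long",
}


def index_for(event_day: date, retention: str) -> str:
    """Daily index name for an event, e.g. 'logstash-long-2017.09.21'."""
    return f"{PREFIX_BY_RETENTION[retention]}-{event_day:%Y.%m.%d}"


print(index_for(date(2017, 9, 21), "long"))   # logstash-long-2017.09.21
print(index_for(date(2017, 9, 21), "short"))  # logstash-short-2017.09.21
```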

The index management portion of this can be easily handled by Elasticsearch Curator. Splitting your data into different indices would be handled in the output block of your Logstash configuration.
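The age-based deletion Curator performs (for example its delete_indices action with an age filter on the index name) comes down to parsing the date out of each daily index name and comparing it with a cutoff. A rough Python sketch of that selection step, not Curator's code; expired_indices is a hypothetical helper and only selects names, it does not delete anything:

```python
from datetime import date, datetime, timedelta


def expired_indices(names, prefix: str, keep_days: int, today: date) -> list:
    """Return names of the form <prefix>-YYYY.MM.DD dated before today - keep_days."""
    cutoff = today - timedelta(days=keep_days)
    expired = []
    for name in names:
        if not name.startswith(prefix + "-"):
            continue
        stamp = name[len(prefix) + 1:]
        try:
            day = datetime.strptime(stamp, "%Y.%m.%d").date()
        except ValueError:
            # Not a plain daily index for this prefix (e.g. logstash-long-...).
            continue
        if day < cutoff:
            expired.append(name)
    return sorted(expired)


indices = ["logstash-2017.09.21", "logstash-2017.09.22", "logstash-2017.10.02"]
print(expired_indices(indices, "logstash", keep_days=7, today=date(2017, 10, 3)))
# ['logstash-2017.09.21', 'logstash-2017.09.22']
```

Each selected name could then be removed whole with DELETE /<index>, which frees its disk space at once instead of waiting for segment merges.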

joao · October 2, 2017, 7:42pm

First, thank you very much.
So, I tried the last 5 days, without success.

theuntergeek · October 2, 2017, 8:03pm

Please be more specific. Last 5 days, meaning what, exactly? Did you do the delete_by_query method you were doing first? Deleting indices in Curator is complete and total. Disk space is recovered immediately, because the index is deleted, rather than documents within it.

joao · October 3, 2017, 12:35pm

Good morning,

I didn't use Curator; I used delete_by_query. I executed this query several times.

theuntergeek · October 3, 2017, 1:47pm

Same thing as previously mentioned. Deleted documents aren't actually deleted until a segment merge occurs. The delete_by_query only flags them as deleted. The segment merge removes documents flagged for deletion. Segment merges can be throttled by the cluster, and therefore take a considerable amount of time to complete.
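A toy Python model of the behaviour described above: a delete only flags a document inside a segment, and the space is reclaimed only when a merge writes a new segment containing just the live documents. This is a simplification for illustration (Segment, delete and merge are hypothetical names), not how Lucene is implemented:

```python
from dataclasses import dataclass, field


@dataclass
class Segment:
    """Toy segment: stored documents plus a set of IDs flagged as deleted."""
    docs: dict
    deleted: set = field(default_factory=set)

    @property
    def count(self) -> int:
        # Live documents, analogous to docs.count.
        return len(self.docs) - len(self.deleted)


def delete(segment: Segment, doc_id: str) -> None:
    # Deleting only flags the document; its data stays in the segment.
    if doc_id in segment.docs:
        segment.deleted.add(doc_id)


def merge(segments: list) -> Segment:
    # A merge copies only live documents into a new segment,
    # so flagged documents are dropped for good.
    live = {}
    for seg in segments:
        for doc_id, body in seg.docs.items():
            if doc_id not in seg.deleted:
                live[doc_id] = body
    return Segment(docs=live)


old = Segment(docs={"a": "log line 1", "b": "log line 2"})
delete(old, "a")
print(old.count, len(old.deleted), len(old.docs))  # 1 1 2
merged = merge([old])
print(merged.count, len(merged.deleted), len(merged.docs))  # 1 0 1
```

In this model docs.deleted corresponds to len(segment.deleted), and only merge() makes those entries, and the space they occupy, disappear.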

I highly recommend not using delete_by_query, and using something like Curator so that entire indices can be deleted, not just a percentage of the documents in an index. delete_by_query is not a disk space management solution, especially with time-series data.


Related topics

Topic | Category | Replies | Views | Activity
After expunge command still docs.deleted is not reducing | Elasticsearch | 4 | 115 | April 9, 2024
How to delete docs.deleted from ELK? | Elasticsearch | 2 | 216 | August 14, 2023
How to clear docs.deleted from cat-indices page of elasticsearch? | Elasticsearch | 6 | 6795 | December 5, 2017
Merging the index with lower max_number_segement causing increasing index size | Elasticsearch | 9 | 427 | February 7, 2022
Best way to remove docs from index | Elasticsearch | 2 | 117 | April 24, 2024