Remove docs.deleted

(João Neto) #1

Hi guys, good afternoon.

So, I've been trying to remove the deleted docs (docs.deleted) from some of my indices, but without success.

Could you help me?

These are my indices:
health status index               uuid                   pri rep docs.count docs.deleted store.size pri.store.size
yellow open   logstash-2017.09.21 L-iREuGaRXqWfxTBIit2ag   5   1   40995549          857     18.8gb         18.8gb
yellow open   logstash-2017.09.22 dnHs2S7rSs6rpqmUJbe6sw   5   1   39950568       380268     17.4gb         17.4gb

I followed this link:

I used this command:
POST /logstash-2017.09.21/_forcemerge?only_expunge_deletes=true

(Aaron Mildenstein) #2

A forceMerge takes time. Since you did not tell it to keep the client open until completion, it's running the merge in the background, and there's no real way to tell when it's done.

A forceMerge like that on a 17g index could take hours to complete, especially if merges are throttled in any way.
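
One rough way to see whether a merge has made any progress is to re-run the _cat/indices request later and compare docs.deleted with the earlier value. The Python sketch below only illustrates the parsing side: it takes the whitespace-separated output posted above (assuming the header row is present) and reports the share of deleted documents per index. The function name `deleted_ratio` is hypothetical and not part of any Elasticsearch client, and the ratio assumes docs.count does not already include the deleted documents.

```python
def deleted_ratio(cat_output):
    """Map index name -> docs.deleted / (docs.count + docs.deleted).

    Assumption: docs.count counts only live documents, so the two
    columns are added to get the total stored in the segments.
    """
    lines = [ln.split() for ln in cat_output.strip().splitlines() if ln.strip()]
    header, rows = lines[0], lines[1:]
    # Locate columns by header name rather than by fixed position.
    name_col = header.index("index")
    count_col = header.index("docs.count")
    deleted_col = header.index("docs.deleted")

    ratios = {}
    for row in rows:
        live = int(row[count_col])
        deleted = int(row[deleted_col])
        total = live + deleted
        ratios[row[name_col]] = deleted / total if total else 0.0
    return ratios


# The two rows posted earlier in this thread.
SAMPLE = """
health status index uuid pri rep docs.count docs.deleted store.size
yellow open logstash-2017.09.21 L-iREuGaRXqWfxTBIit2ag 5 1 40995549 857 18.8gb 18.8gb
yellow open logstash-2017.09.22 dnHs2S7rSs6rpqmUJbe6sw 5 1 39950568 380268 17.4gb 17.4gb
"""

for name, ratio in deleted_ratio(SAMPLE).items():
    print(f"{name}: {ratio:.4%} deleted")
```

If the share for an index keeps dropping between two checks, the merge is still working through its segments.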

(Aaron Mildenstein) #3

Since you're using time-series indices from Logstash, you should be deleting indices, rather than deleting data from them. If you need to retain some data longer than others, then you should send the data with a longer retention period to a different index name, and then delete the ones with a shorter retention period.

The index management portion of this can be easily handled by Elasticsearch Curator. Splitting your data into different indices would be handled in the output block of your Logstash configuration.
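
To make the "delete whole indices" idea concrete, here is a minimal Python sketch of the selection logic only. It is not Curator's API; the function name `indices_to_delete`, the 30-day retention, and the "today" date are hypothetical values chosen for illustration. It assumes the default daily Logstash naming, logstash-YYYY.MM.DD.

```python
from datetime import date, datetime, timedelta


def indices_to_delete(index_names, today, keep_days, prefix="logstash-"):
    """Return the daily indices whose date is older than the retention window."""
    cutoff = today - timedelta(days=keep_days)
    expired = []
    for name in index_names:
        if not name.startswith(prefix):
            continue  # not one of the time-series indices
        try:
            day = datetime.strptime(name[len(prefix):], "%Y.%m.%d").date()
        except ValueError:
            continue  # suffix is not a date, leave the index alone
        if day < cutoff:
            expired.append(name)
    return sorted(expired)


names = ["logstash-2017.09.21", "logstash-2017.09.22", ".kibana"]
print(indices_to_delete(names, today=date(2017, 10, 22), keep_days=30))
# prints ['logstash-2017.09.21']
```

Each name returned would then be removed as a whole index (for example with DELETE /logstash-2017.09.21), which frees the disk space straight away instead of waiting for a merge.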

(João Neto) #4

First, thank you very much.
So, I tried the last 5 days without success. :confused:

(Aaron Mildenstein) #5

Please be more specific. Last 5 days, meaning what, exactly? Did you do the delete_by_query method you were doing first? Deleting indices in Curator is complete and total. Disk space is recovered immediately, because the index is deleted, rather than documents within it.

(João Neto) #6

Good morning,

I didn't use Curator, I used delete_by_query. I ran this query several times.

(Aaron Mildenstein) #7

Same thing as previously mentioned. Deleted documents aren't actually deleted until a segment merge occurs. The delete_by_query only flags them as deleted. The segment merge removes documents flagged for deletion. Segment merges can be throttled by the cluster, and therefore take a considerable amount of time to complete.

I highly recommend not using delete_by_query, and using something like Curator so that entire indices can be deleted, not just a percentage of the documents in an index. delete_by_query is not a disk space management solution, especially with time-series data.
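
The flag-then-merge behaviour described above can be pictured with a toy model. This Python sketch is only a conceptual illustration, not how Lucene or Elasticsearch is implemented; the `Segment` class and `merge` function are hypothetical names. It shows why the stored document count does not shrink after a delete_by_query, and only drops once segments are merged.

```python
class Segment:
    """Toy immutable segment: documents plus a set of 'deleted' flags."""

    def __init__(self, docs):
        self.docs = dict(docs)   # doc_id -> document
        self.deleted = set()     # ids flagged as deleted, data still stored

    def mark_deleted(self, predicate):
        # A delete only records a flag; nothing is removed from self.docs.
        for doc_id, doc in self.docs.items():
            if predicate(doc):
                self.deleted.add(doc_id)

    @property
    def stored(self):
        return len(self.docs)

    @property
    def live(self):
        return len(self.docs) - len(self.deleted)


def merge(segments):
    """Write a new segment containing only the documents not flagged as deleted."""
    merged = {}
    for seg in segments:
        for doc_id, doc in seg.docs.items():
            if doc_id not in seg.deleted:
                merged[doc_id] = doc
    return Segment(merged)


seg_a = Segment({1: {"level": "debug"}, 2: {"level": "error"}})
seg_b = Segment({3: {"level": "debug"}, 4: {"level": "info"}})

for seg in (seg_a, seg_b):
    seg.mark_deleted(lambda doc: doc["level"] == "debug")

print(seg_a.stored + seg_b.stored)  # 4: flagged documents still take up space
print(seg_a.live + seg_b.live)      # 2: only these are visible to searches
print(merge([seg_a, seg_b]).stored) # 2: space is reclaimed after the merge
```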
