I've been trying to purge the deleted documents from some of my indices, but without success.
Could you help me?
These are my indices:
health status index               uuid                   pri rep docs.count docs.deleted store.size pri.store.size
yellow open   logstash-2017.09.21 L-iREuGaRXqWfxTBIit2ag   5   1   40995549          857     18.8gb         18.8gb
yellow open   logstash-2017.09.22 dnHs2S7rSs6rpqmUJbe6sw   5   1   39950568       380268     17.4gb         17.4gb
A forceMerge takes time. Since you did not tell it to keep the client open until completion, it's running the merge in the background, and there's no real way to tell when it's done.
A forceMerge like that on a 17 GB index could take hours to complete, especially if merges are throttled in any way.
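To make the "keep the client open" point concrete, here is a minimal Python sketch of issuing the force merge as a blocking HTTP call. The host `http://localhost:9200`, the helper names, and the choice of `only_expunge_deletes=true` are assumptions for illustration, not something taken from your setup. The idea is only that the request returns when the merge finishes, provided the client does not time out first.

```python
from urllib.parse import urlencode
from urllib.request import Request, urlopen


def forcemerge_url(base_url, index, only_expunge_deletes=True):
    """Build the _forcemerge URL for one index (hypothetical helper)."""
    params = {"only_expunge_deletes": str(only_expunge_deletes).lower()}
    return f"{base_url.rstrip('/')}/{index}/_forcemerge?{urlencode(params)}"


def run_forcemerge(base_url, index, timeout_seconds=None):
    """POST the force merge and block until Elasticsearch answers.

    timeout_seconds=None means "wait indefinitely", which is the point:
    a short client timeout drops the connection while the merge keeps
    running in the background.
    """
    request = Request(forcemerge_url(base_url, index), method="POST")
    with urlopen(request, timeout=timeout_seconds) as response:
        return response.read().decode("utf-8")


# Assumed host; only the URL is built here, nothing is sent.
print(forcemerge_url("http://localhost:9200", "logstash-2017.09.22"))
```

Calling `run_forcemerge(...)` against a real cluster would hold the connection for as long as the merge takes, so it is best run from a session that will not be interrupted.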
Since you're using time-series indices from Logstash, you should be deleting indices, rather than deleting data from them. If you need to retain some data longer than others, then you should send the data with a longer retention period to a different index name, and then delete the ones with a shorter retention period.
The index management portion of this can be easily handled by Elasticsearch Curator. Splitting your data into different indices would be handled in the output block of your Logstash configuration.
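As a rough illustration of what age-based index deletion amounts to, the Python sketch below picks out daily indices whose date suffix is older than a retention window. This is not Curator's implementation; the function name, the `logstash-` prefix default, and the `%Y.%m.%d` suffix format are assumptions based on the index names shown above.

```python
from datetime import date, datetime, timedelta


def expired_indices(names, today, keep_days, prefix="logstash-"):
    """Return the index names whose date suffix is older than keep_days."""
    cutoff = today - timedelta(days=keep_days)
    expired = []
    for name in names:
        if not name.startswith(prefix):
            continue  # not one of the time-series indices
        try:
            day = datetime.strptime(name[len(prefix):], "%Y.%m.%d").date()
        except ValueError:
            continue  # suffix is not a date, leave the index alone
        if day < cutoff:
            expired.append(name)
    return expired


names = ["logstash-2017.09.21", "logstash-2017.09.22", ".kibana"]
print(expired_indices(names, today=date(2017, 9, 27), keep_days=5))
```

Each name returned would then be removed as a whole index, which is what frees the disk space right away. Data with a longer retention period would live under a different prefix and be checked with a larger `keep_days`.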
Please be more specific. What exactly do you mean by "last 5 days"? Did you run delete_by_query first, as you were doing before? Deleting indices with Curator is complete and total. Disk space is recovered immediately, because the whole index is deleted rather than individual documents within it.
Same thing as previously mentioned. Deleted documents aren't actually deleted until a segment merge occurs. The delete_by_query only flags them as deleted. The segment merge removes documents flagged for deletion. Segment merges can be throttled by the cluster, and therefore take a considerable amount of time to complete.
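To see how many documents are still only flagged as deleted, you can read the `docs.count` and `docs.deleted` columns from the index listing. The Python sketch below parses one such line; the function name is hypothetical, and it assumes the column order shown in the listing at the top of this thread, with `docs.count` counting live documents only.

```python
def deleted_fraction(cat_line):
    """Return (index name, share of documents flagged as deleted)."""
    # Assumed column order:
    # health status index uuid pri rep docs.count docs.deleted store.size pri.store.size
    fields = cat_line.split()
    live = int(fields[6])
    deleted = int(fields[7])
    total = live + deleted
    return fields[2], (deleted / total if total else 0.0)


line = ("yellow open logstash-2017.09.22 dnHs2S7rSs6rpqmUJbe6sw "
        "5 1 39950568 380268 17.4gb 17.4gb")
name, fraction = deleted_fraction(line)
print(f"{name}: {fraction:.2%} of documents are flagged as deleted")
```

For the 2017.09.22 index this comes out just under 1%, so even a completed merge would reclaim only a small part of the 17.4gb.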
I highly recommend not using delete_by_query, and using something like Curator so that entire indices can be deleted, not just a percentage of the documents in an index. delete_by_query is not a disk space management solution, especially with time-series data.