In an Elasticsearch 5.x cluster, the team uses daily indices and deletes indices that are older than 150 days. However, GET _cat/indices shows a non-zero docs.deleted count for each of the daily indices, which suggests that documents are being deleted.
GET _cat/indices?v&h=health,index,pri,rep,docs.count,docs.deleted,store.size,pri.store.size&s=pri.store.size:desc
health index           pri rep docs.count docs.deleted store.size pri.store.size
green  test-2018.11.06   5   1   15290978          438       34gb           17gb
green  test-2018.11.11   5   1   15392110           76     33.8gb         16.9gb
green  test-2018.11.10   5   1   15328574           76     33.7gb         16.8gb
green  test-2018.11.09   5   1   15320059           44     33.6gb         16.8gb
green  test-2018.11.08   5   1   15143309          126     33.3gb         16.6gb
green  test-2018.10.03   5   1   15066421           11     33.3gb         16.6gb
green  test-2018.10.26   5   1   15032894           45     33.2gb         16.6gb
green  test-2018.11.07   5   1   15021818           60     33.1gb         16.5gb
green  test-2018.10.17   5   1   14888749           21       33gb         16.5gb
green  test-2018.10.10   5   1   14871967           69       33gb         16.5gb
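To put those docs.deleted numbers in perspective, here is a minimal Python sketch that parses `_cat/indices?v` text output and computes the share of deleted documents per index. The function names (`parse_cat_indices`, `deleted_ratio`) are made up for illustration, and the sketch assumes docs.count holds live documents only, so the total is docs.count + docs.deleted. The sample rows are copied from the output above.

```python
def parse_cat_indices(text):
    """Parse whitespace-separated `_cat/indices?v` output into a list of dicts."""
    lines = [ln for ln in text.strip().splitlines() if ln.strip()]
    header = lines[0].split()
    rows = []
    for ln in lines[1:]:
        row = dict(zip(header, ln.split()))
        row["docs.count"] = int(row["docs.count"])
        row["docs.deleted"] = int(row["docs.deleted"])
        rows.append(row)
    return rows


def deleted_ratio(row):
    """Fraction of documents marked deleted (assumes docs.count is live docs only)."""
    total = row["docs.count"] + row["docs.deleted"]
    return row["docs.deleted"] / total if total else 0.0


# Sample rows taken from the output in this thread.
SAMPLE = """
health index pri rep docs.count docs.deleted store.size pri.store.size
green test-2018.11.06 5 1 15290978 438 34gb 17gb
green test-2018.11.11 5 1 15392110 76 33.8gb 16.9gb
green test-2018.10.03 5 1 15066421 11 33.3gb 16.6gb
"""

rows = parse_cat_indices(SAMPLE)
for row in sorted(rows, key=deleted_ratio, reverse=True):
    print(f'{row["index"]}: {row["docs.deleted"]} deleted, ratio {deleted_ratio(row):.6%}')
```

On these rows the deleted share is a tiny fraction of each index, which fits occasional overwrites better than any bulk delete.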
Chances are the audit logs do not capture everything. How can I find out what is triggering the deletion of these documents?
I am able to reproduce it now, @Christian_Dahlqvist. Instead of indexing just 1 document, I indexed 1000 documents.
I then updated 300 of them, and docs.deleted showed 20. After updating 100 more documents, docs.deleted showed 120.
health status index      uuid                   pri rep docs.count docs.deleted store.size pri.store.size
green  open   test_index jnzP2XyaSAy8y0dg0QOq0A   5   0       1000          120    338.8kb        338.8kb
Could you kindly explain:
Why could I reproduce it only after indexing more documents?
After indexing 1000 documents, I saw the following:
a. When I updated just 3 docs (ids 1 to 3), docs.deleted showed 2
b. When I updated 300 docs (ids 1 to 300), docs.deleted showed 20
c. When I updated 100 docs (ids 300 to 400), docs.deleted showed 120
I believe it depends on how segments are merged in the background. Once segments are merged, the deleted documents they held are merged out and should no longer show up in the statistics.
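To illustrate the idea, here is a toy Python model of segments. It is only a sketch under simplifying assumptions, not how Lucene is actually implemented: `ToyIndex`, `flush`, and `merge_all` are hypothetical names, every call to `flush` creates exactly one new segment, and a merge always combines all segments. Real merges are chosen by a merge policy and run in the background, which is why the docs.deleted value you observe at any moment can be lower than the number of updates you made.

```python
class ToyIndex:
    """Toy model: a list of immutable-ish segments mapping doc id -> is_live."""

    def __init__(self):
        self.segments = []

    def flush(self, doc_ids):
        """Write doc_ids into a new segment; older copies are only marked deleted."""
        doc_ids = list(doc_ids)
        for seg in self.segments:
            for doc_id in doc_ids:
                if seg.get(doc_id):
                    seg[doc_id] = False  # old copy stays on disk, flagged as deleted
        self.segments.append({doc_id: True for doc_id in doc_ids})

    def merge_all(self):
        """Merge every segment into one, dropping copies marked as deleted."""
        live = {d: True for seg in self.segments for d, ok in seg.items() if ok}
        self.segments = [live] if live else []

    def stats(self):
        """Return (docs.count, docs.deleted) as this toy model would report them."""
        count = sum(ok for seg in self.segments for ok in seg.values())
        deleted = sum(not ok for seg in self.segments for ok in seg.values())
        return count, deleted


idx = ToyIndex()
idx.flush(range(1000))       # index 1000 documents
idx.flush(range(300))        # "update" 300 of them
print(idx.stats())           # old copies are still counted as deleted
idx.merge_all()
print(idx.stats())           # deleted copies are gone after the merge
```

In this model the deleted count only drops when a merge rewrites the segments, so the number reported between merges depends on timing rather than on the exact number of updates.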
Thanks, that pretty much explains it. But why could I reproduce it only after adding a sizeable number of docs (1000 in this case)? With just 1 doc, updating it 100, 1000, or 3000 times never showed any value in docs.deleted. Can you shed some light on this?
@Christian_Dahlqvist, an update: it turns out that in this scenario there are NO UPDATES happening. This is metric data, so documents are never updated. So why does the docs.deleted count show a non-zero value? Is it due to segment merging and segment deletions?
If that's so, I would expect to see it everywhere, yet on another ES 5.5.x cluster that also uses daily indices, docs.deleted shows 0 for all of them.
Elasticsearch does not delete documents automatically. If you are specifying the document ID when indexing, the deleted documents could come from updates caused by retries at the indexing layer.
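A rough Python sketch of that retry effect, as a toy in-memory store rather than real Elasticsearch client code. `ToyStore` and its `index` method are hypothetical names, and the auto-generated ID is just a random UUID here. The point is only that re-sending a document under an ID that already exists overwrites the earlier copy, and the overwritten copy is what gets counted as deleted.

```python
import uuid


class ToyStore:
    """Toy model of indexing by ID; not an Elasticsearch client."""

    def __init__(self):
        self.live = {}     # doc id -> current document
        self.deleted = 0   # overwritten copies, like docs.deleted before a merge

    def index(self, doc, doc_id=None):
        """Index a document; reusing an existing ID replaces the old copy."""
        if doc_id is None:
            doc_id = uuid.uuid4().hex  # stand-in for an auto-generated ID
        if doc_id in self.live:
            self.deleted += 1  # the previous copy is marked as deleted
        self.live[doc_id] = doc
        return doc_id


store = ToyStore()
metric = {"host": "web-1", "cpu": 0.42}

# The client sets the ID itself and retries after a timeout.
store.index(metric, doc_id="web-1-2018.11.06T10:00")
store.index(metric, doc_id="web-1-2018.11.06T10:00")  # retry of the same request
print(len(store.live), store.deleted)
```

If the metric pipeline assigns its own document IDs, an occasional retry would be enough to produce small docs.deleted counts even though nothing is intentionally updated.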