In an Elasticsearch 5.x cluster, the team uses day-wise indices and deletes indices older than 150 days. However, GET _cat/indices shows a non-zero docs.deleted count for each of the day-wise indices.
GET _cat/indices?v&h=health,index,pri,rep,docs.count,docs.deleted,store.size,pri.store.size&s=pri.store.size:desc
health index           pri rep docs.count docs.deleted store.size pri.store.size
green  test-2018.11.06   5   1   15290978          438       34gb           17gb
green  test-2018.11.11   5   1   15392110           76     33.8gb         16.9gb
green  test-2018.11.10   5   1   15328574           76     33.7gb         16.8gb
green  test-2018.11.09   5   1   15320059           44     33.6gb         16.8gb
green  test-2018.11.08   5   1   15143309          126     33.3gb         16.6gb
green  test-2018.10.03   5   1   15066421           11     33.3gb         16.6gb
green  test-2018.10.26   5   1   15032894           45     33.2gb         16.6gb
green  test-2018.11.07   5   1   15021818           60     33.1gb         16.5gb
green  test-2018.10.17   5   1   14888749           21       33gb         16.5gb
green  test-2018.10.10   5   1   14871967           69       33gb         16.5gb
The audit logs may not capture everything. How can I find out what is triggering the deletion of documents in these indices?
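To see at a glance which indices carry deleted documents, the _cat/indices output can be filtered on the docs.deleted column. This is only a sketch: the helper name deleted_by_index is made up for illustration, it assumes the request was made with ?v so that a header row is present, and the host and credentials in the commented curl line are assumptions taken from the local test setup later in this thread.

```shell
#!/bin/bash
# Hypothetical helper: reads `_cat/indices?v` output on stdin and prints
# "<index> <docs.deleted>" for every index whose docs.deleted is non-zero.
deleted_by_index() {
  awk '
    NR == 1 {
      # Locate the columns by header name so any h=... column order works.
      for (i = 1; i <= NF; i++) {
        if ($i == "index") name = i
        if ($i == "docs.deleted") del = i
      }
      next
    }
    # Only print rows where the docs.deleted column was found and is > 0.
    del && $del + 0 > 0 { print $name, $del }
  '
}

# Against a live cluster (host and credentials are assumptions):
# curl -s -u elastic:changeme \
#   "http://localhost:9200/_cat/indices/test-*?v&h=index,docs.count,docs.deleted" \
#   | deleted_by_index
```

Looking the columns up by header name keeps the filter independent of whichever h= list was requested.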
If you are specifying document IDs when indexing, updates will show up as deletes: writing to an ID that already exists marks the previous version of that document as deleted.
Hi @Christian_Dahlqvist, can you help me reproduce it?
I created a test index and indexed a document into it using the following:
PUT test_index/test_type/1
{
  "name" : "Elastic"
}
GET _cat/indices/test_index shows
health status index      uuid                   pri rep docs.count docs.deleted store.size pri.store.size
green  open   test_index jnzP2XyaSAy8y0dg0QOq0A   5   0          1            0      3.9kb          3.9kb
Next, I wrote a small script to update it 100 times.
#!/bin/bash
# Update document 1 of test_index 100 times, changing "name" on each pass.
max=100
for i in $(seq 1 "$max"); do
  curl -u elastic:changeme -XPOST "http://localhost:9200/test_index/test_type/1/_update" -H "Content-Type: application/json" -d"
{
  \"doc\": {
    \"name\" : \"Elastic_${i}\"
  }
}"
done
After the updates, I ran GET test_index/_search:
{
  "took": 0,
  ....
    "hits": [
      {
        "_index": "test_index",
        "_type": "test_type",
        "_id": "1",
        "_score": 1,
        "_source": {
          "name": "Elastic_100"
        }
      }
    ]
  }
}
GET _cat/indices/test_index?v still shows 0 in docs.deleted:
health status index      uuid                   pri rep docs.count docs.deleted store.size pri.store.size
green  open   test_index jnzP2XyaSAy8y0dg0QOq0A   5   0          1            0        4kb            4kb
What could I be missing? This is a local ES 5.x cluster on my MacBook.
I'm able to reproduce it now, @Christian_Dahlqvist. Instead of just 1 document, I indexed 1000 documents.
I then updated 300 documents, and docs.deleted showed 20. I then updated 100 more documents, and docs.deleted showed 120.
health status index      uuid                   pri rep docs.count docs.deleted store.size pri.store.size
green  open   test_index jnzP2XyaSAy8y0dg0QOq0A   5   0       1000          120    338.8kb        338.8kb
Could you kindly explain:
- Why could I reproduce it only after indexing more documents?
- The docs.deleted values I saw after indexing 1000 documents:
  a. When I updated just 3 docs, docs.deleted showed 2 (id 1 to 3)
  b. When I updated 300 docs, docs.deleted showed 20 (id 1 to 300)
  c. When I updated 100 docs, docs.deleted showed 120 (id 300 to 400)
I believe it depends on how segments are merged in the background. Once deleted documents have been dropped during a segment merge, they should no longer show up in the statistics.
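One way to watch this happen is to look at deleted documents per segment rather than per index, using _cat/segments. The sketch below is an assumption-laden illustration, not an official tool: the helper name segment_deletes is made up, and the script relies on the request using exactly h=index,shard,prirep,segment,docs.count,docs.deleted in that order, with ?v for the header row. Running it before and after a batch of updates should show the deleted count rising and then falling as segments are merged.

```shell
#!/bin/bash
# Hypothetical helper: reads
#   _cat/segments?v&h=index,shard,prirep,segment,docs.count,docs.deleted
# output on stdin and prints, per index, the number of primary-shard
# segments and the total of their docs.deleted values.
segment_deletes() {
  awk '
    NR == 1 { next }        # skip the header row produced by ?v
    $3 == "p" {             # primaries only, so replica copies are not double counted
      segs[$1]++
      del[$1] += $6
    }
    END {
      for (idx in segs) print idx, "segments=" segs[idx], "deleted=" del[idx]
    }
  '
}

# Against a live cluster (host and credentials are assumptions):
# curl -s -u elastic:changeme \
#   "http://localhost:9200/_cat/segments/test_index?v&h=index,shard,prirep,segment,docs.count,docs.deleted" \
#   | segment_deletes
```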
Thanks. That pretty much explains it. But why could I reproduce it only after adding a sizeable number of docs (1000 in this case)? With just 1 doc, updating it 100, 1000, or even 3000 times never showed any value in docs.deleted. Can you shed some light on this?
It might be that with a small number of documents, merging happens more quickly.
Thanks @Christian_Dahlqvist for all your help and support!
As expected, running POST test_index/_forcemerge?max_num_segments=1 resets the docs.deleted to 0.
@Christian_Dahlqvist - an update. It turns out that in our scenario there are NO UPDATES happening: this is metric data, so documents are never updated. So why does docs.deleted show a non-zero count? Is it due to segment merging and segment deletions?
But if that were the cause, I would expect the same everywhere, and on another ES 5.5.x cluster where I also have day-wise indices, docs.deleted shows 0 for all of them.
Elasticsearch does not delete documents automatically. If you are specifying the document ID before indexing, it could be updates due to retries at the indexing layer.
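If retries are suspected, one rough check is the _version of a few sample documents. With the default internal versioning, a document that was written once normally has _version 1, and each overwrite of the same ID increments it, so a higher value suggests the ID was indexed more than once. The sketch below only extracts the number from a GET response; the helper name doc_version is made up, and the index, type, and ID in the commented curl line are placeholders to fill in.

```shell
#!/bin/bash
# Hypothetical helper: reads a document GET response (JSON) on stdin
# and prints the value of its "_version" field.
doc_version() {
  sed -n 's/.*"_version" *: *\([0-9][0-9]*\).*/\1/p'
}

# Against a live cluster (host, credentials, and the <...> placeholders
# are assumptions to replace with real values):
# curl -s -u elastic:changeme \
#   "http://localhost:9200/test-2018.11.06/<type>/<id>" | doc_version
```

A value above 1 on metric data that is never intentionally updated would point towards the same ID being sent more than once.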