Find the cause of document deletes

sandeepkanabar · November 14, 2018, 9:41pm

In an Elasticsearch 5.x cluster, the team uses day-wise indices and deletes indices that are older than 150 days. However, GET _cat/indices shows a non-zero docs.deleted count for each of the day-wise indices.

GET _cat/indices?v&h=health,index,pri,rep,docs.count,docs.deleted,store.size,pri.store.size&s=pri.store.size:desc
 
health index          pri rep docs.count docs.deleted store.size pri.store.size
green  test-2018.11.06   5   1   15290978          438       34gb           17gb
green  test-2018.11.11   5   1   15392110           76     33.8gb         16.9gb
green  test-2018.11.10   5   1   15328574           76     33.7gb         16.8gb
green  test-2018.11.09   5   1   15320059           44     33.6gb         16.8gb
green  test-2018.11.08   5   1   15143309          126     33.3gb         16.6gb
green  test-2018.10.03   5   1   15066421           11     33.3gb         16.6gb
green  test-2018.10.26   5   1   15032894           45     33.2gb         16.6gb
green  test-2018.11.07   5   1   15021818           60     33.1gb         16.5gb
green  test-2018.10.17   5   1   14888749           21       33gb         16.5gb
green  test-2018.10.10   5   1   14871967           69       33gb         16.5gb

The audit logs may not capture everything. How can I find the source/cause that triggers the deletion of these documents?

Christian_Dahlqvist · November 14, 2018, 9:54pm

If you are specifying document ids before indexing, updates will show up as deletes.

sandeepkanabar · November 15, 2018, 1:26pm

Hi @Christian_Dahlqvist, can you help me reproduce it?

I created a test index and indexed a document into it using the following:

PUT test_index/test_type/1
{
  "name" : "Elastic"
}

GET _cat/indices/test_index shows

health status index      uuid                   pri rep docs.count docs.deleted store.size pri.store.size
green  open   test_index jnzP2XyaSAy8y0dg0QOq0A   5   0          1            0      3.9kb          3.9kb

Next, I wrote a small script to update it 100 times.

#!/bin/bash
# Update document 1 in test_index 100 times, changing "name" on each pass.
max=100
for i in $(seq 1 "$max"); do
	curl -u elastic:changeme -XPOST "http://localhost:9200/test_index/test_type/1/_update" -H "Content-Type: application/json" -d"
	{
	  \"doc\": {
	    \"name\" : \"Elastic_${i}\"
	  }
	}"
done

After the updates, I ran GET test_index/_search:

{
  "took": 0,
   ....
    "hits": [
      {
        "_index": "test_index",
        "_type": "test_type",
        "_id": "1",
        "_score": 1,
        "_source": {
          "name": "Elastic_100"
        }
      }
    ]
  }
}

GET _cat/indices/test_index?v still shows 0 in docs.deleted:

health status index      uuid                   pri rep docs.count docs.deleted store.size pri.store.size
green  open   test_index jnzP2XyaSAy8y0dg0QOq0A   5   0          1            0        4kb            4kb

What could I be missing? This is a local ES 5.x cluster on my MacBook.

sandeepkanabar · November 15, 2018, 3:02pm

I'm able to reproduce it now, @Christian_Dahlqvist. Instead of just 1 document, I indexed 1000 documents.

Then I updated 300 documents, and docs.deleted showed 20. Then I updated 100 more documents, and docs.deleted showed 120.

health status index      uuid                   pri rep docs.count docs.deleted store.size pri.store.size
green  open   test_index jnzP2XyaSAy8y0dg0QOq0A   5   0       1000          120    338.8kb        338.8kb

Could you kindly explain:

1. Why could I reproduce it only after indexing more documents?
2. Why do the counts not match the number of updates? After indexing 1000 documents:
  a. When I updated just 3 docs, docs.deleted showed 2 (id 1 to 3)
  b. When I updated 300 docs, docs.deleted showed 20 (id 1 to 300)
  c. When I updated 100 docs, docs.deleted showed 120 (id 300 to 400)

Christian_Dahlqvist · November 15, 2018, 3:09pm

I believe it depends on how segments are merged in the background. Once deleted documents have been merged out as segments are merged they should no longer show up in statistics.
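The per-segment counts can be inspected directly, which makes the effect of merging visible: docs.deleted is tracked per segment and drops when segments holding deleted documents are merged away. Below is a minimal sketch, assuming `_cat/segments` is called with the columns `index,shard,prirep,segment,docs.count,docs.deleted`. The function name `sum_deleted_by_index` is made up for illustration, and the host and credentials in the commented call are the ones used in the script earlier in this thread.

```shell
# Sum docs.deleted over primary-shard segments, per index.
# Expects `_cat/segments` output with a header row (?v) on stdin, columns:
#   index shard prirep segment docs.count docs.deleted
sum_deleted_by_index() {
  awk 'NR > 1 && $3 == "p" { deleted[$1] += $6 }
       END { for (idx in deleted) print idx, deleted[idx] }' | sort
}

# Example call against a local cluster (host and credentials are assumptions):
# curl -s -u elastic:changeme \
#   "http://localhost:9200/_cat/segments/test_index?v&h=index,shard,prirep,segment,docs.count,docs.deleted" \
#   | sum_deleted_by_index
```

Running this before and after a batch of updates should show the total changing as segments are created and merged.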

sandeepkanabar · November 15, 2018, 4:48pm

Thanks, that pretty much explains it. But why could I only reproduce it after adding a sizeable number of docs (1000 in this case)? With just 1 doc, updating it 100, 1000, or even 3000 times never showed any value in docs.deleted. Can you shed some light on this?

Christian_Dahlqvist · November 15, 2018, 4:50pm

It might be that a small number of documents causes merging to happen more quickly.

sandeepkanabar · November 15, 2018, 4:50pm

Thanks @Christian_Dahlqvist for all your help and support!

sandeepkanabar · November 15, 2018, 5:09pm

As expected, running POST test_index/_forcemerge?max_num_segments=1 resets the docs.deleted to 0.
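Merging down to a single segment can be expensive on large indices. If the only goal is to reclaim deleted documents, the force merge API also accepts `only_expunge_deletes=true`. A minimal sketch follows; the function name `expunge_deletes` and the `ES_URL` / `ES_AUTH` variables are assumptions for illustration, not part of any Elasticsearch tooling. As far as I know, this mode only touches segments whose share of deleted documents exceeds a threshold, so docs.deleted may not drop all the way to 0.

```shell
# Ask Elasticsearch to merge away deleted documents in one index.
# ES_URL and ES_AUTH are assumed defaults; adjust them to your cluster.
ES_URL="${ES_URL:-http://localhost:9200}"
ES_AUTH="${ES_AUTH:-elastic:changeme}"

expunge_deletes() {
  # $1 is the index name, e.g. test_index
  curl -s -u "$ES_AUTH" -XPOST "$ES_URL/$1/_forcemerge?only_expunge_deletes=true"
}

# Usage:
# expunge_deletes test_index
```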

sandeepkanabar · November 19, 2018, 2:44pm

@Christian_Dahlqvist, an update: it turns out that in this scenario there are NO UPDATES happening. This is metric data, so nothing is updated. So why does docs.deleted show a non-zero count? Is it due to segment merging and segment deletions?

If that were so, I would expect the same everywhere, yet on another ES 5.5.x cluster with day-wise indices, docs.deleted shows 0 for all of them.

Christian_Dahlqvist · November 19, 2018, 3:02pm

Elasticsearch does not delete documents automatically. If you are specifying the document ID before indexing, it could be updates due to retries at the indexing layer.
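One way to look for such overwrites without audit logs is to compare how many index operations the primaries have handled with how many documents they hold. Below is a minimal sketch, assuming `_cat/indices` is called with `h=index,docs.count,docs.deleted,pri.indexing.index_total,pri.indexing.delete_total`. The function name `flag_overwrites` and the three labels it prints are made up for illustration. The indexing counters are kept in memory per shard and, as far as I know, start again from zero when a shard is restarted or relocated, so the result is a hint rather than proof.

```shell
# Flag indices whose docs.deleted is not explained by explicit delete requests.
# Expects `_cat/indices` output with a header row (?v) on stdin, columns:
#   index docs.count docs.deleted pri.indexing.index_total pri.indexing.delete_total
flag_overwrites() {
  awk 'NR > 1 && $3 > 0 {
         if ($5 == 0 && $4 > $2)  reason = "overwrites-likely"      # more index ops than docs
         else if ($5 > 0)         reason = "explicit-deletes-seen"  # delete requests were issued
         else                     reason = "inconclusive"           # counters may have been reset
         print $1, reason
       }'
}

# Example call against a local cluster (host and credentials are assumptions):
# curl -s -u elastic:changeme \
#   "http://localhost:9200/_cat/indices/test-*?v&h=index,docs.count,docs.deleted,pri.indexing.index_total,pri.indexing.delete_total" \
#   | flag_overwrites
```

An index reported as overwrites-likely has received more index operations than it holds documents while no delete requests were counted, which is what retries with a fixed document ID would produce.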

system · December 17, 2018, 3:02pm

This topic was automatically closed 28 days after the last reply. New replies are no longer allowed.
