Find the cause of document deletes


(Sandeepkanabar) #1

In an Elasticsearch 5.x cluster, the team uses day-wise indices and deletes indices that are older than 150 days. However, GET _cat/indices shows a non-zero docs.deleted count for each of the day-wise indices.

GET _cat/indices?v&h=health,index,pri,rep,docs.count,docs.deleted,store.size,pri.store.size&s=pri.store.size:desc
 
health index          pri rep docs.count docs.deleted store.size pri.store.size
green  test-2018.11.06   5   1   15290978          438       34gb           17gb
green  test-2018.11.11   5   1   15392110           76     33.8gb         16.9gb
green  test-2018.11.10   5   1   15328574           76     33.7gb         16.8gb
green  test-2018.11.09   5   1   15320059           44     33.6gb         16.8gb
green  test-2018.11.08   5   1   15143309          126     33.3gb         16.6gb
green  test-2018.10.03   5   1   15066421           11     33.3gb         16.6gb
green  test-2018.10.26   5   1   15032894           45     33.2gb         16.6gb
green  test-2018.11.07   5   1   15021818           60     33.1gb         16.5gb
green  test-2018.10.17   5   1   14888749           21       33gb         16.5gb
green  test-2018.10.10   5   1   14871967           69       33gb         16.5gb
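
To put the docs.deleted column in proportion, the table above can be totalled with a short script. This is only a sketch: the heredoc repeats the rows from the post, the function name summarize_deletes is made up for illustration, and the awk program assumes the column order requested with h=... above.

```shell
#!/bin/bash
# Sum docs.count and docs.deleted over _cat/indices output.
# Assumed column order: health index pri rep docs.count docs.deleted ...
summarize_deletes() {
  awk 'NR > 1 && NF >= 6 { count += $5; deleted += $6 }   # NR > 1 skips the header row
       END { printf "docs.count=%d docs.deleted=%d\n", count, deleted }'
}

summarize_deletes <<'EOF'
health index          pri rep docs.count docs.deleted store.size pri.store.size
green  test-2018.11.06   5   1   15290978          438       34gb           17gb
green  test-2018.11.11   5   1   15392110           76     33.8gb         16.9gb
green  test-2018.11.10   5   1   15328574           76     33.7gb         16.8gb
green  test-2018.11.09   5   1   15320059           44     33.6gb         16.8gb
green  test-2018.11.08   5   1   15143309          126     33.3gb         16.6gb
green  test-2018.10.03   5   1   15066421           11     33.3gb         16.6gb
green  test-2018.10.26   5   1   15032894           45     33.2gb         16.6gb
green  test-2018.11.07   5   1   15021818           60     33.1gb         16.5gb
green  test-2018.10.17   5   1   14888749           21       33gb         16.5gb
green  test-2018.10.10   5   1   14871967           69       33gb         16.5gb
EOF
```

For the ten indices listed, this comes to 966 deleted documents against roughly 151 million live ones, so whatever causes the deletes touches only a tiny fraction of the data.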

The audit logs may not capture everything. How can I find the source/cause that triggers these document deletions?


(Christian Dahlqvist) #2

If you are specifying document IDs before indexing, updates will show up as deletes: writing to an existing ID marks the old copy as deleted and indexes a new one.


(Sandeepkanabar) #3

Hi @Christian_Dahlqvist, can you help me reproduce it?

I created a test index and indexed a document into it using the following:

PUT test_index/test_type/1
{
  "name" : "Elastic"
}

GET _cat/indices/test_index shows

health status index      uuid                   pri rep docs.count docs.deleted store.size pri.store.size
green  open   test_index jnzP2XyaSAy8y0dg0QOq0A   5   0          1            0      3.9kb          3.9kb

Next, I wrote a small script to update it 100 times.

#!/bin/bash
# Update document 1 a hundred times, changing the "name" field on each pass.
max=100
for i in $(seq 1 "$max"); do
	curl -u elastic:changeme -XPOST "http://localhost:9200/test_index/test_type/1/_update" \
		-H "Content-Type: application/json" \
		-d "{\"doc\": {\"name\": \"Elastic_${i}\"}}"
done

After the updates, I ran GET test_index/_search:

{
  "took": 0,
   ....
    "hits": [
      {
        "_index": "test_index",
        "_type": "test_type",
        "_id": "1",
        "_score": 1,
        "_source": {
          "name": "Elastic_100"
        }
      }
    ]
  }
}

GET _cat/indices/test_index?v still shows 0 in docs.deleted:

health status index      uuid                   pri rep docs.count docs.deleted store.size pri.store.size
green  open   test_index jnzP2XyaSAy8y0dg0QOq0A   5   0          1            0        4kb            4kb

What could I be missing? This is a local ES 5.x cluster on my MacBook.


(Sandeepkanabar) #4

I am able to reproduce it now, @Christian_Dahlqvist. Instead of just 1 document, I indexed 1000 documents.

I then updated 300 documents, and docs.deleted showed 20. Then I updated 100 more documents, and docs.deleted showed 120.

health status index      uuid                   pri rep docs.count docs.deleted store.size pri.store.size
green  open   test_index jnzP2XyaSAy8y0dg0QOq0A   5   0       1000          120    338.8kb        338.8kb

Could you kindly explain:

    1. Why could I reproduce it only after indexing more documents?
    2. After indexing 1000 documents, why does docs.deleted not match the number of updated docs?
      a. When I updated just 3 docs (ids 1 to 3), docs.deleted showed 2
      b. When I updated 300 docs (ids 1 to 300), docs.deleted showed 20
      c. When I updated 100 more docs (ids 300 to 400), docs.deleted showed 120

(Christian Dahlqvist) #5

I believe it depends on how segments are merged in the background. Once a merge has rewritten the segments without the deleted documents, they no longer show up in the statistics.
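
The bookkeeping described in this thread can be sketched with a toy model. It is not Elasticsearch or Lucene code: the counters and the function names index_doc and merge_segments are invented for illustration, and the merge is triggered by hand, whereas a real cluster merges in the background at times it chooses.

```shell
#!/bin/bash
# Toy model only: it mimics the docs.count / docs.deleted counters,
# not Lucene's real segment files or merge policy.
docs_count=0
docs_deleted=0
seen=()   # seen[id]=1 once a document with that numeric id has been indexed

# Index (or re-index) a document under an explicit numeric id.
index_doc() {
  local id=$1
  if [ "${seen[$id]:-0}" = "1" ]; then
    # Overwrite: the old copy is only marked as deleted, the new copy is appended.
    docs_deleted=$((docs_deleted + 1))
  else
    seen[$id]=1
    docs_count=$((docs_count + 1))
  fi
}

# A merge rewrites segments without the copies that were marked as deleted.
merge_segments() {
  docs_deleted=0
}

for id in $(seq 1 1000); do index_doc "$id"; done   # initial load
for id in $(seq 1 300); do index_doc "$id"; done    # "update" 300 of them
echo "before merge: docs.count=$docs_count docs.deleted=$docs_deleted"
merge_segments
echo "after merge:  docs.count=$docs_count docs.deleted=$docs_deleted"
```

In the toy model docs.deleted reaches 300 before the merge and drops back to 0 after it. The real index above showed only 20 after 300 updates, which would be consistent with background merges having already removed most of the marked copies by the time the statistics were read.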


(Sandeepkanabar) #6

Thanks, that pretty much explains it. But why could I reproduce it only after adding a sizeable number of docs (1000 in this case)? With just 1 doc, updating it 100, 1000, or even 3000 times never showed any value in docs.deleted. Can you shed some light on this?


(Christian Dahlqvist) #7

It might be that a small number of documents causes merging to happen more quickly.


(Sandeepkanabar) #8

Thanks @Christian_Dahlqvist for all your help and support!


(Sandeepkanabar) #9

As expected, running POST test_index/_forcemerge?max_num_segments=1 resets the docs.deleted to 0.
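
The before/after check can be scripted roughly as follows. This is a sketch under assumptions: ES_URL, INDEX, DRY_RUN and the helper es are names chosen here, credentials are left out (add -u as in the update script if the cluster needs them), and DRY_RUN defaults to 1 so that the script only prints the requests instead of sending them.

```shell
#!/bin/bash
# Placeholders, not values from a real deployment.
ES_URL="${ES_URL:-http://localhost:9200}"
INDEX="${INDEX:-test_index}"
DRY_RUN="${DRY_RUN:-1}"   # 1 = print the curl commands, anything else = send them

# Run a curl request, or just print it in dry-run mode.
es() {
  if [ "$DRY_RUN" = "1" ]; then
    echo "curl -s $*"
  else
    curl -s "$@"
    echo
  fi
}

# docs.deleted before the merge
es -XGET "$ES_URL/_cat/indices/$INDEX?v&h=index,docs.count,docs.deleted"
# Force merge down to a single segment, as in the post above
es -XPOST "$ES_URL/$INDEX/_forcemerge?max_num_segments=1"
# docs.deleted after the merge
es -XGET "$ES_URL/_cat/indices/$INDEX?v&h=index,docs.count,docs.deleted"
```

Setting DRY_RUN=0 sends the requests. A force merge is an expensive operation, so it is usually better suited to a test index like this one, or to indices that are no longer being written to.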


(Sandeepkanabar) #10

@Christian_Dahlqvist - an update. It turns out that in this scenario there are NO UPDATES happening: this is metric data, so documents are never updated. So why does the docs.deleted count show a non-zero value? Is it due to segment merging and segment deletions?

If that's so, it doesn't explain another ES 5.5.x cluster of mine, which also has day-wise indices and shows docs.deleted as 0 for all of them.


(Christian Dahlqvist) #11

Elasticsearch does not delete documents automatically. If you are specifying the document ID before indexing, the deletes could come from updates caused by retries at the indexing layer.
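
One way to test the retry theory is to check whether the indexing layer ever sends the same ID twice. The sketch below assumes the outgoing requests can be captured as bulk NDJSON; the function name duplicate_ids and the sample body (index name, type, IDs, fields) are hypothetical and only illustrate the idea.

```shell
#!/bin/bash
# Print every _id that appears on more than one "index" action line
# of a bulk NDJSON body read from stdin.
duplicate_ids() {
  grep -E '^\{ *"index"' \
    | grep -oE '"_id" *: *"[^"]*"' \
    | sed -E 's/^"_id" *: *"//; s/"$//' \
    | sort | uniq -d
}

# Hypothetical captured bulk body: the first document is sent twice, as a retry would do.
duplicate_ids <<'EOF'
{"index":{"_index":"test-2018.11.06","_type":"test_type","_id":"host1-0001"}}
{"value":1}
{"index":{"_index":"test-2018.11.06","_type":"test_type","_id":"host1-0002"}}
{"value":2}
{"index":{"_index":"test-2018.11.06","_type":"test_type","_id":"host1-0001"}}
{"value":1}
EOF
```

Any ID printed was written more than once, and each repeat write to an existing ID would count as one deleted document until the affected segments are merged.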