docs.deleted keeps increasing while indexing

Hi guys,

I'm trying to index 24 million items into ES, and I noticed that docs.deleted started to increase after the index reached about 1 million documents, even though I'm not making any update or delete calls. I'm using the Bulk API from Python with elasticsearch-py.
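For context, here is a minimal sketch of what such a bulk indexing loop might look like. It assumes each row carries its PostgreSQL primary key as "id" and that this key is reused as the Elasticsearch _id. The mapping type name "event" and the helper names build_actions and index_rows are hypothetical; helpers.bulk is the real elasticsearch-py bulk helper.

```python
# Sketch only: building bulk actions for elasticsearch-py's helpers.bulk.
# Assumptions (hypothetical): each row is a dict whose primary key is under "id";
# the type name "event" is illustrative (Elasticsearch 2.x requires a _type).


def build_actions(rows, index="events", doc_type="event"):
    """Yield one bulk 'index' action per row, reusing the row's primary key as _id."""
    for row in rows:
        yield {
            "_op_type": "index",  # indexing an existing _id replaces the old document
            "_index": index,
            "_type": doc_type,
            "_id": row["id"],  # same row sent twice -> same _id -> old copy marked deleted
            "_source": {k: v for k, v in row.items() if k != "id"},
        }


def index_rows(rows, hosts=("localhost:9200",)):
    # Imported here so build_actions can be used without the client installed.
    from elasticsearch import Elasticsearch, helpers

    es = Elasticsearch(list(hosts))
    # helpers.bulk returns (number of successful actions, list of errors).
    success, errors = helpers.bulk(es, build_actions(rows))
    return success, errors
```

Because _id is explicit here, sending the same row twice overwrites the earlier document instead of creating a duplicate. If _id were omitted, Elasticsearch would generate a fresh ID for each action, and a repeated row would raise docs.count rather than docs.deleted.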

My Elasticsearch version is 2.3, running on AWS.

Here is the output of several calls to /_cat/indices?v, taken at intervals while indexing:

health status index     pri rep docs.count docs.deleted store.size pri.store.size 
yellow open   .kibana-4   1   1          2            0      7.8kb          7.8kb 
yellow open   events      5   1    1000000            0     92.6mb         92.6mb 
---------------------------
health status index     pri rep docs.count docs.deleted store.size pri.store.size 
yellow open   .kibana-4   1   1          2            0      7.8kb          7.8kb 
yellow open   events      5   1    1057152            0    103.4mb        103.4mb 
---------------------------
health status index     pri rep docs.count docs.deleted store.size pri.store.size 
yellow open   .kibana-4   1   1          2            0      7.8kb          7.8kb 
yellow open   events      5   1    2201952        67600    310.3mb        310.3mb 
--------------------------
health status index     pri rep docs.count docs.deleted store.size pri.store.size 
yellow open   .kibana-4   1   1          2            0      7.8kb          7.8kb 
yellow open   events      5   1    2728081       116471    304.1mb        304.1mb

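You might be indexing to the same document ID more than once. An update is just a delete followed by an insert: the old copy is marked as deleted, and it is counted in docs.deleted until a segment merge removes it.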
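A toy model can make the counting concrete. This is a deliberate simplification written for illustration (the class ShardModel is hypothetical, not Elasticsearch or Lucene code): re-indexing an existing _id adds a new live copy and marks the previous one deleted, and a merge later purges the marked copies.

```python
# Toy model (a simplification, not Elasticsearch code) of one shard's
# docs.count and docs.deleted. Lucene segments are immutable, so re-indexing an
# existing _id writes a new copy and only *marks* the old copy as deleted.


class ShardModel:
    def __init__(self):
        self.live = set()  # _ids that currently have a live copy
        self.deleted = 0  # copies marked deleted but not yet merged away

    def index(self, doc_id):
        if doc_id in self.live:
            self.deleted += 1  # the previous copy of this _id is marked deleted
        self.live.add(doc_id)

    def merge(self):
        # Simplification: a real merge only purges deletions in the segments
        # it rewrites; here every marked copy is dropped at once.
        self.deleted = 0

    @property
    def docs_count(self):
        return len(self.live)


if __name__ == "__main__":
    shard = ShardModel()
    for doc_id in [1, 2, 3, 2, 3]:  # IDs 2 and 3 arrive twice
        shard.index(doc_id)
    print(shard.docs_count, shard.deleted)  # 3 2
```

The same mechanism likely explains why store.size shrank between the last two snapshots (310.3mb to 304.1mb) even though docs.count grew: a merge reclaimed the space held by deleted copies.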

Yeah, that was it. I was fetching the data from PostgreSQL with pagination, but without ordering the rows before paginating, so a page could contain items that had already appeared in a previous page. Those items were indexed again under the same ID, and that's why they were showing up as deleted.
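PostgreSQL's documentation warns that LIMIT without an ORDER BY that imposes a unique order returns an unpredictable subset of rows, which matches what happened here. A minimal sketch of a deterministic alternative is keyset pagination ordered by the primary key. sqlite3 stands in for PostgreSQL so the example runs anywhere (with psycopg2 the placeholders would be %s instead of ?); the table name "events", the columns "id" and "payload", and the function iter_rows are hypothetical.

```python
import sqlite3

# Sketch of deterministic pagination (keyset pagination on the primary key).
# sqlite3 stands in for PostgreSQL; table/column names are hypothetical.


def iter_rows(conn, page_size=1000):
    """Yield every row exactly once, in primary-key order."""
    last_id = None
    while True:
        if last_id is None:
            cur = conn.execute(
                "SELECT id, payload FROM events ORDER BY id LIMIT ?", (page_size,)
            )
        else:
            # Resume strictly after the last key seen, so pages never overlap.
            cur = conn.execute(
                "SELECT id, payload FROM events WHERE id > ? ORDER BY id LIMIT ?",
                (last_id, page_size),
            )
        page = cur.fetchall()
        if not page:
            return
        for row_id, payload in page:
            yield {"id": row_id, "payload": payload}
        last_id = page[-1][0]


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE events (id INTEGER PRIMARY KEY, payload TEXT)")
    conn.executemany(
        "INSERT INTO events (id, payload) VALUES (?, ?)",
        [(i, "event %d" % i) for i in range(1, 11)],
    )
    print([row["id"] for row in iter_rows(conn, page_size=3)])
    # [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
```

Adding ORDER BY id to a LIMIT/OFFSET query would also remove the overlap; keyset pagination is shown because large OFFSET values get slower as the offset grows, which matters at 24 million rows.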

Thank you!
