Deleted document count very high compared to the actual count of documents in the index

ES version: 7.5
Number of shards: 5
Number of replicas: 4

We have a use case where a lot of updates happen to documents (location updates), so every update deletes the old document and creates a new one in the background.

At one point, the actual record count was ~70k whereas the deleted count went beyond 40 million, and query times were linearly related to this: they went from ~100 ms to ~3 seconds.

Another observation: after around 12 hours this count comes down to about 4 million (which is still huge). Is there any configuration setting we can use to run this deletion/merging of segments more frequently and keep the deleted docs count under control?

Thanks in advance

Hey,

this sounds like a high count indeed. A couple of questions: do you have any long-running scroll searches or snapshots going on that require certain files to be held open? You can also check with lsof whether there are files marked as deleted but still held open by the Elasticsearch process. You can use the node stats to check for any open search contexts, and also check the indices part of that output.
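
Roughly along these lines (the index name and the pid are placeholders for your index and the Elasticsearch process id):

    # node stats: open_contexts under indices -> search shows active search/scroll contexts
    GET _nodes/stats/indices/search

    # per-index stats: docs.deleted plus the search section
    GET my-index/_stats/docs,search

    # on each data node: files deleted on disk but still held open by the process
    lsof -p <elasticsearch-pid> | grep -i deleted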

Have you tweaked any Elasticsearch configuration? GC collection or merging options?

Also, what version are you running on exactly? About how much data are we talking when you only consider the 70k documents?

--Alex

@spinscale, thanks for responding. Please find answers to your queries below:

Do you have any long-running scroll searches or snapshots going on that require certain files to be held open?
-- We do have scroll queries in place (which will soon be replaced with paginated queries), but they are not long-running. At any point in time we might see around 100 scroll queries running. Also, we have increased the maximum number of open scroll contexts to 10000.
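
For reference, we raised that limit with a cluster settings update roughly like the following (assuming the setting name search.max_open_scroll_context is unchanged in 7.5):

    PUT _cluster/settings
    {
      "persistent": {
        "search.max_open_scroll_context": 10000
      }
    }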

You can also check with lsof whether there are files marked as deleted but still held open by the Elasticsearch process. You can use the node stats to check for any open search contexts, and also check the indices part of that output.
--- Below is the output of the index stats (we have an 11-node cluster):

   "docs": {
        "count": 384139,
        "deleted": 8980155
    },
    "search": {
        "open_contexts": 36719,
        "query_total": 151414443,
        "query_time_in_millis": 585255876,
         "query_current": 10,
        "fetch_total": 132808243,
        "fetch_time_in_millis": 80296130,
        "fetch_current": 4,
        "scroll_total": 87969990,
        "scroll_time_in_millis": 2913045740496,
        "scroll_current": 36719,
        "suggest_total": 0,
        "suggest_time_in_millis": 0,
        "suggest_current": 0
    }

In the above result, the count includes all 5 shards.

Have you tweaked any Elasticsearch configuration? GC collection or merging options?
-- We did not tweak any of the above configurations, although we did reduce the node query cache from 10% of heap (16 GB heap) to 5 MB. This is because our search results change constantly: we do a radial search to get the list of people within a radius, and their locations change all the time. Hence, we did not see the need for a large query cache that had a very high miss rate.
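
For reference, that cache change is the static node-level setting in elasticsearch.yml, roughly as below (assuming I am remembering the setting name correctly):

    # elasticsearch.yml on each node (static setting, requires a restart)
    indices.queries.cache.size: 5mb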

Also, what version are you running on exactly? About how much data are we talking when you only consider the 70k documents?
-- We are running on ES 7.5 (recently moved to this version, previously on 2.4). Data is not huge: currently we have 77,223 documents with a total size of 1.74 GB.
Adding to this, the index size was much smaller in ES 2.4 compared to ES 7.5. In ES 2.4, the total size for 75k docs was only ~50 MB, but it is far higher in the new version. Were there any changes with respect to this?

We have around 12 fields in the index: one is a geo_point location, one is a nested object with keyword fields, and the others are text fields.
The nested object looks something like this:

"outer_field": {
   "type": "nested",
    "properties": {
          "inner_field_1": {
               "type": "keyword"
           },
          "inner_field_2": {
              "type": "keyword"
          }
     }
}

Now that is interesting: you do have a lot of open scroll contexts. What are you using these searches for, and why can't you use a regular search? Also, can you make sure you are clearing your scrolls?

See https://www.elastic.co/guide/en/elasticsearch/reference/7.6/search-request-body.html#request-body-search-scroll - it also covers how to clear a scroll.
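
For example, once you are done iterating a scroll (the scroll id below is shortened/made up):

    DELETE /_search/scroll
    {
      "scroll_id": ["DXF1ZXJ5QW5kRmV0Y2gB..."]
    }

    # or, as a blunt instrument, drop every open scroll context
    DELETE /_search/scroll/_all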

@spinscale, thanks for that.
We are already in the process of moving away from scroll (it is a work in progress) as it does not suit our real-time search requirements.

And when you say clear scrolls, do you mean clearing each scroll after its query completes, or doing a periodic cleanup of all scrolls?

Also, any help regarding the huge size of docs in ES 7.5 compared to ES 2.4?

And, is the high deleted document count because of the usage of scroll?

I meant after finishing your searches, as part of the tool that is doing the search. They should also auto-expire, unless you have specified a long timeout. What is the scroll timeout set to?
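
The timeout is simply the keep-alive you pass in the scroll parameter of each request, for example (my-index is a placeholder):

    POST /my-index/_search?scroll=1m
    {
      "size": 100,
      "query": { "match_all": {} }
    }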

I suppose that the high deleted document count stems from unclosed scrolls.

@spinscale
We have the scroll parameter set to 3s.

@spinscale
We have moved away from scroll, but we still observe the same issue.

"docs": {
    "count": 70521,
    "deleted": 342734
}

The deleted docs count keeps increasing unless we do a force merge to get rid of them; the force merge call we use is shown after the stats below. This is causing serious performance issues. Could you please help with this? Also note the open contexts below (it is 0):

"search": {
   "open_contexts": 0,
   "query_total": 4118668,
   "query_time_in_millis": 14996981,
   "query_current": 0,
   "fetch_total": 2718573,
   "fetch_time_in_millis": 1656643,
   "fetch_current": 0,
   "scroll_total": 3280959,
   "scroll_time_in_millis": 108124546292,
   "scroll_current": 0,
   "suggest_total": 0,
   "suggest_time_in_millis": 0,
   "suggest_current": 0
},
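
For completeness, the force merge we trigger manually is roughly the following (index name is a placeholder):

    POST /my-index/_forcemerge?only_expunge_deletes=true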

@dadoonet @elasticsearch

Please do not ping people who are not yet involved in your question.

Read this and specifically the "Also be patient" part.

It's fine to answer on your own thread after 2 or 3 days (not including weekends) if you don't have an answer.

@dadoonet @spinscale sorry about that.

I asked about lsof output earlier in this thread. That would be helpful as well.

However, your current document vs. deleted ratio does not look nearly as bad as before. Is it staying constant, or is it going back into the millions now that you have disabled scroll searching?

@spinscale
lsof | egrep deleted - This returned zero results

elasticsearch-1:~# lsof /var/lib/elasticsearch/  | head -n 10
COMMAND  PID          USER   FD   TYPE DEVICE SIZE/OFF    NODE NAME
java    3158 elasticsearch  mem    REG   8,16   479066 2753188 /var/lib/elasticsearch/nodes/0/indices/b92rg2qORX6uCLBljXRB8A/0/index/_86yx_Lucene80_0.dvd
java    3158 elasticsearch  mem    REG   8,16  1459621 2752968 /var/lib/elasticsearch/nodes/0/indices/b92rg2qORX6uCLBljXRB8A/2/index/_8qf8_Lucene80_0.dvd
java    3158 elasticsearch  mem    REG   8,16  1530250 2752879 /var/lib/elasticsearch/nodes/0/indices/b92rg2qORX6uCLBljXRB8A/1/index/_8ce8_Lucene80_0.dvd
java    3158 elasticsearch  mem    REG   8,16  1420585 2753021 /var/lib/elasticsearch/nodes/0/indices/b92rg2qORX6uCLBljXRB8A/0/index/_7x78_Lucene80_0.dvd
java    3158 elasticsearch  mem    REG   8,16   406158 2752988 /var/lib/elasticsearch/nodes/0/indices/b92rg2qORX6uCLBljXRB8A/0/index/_7x78.nvd
java    3158 elasticsearch  mem    REG   8,16   571063 2752772 /var/lib/elasticsearch/nodes/0/indices/b92rg2qORX6uCLBljXRB8A/1/index/_8ce8_Lucene50_0.tim
java    3158 elasticsearch  mem    REG   8,16   361006 2752869 /var/lib/elasticsearch/nodes/0/indices/b92rg2qORX6uCLBljXRB8A/2/index/_8z0z_Lucene80_0.dvd
java    3158 elasticsearch  mem    REG   8,16   336668 2752833 /var/lib/elasticsearch/nodes/0/indices/b92rg2qORX6uCLBljXRB8A/0/index/_86yx_Lucene50_0.tim
java    3158 elasticsearch  mem    REG   8,16  1566963 2752713 /var/lib/elasticsearch/nodes/0/indices/stHsA8KwRVq76-9AwMRuGA/0/index/_0.cfs

The deleted document count still keeps increasing:

"docs": {
    "count": 50523,
    "deleted": 1782810
},

@spinscale
This is the current count:

"docs": {
    "count": 61236,
    "deleted": 8739870
},

Deleted docs count is close to 9 million now.

Just wondering if there is any parameter/setting that can clean up these deleted documents more frequently, something similar to the forcemerge API.
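
The only knob we have come across so far is the expert merge-policy setting sketched below, but we are not sure whether it is available in 7.5 or advisable to change (index.merge.policy.deletes_pct_allowed is an assumption on our side):

    PUT /my-index/_settings
    {
      "index.merge.policy.deletes_pct_allowed": 20
    }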

This topic was automatically closed 28 days after the last reply. New replies are no longer allowed.