Deleted docs can still be retrieved after a refresh

Hi Elasticsearch,

I have an index with about 2.5 billion documents. It has 10 primary shards, each about 10 GB.

We have the following problem:
We run a query that returns the documents matching some conditions. We deleted one document and then ran the same query again to refresh the result, but the document we had just deleted still appears in the hits.

The refresh interval of this index is 1s. We waited for 1 minute and ran the query again, but the result was still not updated.

We are a little confused. Is there any documentation that can help us understand this behavior of Elasticsearch? That would be much appreciated.

Thanks
Qiaoqing.

Here is some anonymized context that may help you understand:

health status index   uuid                   pri rep docs.count docs.deleted store.size pri.store.size
yellow open   myindex qgzM6RmHSdCTgSsacSBauw  10   2 2408738470            0    214.8gb        107.6gb

index settings

{
  "myindex" : {
    "settings" : {
      "index" : {
        "routing" : {
          "allocation" : {
            "include" : {
              "_tier_preference" : "data_content"
            },
            "total_shards_per_node" : "2"
          }
        },
        "refresh_interval" : "1s",
        "number_of_shards" : "10",
        "provided_name" : "myindex",
        "creation_date" : "1669104181455",
        "unassigned" : {
          "node_left" : {
            "delayed_timeout" : "15m"
          }
        },
        "number_of_replicas" : "2",
        "uuid" : "qgzM6RmHSdCTgSsacSBauw",
        "version" : {
          "created" : "7100299"
        }
      }
    }
  }
}

delete query

DELETE /myindex/_doc/the_doc_id

aggregation query

POST /myindex/_search?routing=myroutingkey&typed_keys=true&max_concurrent_shard_requests=5&search_type=query_then_fetch&batched_reduce_size=512
{
	"from": 0,
	"size": 100,
	"query": {
		"bool": {
			"filter": [
				{
					"term": {
						"some_field": {
							"value": "somevalue",
							"boost": 1.0
						}
					}
				},
				{
					"term": {
						"some_field": {
							"value": "somevalue",
							"boost": 1.0
						}
					}
				},
				{
					"nested": {
						"query": {
							"bool": {
								"filter": [
									{
										"term": {
											"some_field": {
												"value": "somevalue",
												"boost": 1.0
											}
										}
									}
								],
								"adjust_pure_negative": true,
								"boost": 1.0
							}
						},
						"path": "properties",
						"ignore_unmapped": false,
						"score_mode": "none",
						"boost": 1.0
					}
				}
			],
			"must_not": [
				{
					"term": {
						"some_field": {
							"value": "somevalue",
							"boost": 1.0
						}
					}
				}
			],
			"adjust_pure_negative": true,
			"boost": 1.0
		}
	},
	"version": true,
	"explain": false,
	"sort": [
		{
			"some_field": {
				"order": "desc"
			}
		}
	],
	"aggregations": {
		"aggs_name1": {
			"nested": {
				"path": "some_field"
			},
			"aggregations": {
				"agg_name2": {
					"filter": {
						"range": {
							"some_field": {
								"from": 123456,
								"to": null,
								"include_lower": false,
								"include_upper": true,
								"boost": 1.0
							}
						}
					},
					"aggregations": {
						"agg_name3": {
							"terms": {
								"field": "some_field",
								"size": 100,
								"min_doc_count": 1,
								"shard_min_doc_count": 0,
								"show_term_doc_count_error": false,
								"order": {
									"_key": "desc"
								}
							},
							"aggregations": {
								"agg_name4": {
									"sum": {
										"field": "some_field"
									}
								},
								"performanceDateLifetimeGainOrLoss": {
									"sum": {
										"field": "some_field"
									}
								}
							}
						}
					}
				}
			}
		},
		"agg_name5": {
			"sum": {
				"field": "some_field"
			}
		},
		"agg_name6": {
			"sum": {
				"field": "some_field"
			}
		}
	}
}

When you delete it and wait a minute (or the refresh interval), does GET /myindex/_doc/the_doc_id still return the document?

I delete the doc before executing that query, and it still shows up in the hits.

If I try this command:

GET /myindex/_doc/the_doc_id

it returns 404.

Which version of Elasticsearch are you using?

How can you tell from the aggregation that the document is still there?

7.10

I deleted the doc using the DELETE API:

DELETE /myindex/_doc/the_doc_id

and then ran the query above to get the latest result. The doc with ID the_doc_id still shows up in the hits, although GET /myindex/_doc/the_doc_id returns 404.

Are you using (or have you used) routing on this index? Can you try a few different preference strings in your command, e.g. GET /myindex/_doc/the_doc_id?preference=preferencestring00001, and see if you find it on any other shard?

Maybe you could try a simple id query across all shards?

Thank you very much, Christian!

Yes.
I tried the three queries.

GET /myindex/_doc/the_doc_id?preference=preferencestring00001 returns 404
GET /myindex/_doc/the_doc_id returns 404

but the id query

GET /_search
{
  "query": {
    "ids" : {
      "values" : ["1", "4", "100"]
    }
  }
}

returns 200.

We are a little confused about why the refresh interval is not working. Could you please share some insights, or any documentation we can check?
We need to figure out a way to avoid this.

Thanks,
Qiaoqing.

Can you share the metadata of the documents you found when using the ID query?

Metadata?
You mean the response body?

It looked like this:

{
  "took" : 1133,
  "timed_out" : false,
  "_shards" : {
    "total" : 84,
    "successful" : 84,
    "skipped" : 0,
    "failed" : 0
  },
  "hits" : {
    "total" : {
      "value" : 1,
      "relation" : "eq"
    },
    "max_score" : 1.0,
    "hits" : [
      {
        "_index" : "myindex",
        "_type" : "_doc",
        "_id" : "the_doc_id",
        "_score" : 1.0,
        "_routing" : "the_routing_key",
        "_source" : {}
      }
    ]
  }
}

And GET /myindex/_doc/the_doc_id returns:
{
  "_index" : "myindex",
  "_type" : "_doc",
  "_id" : "the_doc_id",
  "found" : false
}

What does GET /myindex/_doc/the_doc_id?routing=the_routing_key return (replace document ID and routing key with real values)?

If you index a document using routing, you need to use that routing key for all operations on that document, e.g. updates or deletes, as this determines which shard the document gets written to. This has nothing to do with the refresh interval.

If you index a document with routing key abc, it may get written to shard 1 even though its ID alone would have placed it on shard 5. If you then delete the document by ID without specifying the same routing key, the request goes to shard 5 and the document is not deleted. A GET by ID without the routing key goes to that same wrong shard, which is why it returns 404 while a search across all shards still finds the document.
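
To make the shard selection concrete, here is a toy Python sketch of the behaviour described above. It is only a model under stated assumptions: `ToyIndex` and `shard_for` are hypothetical names, `zlib.crc32` is a stand-in for the hash Elasticsearch really uses, and the formula is simplified to "hash of the routing value modulo the number of primary shards", with the document ID acting as the routing value when no routing key is given.

```python
import zlib

NUM_SHARDS = 10  # number_of_shards from the index settings in this thread


def shard_for(routing_value, num_shards=NUM_SHARDS):
    """Pick a shard from the routing value (stand-in hash, not Elasticsearch's)."""
    return zlib.crc32(routing_value.encode("utf-8")) % num_shards


class ToyIndex:
    """In-memory model: one dict per shard, keyed by document ID."""

    def __init__(self, num_shards=NUM_SHARDS):
        self.shards = [{} for _ in range(num_shards)]

    def _shard(self, doc_id, routing=None):
        # Without an explicit routing key, the document ID is the routing value.
        key = routing if routing is not None else doc_id
        return self.shards[shard_for(key, len(self.shards))]

    def index(self, doc_id, source, routing=None):
        self._shard(doc_id, routing)[doc_id] = source

    def get(self, doc_id, routing=None):
        return self._shard(doc_id, routing).get(doc_id)

    def delete(self, doc_id, routing=None):
        # Returns True only if the targeted shard actually held the document.
        return self._shard(doc_id, routing).pop(doc_id, None) is not None

    def search_all_shards(self, doc_id):
        # A search fans out to every shard, so it finds the doc wherever it lives.
        return [n for n, shard in enumerate(self.shards) if doc_id in shard]


doc_id = "the_doc_id"
# Pick a hypothetical routing key that lands on a different shard than the ID.
routing = next(k for k in (f"key{i}" for i in range(100))
               if shard_for(k) != shard_for(doc_id))

idx = ToyIndex()
idx.index(doc_id, {"some_field": "somevalue"}, routing=routing)
print(idx.delete(doc_id))                   # unrouted delete targets the ID's shard
print(idx.get(doc_id))                      # unrouted GET looks at that same shard
print(idx.search_all_shards(doc_id))        # fan-out search over every shard
print(idx.delete(doc_id, routing=routing))  # routed delete targets the right shard
```

In this model the unrouted delete and GET both look at the shard derived from the ID, so they miss the document, while the fan-out search and the routed delete reach the shard that actually holds it.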

Thanks Christian.

GET /myindex/_doc/the_doc_id?routing=the_routing_key returns 200.

So should I also use the routing key when I delete the doc?

DELETE /myindex/_doc/the_doc_id?routing=the_routing_key

Or, in other words,
how can I delete a doc from all shards if I indexed it with a routing key?

Thanks,
Qiaoqing.

You need to use the correct routing key when deleting the document, as in your example. The only way to delete from all shards without the routing key is to use delete by query with an ID clause, but note that this is a lot more expensive than a direct delete using the routing key. This is an important aspect to consider when you adopt routing.
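
As a rough illustration of the two options, the Python sketch below builds both HTTP requests with the standard library only. It is a sketch, not a client implementation: `ES_URL` assumes a local, unsecured cluster, and the helper names `routed_delete` and `delete_by_id_query` are hypothetical. The requests are only constructed here, not sent.

```python
import json
import urllib.request
from urllib.parse import quote

ES_URL = "http://localhost:9200"  # assumption: local cluster without auth


def routed_delete(index, doc_id, routing):
    """Direct delete: goes straight to the shard chosen by the routing key."""
    url = (f"{ES_URL}/{quote(index, safe='')}/_doc/{quote(doc_id, safe='')}"
           f"?routing={quote(routing, safe='')}")
    return urllib.request.Request(url, method="DELETE")


def delete_by_id_query(index, doc_id):
    """Delete by query with an ids query: runs against every shard of the index."""
    body = json.dumps({"query": {"ids": {"values": [doc_id]}}}).encode("utf-8")
    return urllib.request.Request(
        f"{ES_URL}/{quote(index, safe='')}/_delete_by_query",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


cheap = routed_delete("myindex", "the_doc_id", "the_routing_key")
expensive = delete_by_id_query("myindex", "the_doc_id")
print(cheap.get_method(), cheap.full_url)
print(expensive.get_method(), expensive.full_url, expensive.data.decode("utf-8"))
```

Passing either request to `urllib.request.urlopen` would send it. The routed delete is the one to prefer whenever the routing key is known.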

Thanks, Christian, I appreciate your patience.
I see, we will apply the changes in our code.

As you are running an old version that is EOL I would also recommend you upgrade to at least 7.17.