Old doc versions are not deleted

Hi,

The Discuss thread "Deleting old versions" states what happens if you use the same ID for a doc: "... each time you index a doc with the same ID, the old version will be marked as deleted, and will no longer be visible."

With Kibana version 7.9.0 we now have the "issue" that Discover shows all versions (the current one and the old ones). These 'deleted' docs have not been removed after more than 14 days, even when we run a _forcemerge.

How can we delete every old version inside one index? And is it possible to stop Kibana from showing the old versions? Every request from Discover has version: true set, and I have found no option to disable it.

Best regards
Messias

Can you please show an example of these duplicate documents? Make sure you include the document id and index name.

Hi,

@Christian_Dahlqvist: You misunderstood; there are no duplicates, only different versions of one document.

If you search with /_search?version=true you will get something like this:

{
    "_index": "index-2020.08.09",
    "_type": "_doc",
    "_id": "ID#1",
    "_score": 1.0,
    "_version": 1,
    "_source": {
        "@timestamp": "2020-08-09T16:48:13",
        SOMEDATA
    }
}

{
    "_index": "index-2020.08.09",
    "_type": "_doc",
    "_id": "ID#1",
    "_score": 1.0,
    "_version": 2,
    "_source": {
        "@timestamp": "2020-08-09T16:48:13",
        UPDATEDDATA
    }
}

The older version (in this case 1) is not needed.

Best regards
Messias

Is that the actual document _id for each one?

That is what I meant by duplicate. Are the ids identical? Do you use routing during indexing?
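One way to check this on a saved search response is to group the hits by index and _id and list every pair that occurs more than once. Below is a minimal Python sketch, assuming the response has the usual hits.hits layout. The function name find_repeated_ids and the sample data are hypothetical, made up for illustration and only shaped like the hits discussed in this thread:

```python
from collections import defaultdict


def find_repeated_ids(response):
    """Return {(index, id): [versions...]} for ids that occur in more than one hit."""
    versions = defaultdict(list)
    for hit in response["hits"]["hits"]:
        key = (hit["_index"], hit["_id"])
        # "_version" is only present when the search was run with version=true.
        versions[key].append(hit.get("_version"))
    return {key: found for key, found in versions.items() if len(found) > 1}


# Hypothetical sample data, not taken from a real cluster.
sample = {
    "hits": {
        "hits": [
            {"_index": "index-2020.08.09", "_id": "ID#1", "_version": 1},
            {"_index": "index-2020.08.09", "_id": "ID#1", "_version": 2},
            {"_index": "index-2020.08.09", "_id": "ID#2", "_version": 1},
        ]
    }
}

repeated = find_repeated_ids(sample)
print(repeated)
```

If this reports anything for real data, the exact _id strings are worth comparing character by character.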

Hi,

@warkolm: this is only a dummy ID.

@Christian_Dahlqvist: the IDs are identical and no routing is used.

I do not understand why the old version is not deleted. Does anyone have an option or idea for disabling the "version: true" in Kibana?

Best regards
Messias

How many shards does the index in question have? If you are not explicitly using routing are you by any chance using parent-child?
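The reason for asking: the shard a document lives on is chosen from its routing value, which defaults to the _id. If the same _id is indexed with two different routing values, the copies can end up on different shards and both stay live. Here is a toy Python sketch of that effect. Note the assumptions: zlib.crc32 stands in for the hash Elasticsearch really uses, the plain modulo is a simplification of the real formula, and all function names are hypothetical, so the shard numbers are illustrative only:

```python
import zlib

NUM_SHARDS = 2  # the index in this thread has 2 shards


def pick_shard(routing, num_shards=NUM_SHARDS):
    # Simplified stand-in: Elasticsearch uses a different hash and formula.
    return zlib.crc32(routing.encode("utf-8")) % num_shards


def index_doc(shards, doc_id, source, routing=None):
    # Without an explicit routing value, the _id is used for routing.
    shard = pick_shard(routing if routing is not None else doc_id)
    shards[shard][doc_id] = source  # same _id on the same shard is overwritten
    return shard


def search(shards, doc_id):
    # A search fans out to every shard and collects all live copies.
    return [shard[doc_id] for shard in shards if doc_id in shard]


# Default routing: the second write replaces the first.
shards = [{} for _ in range(NUM_SHARDS)]
index_doc(shards, "ID#1", "SOMEDATA")
index_doc(shards, "ID#1", "UPDATEDDATA")
default_hits = search(shards, "ID#1")

# Two routing values that map to different shards (found by trying candidates).
first = "r0"
second = next(f"r{i}" for i in range(1, 100) if pick_shard(f"r{i}") != pick_shard(first))

shards = [{} for _ in range(NUM_SHARDS)]
index_doc(shards, "ID#1", "SOMEDATA", routing=first)
index_doc(shards, "ID#1", "UPDATEDDATA", routing=second)
routed_hits = search(shards, "ID#1")

print(default_hits)      # one copy
print(len(routed_hits))  # two copies of the same _id
```

Parent-child relations rely on custom routing as well, which is why both questions belong together.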

@Christian_Dahlqvist: Only 2 shards. I have never used the parent-child feature. The only thing is that, where this happens, I use the update API rather than indexing a complete new document.

Thanks for your input, I will test some things.

Best regards
Messias

I have never seen this problem before, so I am out of ideas.

Hi Messias,

As explained in the article you are referring to, if a document is updated with the same '_id' value, a new document is created and the old one is internally flagged by Lucene as deleted and is never returned in search results. Future segment merge operations will permanently remove it.

If updates create new documents, each with its own _id value, _forcemerge won't make them disappear.
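To make the difference concrete, here is a toy Python model of the behaviour described above. It is only a sketch of the semantics, not how Lucene stores data internally, and the class ToyIndex and its methods are hypothetical names invented for this example:

```python
class ToyIndex:
    """Toy model of delete-on-update; not how Lucene stores data internally."""

    def __init__(self):
        self.entries = []  # every write appends; nothing is changed in place

    def index(self, doc_id, source):
        version = 1
        for entry in self.entries:
            if entry["_id"] == doc_id and not entry["deleted"]:
                entry["deleted"] = True  # old copy is flagged, not removed
                version = entry["_version"] + 1
        self.entries.append(
            {"_id": doc_id, "_version": version, "_source": source, "deleted": False}
        )

    def search(self):
        # Flagged copies are filtered out of every search result.
        return [e for e in self.entries if not e["deleted"]]

    def merge(self):
        # A merge rewrites the data and drops flagged copies for good.
        self.entries = self.search()


# Same _id twice: only the newest version is visible, and a merge drops the old one.
idx = ToyIndex()
idx.index("ID#1", "SOMEDATA")
idx.index("ID#1", "UPDATEDDATA")
visible = [(e["_id"], e["_version"]) for e in idx.search()]
stored_before = len(idx.entries)
idx.merge()
stored_after = len(idx.entries)
print(visible, stored_before, stored_after)  # [('ID#1', 2)] 2 1

# Different _id values: both stay visible, and a merge removes nothing.
other = ToyIndex()
other.index("ID#1", "SOMEDATA")
other.index("ID#1 ", "UPDATEDDATA")  # note the trailing space in the second id
other.merge()
still_visible = len(other.search())
print(still_visible)  # 2
```

In this model two hits with _version 1 and _version 2 can only both be visible if their ids differ in some way, or if they sit in separate places such as different shards.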

How are these documents updated, and what is this 14-day retention period you are speaking about?
The only place I find this kind of retention is in the self-hosted GitLab documentation about the Elasticsearch integration.
https://docs.gitlab.com/ee/integration/elasticsearch.html#trigger-the-reindex-via-the-elasticsearch-administration

Regards

Dominique

Can you please show the exact IDs and as much of the original document as you can?