Reindex with `term_vector` results in deleted docs

I've been testing the text mapping parameters store: true and term_vector: with_positions_offsets for text highlighting. Mainly I wanted to understand their impact on disk storage. I start with an existing index, create a new index with a modified mapping, and copy the documents over with the Reindex API. Here is the existing mapping:

{
  "mappings": {
    "_doc": {
      "dynamic": "strict",
      "properties": {
        "title": { "type": "text", "copy_to": "all_text" },
        "body": { "type": "text", "copy_to": "all_text" },
        "all_text": { "type": "text" }
      }
    }
  }
}

I set the all_text field to be stored ("all_text": { "type": "text", "store": true }), ran the reindex, and then used the cat API to check sizes. The index size increased as expected:

health status index     uuid                    pri rep docs.count docs.deleted store.size pri.store.size
green  open   original  7rHkEB-sSf-uFXlvpzqpvA    5   1    1774199       479693      1.3gb        689.1mb
green  open   stored    WLk2LW4WTtKw0cDve31poQ    5   1    1774199            0      1.4gb        727.3mb
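For anyone reproducing this, the step above boils down to two request bodies: one for creating the new index and one for the Reindex API. Below is a minimal sketch in Python that only builds and prints those bodies. The helper names `build_mapping` and `build_reindex` are hypothetical, and it assumes the `store` mapping parameter (the Elasticsearch name for a stored field) and the index names `original` and `stored` from the cat output. Nothing is sent to a cluster.

```python
import json


def build_mapping(all_text_options):
    """Return an index-creation body based on the mapping in this post.

    all_text_options holds extra parameters for the all_text field,
    e.g. {"store": True}. (Hypothetical helper, for illustration only.)
    """
    all_text = {"type": "text"}
    all_text.update(all_text_options)
    return {
        "mappings": {
            "_doc": {
                "dynamic": "strict",
                "properties": {
                    "title": {"type": "text", "copy_to": "all_text"},
                    "body": {"type": "text", "copy_to": "all_text"},
                    "all_text": all_text,
                },
            }
        }
    }


def build_reindex(source_index, dest_index):
    """Return a Reindex API body copying source_index into dest_index."""
    return {"source": {"index": source_index}, "dest": {"index": dest_index}}


# Body for creating the "stored" index (PUT /stored).
stored_mapping = build_mapping({"store": True})

# Body for POST /_reindex.
reindex_body = build_reindex("original", "stored")

print(json.dumps(stored_mapping, indent=2))
print(json.dumps(reindex_body, indent=2))
```

Passing a different options dict to `build_mapping` would produce the body for any other variant of the all_text field.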

Then I enabled term vectors on the all_text field ("all_text": { "type": "text", "term_vector": "with_positions_offsets" }). Again, as expected, the new index was larger:

health status index           uuid                    pri rep docs.count docs.deleted store.size pri.store.size
green  open   original        7rHkEB-sSf-uFXlvpzqpvA    5   1    1774199       479693      1.3gb        689.1mb
green  open   term-vectors    f0IvAEMURYmgXsCA0cW-dQ    5   1    1776469       546468        2gb       1023.7mb
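To put the three indices side by side, here is a rough Python sketch that parses the cat rows pasted in this post and reports primary store growth relative to `original`, along with the deleted-doc counts. The `to_mb` and `parse_rows` helpers are hypothetical, and the parser assumes the ten-column layout shown above with sizes in kb, mb, or gb only.

```python
# Rows copied from the cat output in this post (header line omitted).
CAT_ROWS = """\
green  open   original        7rHkEB-sSf-uFXlvpzqpvA    5   1    1774199       479693      1.3gb        689.1mb
green  open   stored          WLk2LW4WTtKw0cDve31poQ    5   1    1774199            0      1.4gb        727.3mb
green  open   term-vectors    f0IvAEMURYmgXsCA0cW-dQ    5   1    1776469       546468        2gb       1023.7mb
"""

# Multipliers to megabytes; other units are not handled in this sketch.
UNITS = {"kb": 1 / 1024, "mb": 1.0, "gb": 1024.0}


def to_mb(size):
    """Convert a cat-style size such as '689.1mb' or '2gb' to megabytes."""
    number, unit = size[:-2], size[-2:]
    return float(number) * UNITS[unit]


def parse_rows(text):
    """Map index name -> doc counts and primary store size in MB."""
    stats = {}
    for line in text.splitlines():
        if not line.strip():
            continue
        (_health, _status, index, _uuid, _pri, _rep,
         count, deleted, _store, pri_store) = line.split()
        stats[index] = {
            "docs_count": int(count),
            "docs_deleted": int(deleted),
            "pri_store_mb": to_mb(pri_store),
        }
    return stats


stats = parse_rows(CAT_ROWS)
base_mb = stats["original"]["pri_store_mb"]

for name, row in stats.items():
    growth = row["pri_store_mb"] / base_mb
    print(f"{name:<14} pri.store x{growth:.2f}  "
          f"docs={row['docs_count']}  deleted={row['docs_deleted']}")
```

Comparing pri.store.size rather than store.size keeps replica copies out of the ratio.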

But here is what I found peculiar, and it's the heart of this question: why are there deleted docs in the term-vectors index? There weren't any in the stored index. I repeated these tests multiple times and got the same results. It's not causing a problem; I just feel like I'm missing something.
