I've been testing the text mapping properties store: true and term_vector: with_positions_offsets for text highlighting. Mainly, I wanted to understand the impact on disk storage. I'm starting with an existing index, creating a new mapping, and using the Reindex API. Here is the existing mapping:
{
  "mappings": {
    "_doc": {
      "dynamic": "strict",
      "properties": {
        "title": { "type": "text", "copy_to": "all_text" },
        "body": { "type": "text", "copy_to": "all_text" },
        "all_text": { "type": "text" }
      }
    }
  }
}
I set the all_text field to be stored ("all_text": { "type": "text", "store": true }) and ran the reindex, then used the cat API to check sizes. The index size increased as expected:
health status index    uuid                   pri rep docs.count docs.deleted store.size pri.store.size
green  open   original 7rHkEB-sSf-uFXlvpzqpvA   5   1    1774199       479693      1.3gb        689.1mb
green  open   stored   WLk2LW4WTtKw0cDve31poQ   5   1    1774199            0      1.4gb        727.3mb
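For reference, the steps so far could be sketched roughly as follows. This is only an illustration: it builds the request bodies without sending them, the helper names `with_all_text_options` and `reindex_body` are made up for this example, and it assumes the mapping parameter is spelled `store`, which is the name Elasticsearch uses for stored fields.

```python
import copy
import json

# Mapping of the existing index, as shown in the question.
ORIGINAL_MAPPING = {
    "mappings": {
        "_doc": {
            "dynamic": "strict",
            "properties": {
                "title": {"type": "text", "copy_to": "all_text"},
                "body": {"type": "text", "copy_to": "all_text"},
                "all_text": {"type": "text"},
            },
        }
    }
}


def with_all_text_options(mapping, **options):
    """Return a copy of the mapping with extra options set on all_text.

    Hypothetical helper for this example; the original mapping is not modified.
    """
    new_mapping = copy.deepcopy(mapping)
    new_mapping["mappings"]["_doc"]["properties"]["all_text"].update(options)
    return new_mapping


def reindex_body(source_index, dest_index):
    """Build the request body for POST /_reindex (hypothetical helper)."""
    return {"source": {"index": source_index}, "dest": {"index": dest_index}}


stored_mapping = with_all_text_options(ORIGINAL_MAPPING, store=True)

# Body for: PUT /stored
print(json.dumps(stored_mapping, indent=2))
# Body for: POST /_reindex
print(json.dumps(reindex_body("original", "stored")))
```

The same helper would cover the other variant by passing `term_vector="with_positions_offsets"` instead of `store=True`.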
Then I set the all_text field with term_vector ("all_text": { "type": "text", "term_vector": "with_positions_offsets" }). Again, as expected, the new index was larger:
health status index        uuid                   pri rep docs.count docs.deleted store.size pri.store.size
green  open   original     7rHkEB-sSf-uFXlvpzqpvA   5   1    1774199       479693      1.3gb        689.1mb
green  open   term-vectors f0IvAEMURYmgXsCA0cW-dQ   5   1    1776469       546468        2gb       1023.7mb
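To put numbers on the growth, here is a small sketch that compares the pri.store.size values from the cat output above. The helpers `parse_size` and `growth_percent` are made up for this example, and it assumes the cat API's size suffixes are binary units (1kb = 1024 bytes). Note that the original index contains deleted docs, so the comparison is only approximate.

```python
# Byte multipliers for the size suffixes used in the cat output
# (assumed to be binary units).
UNITS = {"b": 1, "kb": 1024, "mb": 1024 ** 2, "gb": 1024 ** 3}


def parse_size(text):
    """Convert a size string such as '689.1mb' into a number of bytes."""
    text = text.strip().lower()
    # Check the two-letter suffixes first so 'mb' is not mistaken for 'b'.
    for suffix in ("kb", "mb", "gb", "b"):
        if text.endswith(suffix):
            return float(text[: -len(suffix)]) * UNITS[suffix]
    raise ValueError(f"unrecognised size: {text!r}")


def growth_percent(before, after):
    """Relative size increase from `before` to `after`, in percent."""
    base = parse_size(before)
    return (parse_size(after) - base) / base * 100


# pri.store.size values copied from the cat output in the question.
PRI_STORE_SIZE = {
    "original": "689.1mb",
    "stored": "727.3mb",
    "term-vectors": "1023.7mb",
}

for name in ("stored", "term-vectors"):
    growth = growth_percent(PRI_STORE_SIZE["original"], PRI_STORE_SIZE[name])
    print(f"{name}: {growth:+.1f}%")
# prints:
# stored: +5.5%
# term-vectors: +48.6%
```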
But here is what I found peculiar, and it's the heart of this question: why are there deleted docs in the term-vectors index? There weren't any in the stored index. I repeated these tests multiple times and got the same results. It's not causing a problem; I just feel like there's something I'm missing.