Understanding implications of `index: false` with `type: keyword`

Hello,

I have a pretty simple use case that has been discussed at length on this forum: I'm storing a large block of text that I just want to reside in my source document and not be indexed at all. Originally this field was mapped as follows:

"message_body": {
    "type": "keyword",
    "index": false
} 

We chose keyword because it was either keyword or text, and with index: false we didn't think it would matter. However, after a little while we ran into the 32K limit (a single keyword value cannot exceed 32766 bytes). I created a new index using "ignore_above": 1 on the field and, to my surprise, that worked.
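For anyone hitting the same limit, here is a minimal Python sketch of the two checks as I understand them. It assumes the limit is 32766 UTF-8 bytes per value and that ignore_above compares a character count. The function names are made up for illustration, and Python's len() only approximates how Elasticsearch counts characters.

```python
# Sketch only: helper names are hypothetical, not part of any Elasticsearch client.

LUCENE_MAX_TERM_BYTES = 32766  # per-value limit, measured in UTF-8 bytes


def exceeds_term_limit(value: str) -> bool:
    """True if the value is too large to be written as a single keyword term."""
    return len(value.encode("utf-8")) > LUCENE_MAX_TERM_BYTES


def skipped_by_ignore_above(value: str, ignore_above: int) -> bool:
    """True if the value is longer than ignore_above characters.

    Such values are kept in _source but are not indexed or written to doc values.
    Python counts code points here, which is an approximation.
    """
    return len(value) > ignore_above


if __name__ == "__main__":
    body = "x" * 40000
    print(exceeds_term_limit(body))          # large value trips the byte limit
    print(skipped_by_ignore_above(body, 1))  # ignore_above: 1 skips nearly everything
```

With "ignore_above": 1, every value longer than one character is skipped, so the field is effectively only kept in _source.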

So I migrated the data with the reindex API and that solved it. But I noticed something very surprising that I'd like help understanding: the size of the index on disk was reduced greatly. Just by adding ignore_above: 1, the size of my index (total.store.size_in_bytes) dropped from 39 GB to 15 GB in our test env. While this is really great, because our production index is approaching 1 TB, I don't understand the reason for it. With index: false the data is not stored in the inverted index, right? If that's the case, why would there be such a difference with ignore_above being set?

I also repeated this test using "type": "text" and "index": false. In that case the size of the index was also reduced to 15 GB. Can someone explain this to me?

I very embarrassingly forgot about doc_values. Doing the following also yields the reduced size on disk:

"message_body": {
    "type": "keyword",
    "index": false,
    "doc_values": false
} 
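In case it helps someone reproduce the comparison, here is a rough Python sketch that builds the request bodies for creating the new index and for the reindex call. It only produces the JSON; sending it is left to whatever client you use. The index names messages_v1 and messages_v2 and the function names are placeholders I made up.

```python
import json

# Sketch only: index names and helper names are hypothetical placeholders.


def build_create_index_body(field_name: str = "message_body") -> dict:
    """Body for PUT /<index>: keep the field in _source only."""
    return {
        "mappings": {
            "properties": {
                field_name: {
                    "type": "keyword",
                    "index": False,       # no inverted index
                    "doc_values": False,  # no columnar doc values on disk
                }
            }
        }
    }


def build_reindex_body(source_index: str, dest_index: str) -> dict:
    """Body for POST /_reindex: copy documents into the new index."""
    return {"source": {"index": source_index}, "dest": {"index": dest_index}}


if __name__ == "__main__":
    print(json.dumps(build_create_index_body(), indent=2))
    print(json.dumps(build_reindex_body("messages_v1", "messages_v2"), indent=2))
```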

I am curious whether there is any difference between the snippet above and { "type": "text", "index": false } at that point. Is one preferable?

text fields are analysed by default; see "Text field type" in the Elasticsearch Reference [7.11].

"text fields are analysed by default"

So even when index: false is set, the analyzer will still be executed during indexing operations? If that's the case then it seems better to do:

"message_body": {
    "type": "keyword",
    "index": false,
    "doc_values": false,
    "ignore_above": 1
}