How does Elasticsearch index non-text fields?

Hi,

I understand that Elasticsearch analyzes text fields and saves the resulting tokens in an inverted index data structure. But I am a bit confused about how Elasticsearch stores other fields, like integer, float, and keyword. Does it treat the values of these fields as tokens (without breaking them into further tokens) and store those values in an inverted index as they are? Or does it store those values in a separate data structure?

Thanks.

Perhaps take a read through this.

Keyword, numeric, date, etc. fields are not tokenized. Only text fields are tokenized.

Keywords are stored in the inverted index and in doc_values as well.
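To make the difference concrete, here is a minimal Python sketch of the idea. It is only a toy model, not how Lucene actually lays things out on disk: the function names `index_text` and `index_keyword` are made up for illustration, and the "analyzer" is just lowercasing plus whitespace splitting.

```python
from collections import defaultdict

def index_text(docs, field):
    """Toy text field: each value is analyzed into several terms."""
    inverted = defaultdict(set)  # term -> ids of documents containing it
    for doc_id, doc in docs.items():
        # Stand-in for an analyzer: lowercase, then split on whitespace.
        for term in doc[field].lower().split():
            inverted[term].add(doc_id)
    return dict(inverted)

def index_keyword(docs, field):
    """Toy keyword field: the whole value is a single term, plus doc values."""
    inverted = defaultdict(set)  # exact value -> ids of documents with it
    doc_values = {}              # doc id -> value (column-style lookup)
    for doc_id, doc in docs.items():
        value = doc[field]
        inverted[value].add(doc_id)
        doc_values[doc_id] = value
    return dict(inverted), doc_values

docs = {
    1: {"title": "Quick brown fox"},
    2: {"title": "Quick red fox"},
}

text_index = index_text(docs, "title")
keyword_index, keyword_doc_values = index_keyword(docs, "title")
```

In this sketch, searching `text_index` for `quick` finds both documents, while `keyword_index` only matches the full original string. The `keyword_doc_values` lookup goes the other way (document to value), which is the direction sorting and aggregations need.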

Numerics are also indexed, but in a separate structure (a BKD tree in current versions, rather than the term-based inverted index) that supports range searches etc. They have doc_values as well.
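As a rough illustration of why numeric values benefit from an ordered structure, here is a small Python sketch. It assumes a plain sorted list with binary search as a simplified stand-in; the real structure in Lucene is more involved, and the class name `NumericIndex` and its methods are hypothetical.

```python
import bisect

class NumericIndex:
    """Toy numeric index: (value, doc_id) pairs kept in sorted order."""

    def __init__(self):
        self._entries = []  # sorted list of (value, doc_id)

    def add(self, doc_id, value):
        # insort keeps the list ordered by value, then by doc id.
        bisect.insort(self._entries, (value, doc_id))

    def range(self, gte, lte):
        """Return ids of documents whose value is in [gte, lte]."""
        # Binary search for both ends of the range instead of scanning
        # every document.
        lo = bisect.bisect_left(self._entries, (gte, float("-inf")))
        hi = bisect.bisect_right(self._entries, (lte, float("inf")))
        return sorted(doc_id for _, doc_id in self._entries[lo:hi])

prices = NumericIndex()
for doc_id, price in {1: 10, 2: 25, 3: 40, 4: 25}.items():
    prices.add(doc_id, price)

matches = prices.range(20, 40)
```

Because the values are ordered, a range query only has to locate the two boundaries and read what lies between them. A term-by-term lookup would instead need one lookup per distinct value in the range.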

Thanks @stephenb

So this means that numeric and keyword types both have doc_values enabled by default, in order to optimize and support aggregations, sorting, and lookups on those fields. But with numeric types, additional metadata is stored specifically to make range queries more efficient.
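For reference, a mapping that spells out these defaults might look like the Python sketch below. The field names (`title`, `status`, `price`) are invented for illustration, and the helper `fields_with_doc_values` is a simplification that assumes every type except `text` has doc_values on unless disabled. `doc_values` itself is a real mapping parameter.

```python
import json

# Hypothetical mapping body; "doc_values": True is written out only to make
# the default visible. Omitting it gives the same result for these types.
mapping = {
    "mappings": {
        "properties": {
            "title": {"type": "text"},
            "status": {"type": "keyword", "doc_values": True},
            "price": {"type": "float"},
        }
    }
}

def fields_with_doc_values(mapping):
    """List fields that end up with doc_values under the assumed defaults."""
    props = mapping["mappings"]["properties"]
    return sorted(
        name
        for name, cfg in props.items()
        # Simplified rule: default is on for everything except text fields.
        if cfg.get("doc_values", cfg["type"] != "text")
    )

request_body = json.dumps(mapping, indent=2)  # body for an index-creation request
enabled = fields_with_doc_values(mapping)
```

Here `enabled` contains `price` and `status` but not `title`, matching the idea that keyword and numeric fields get doc_values by default while analyzed text does not.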

Please let me know if my understanding is correct.

Yes, that's generally correct, though of course there's a lot of low-level detail.

If there's something specific you're trying to solve, perhaps open a new thread about that issue.


Thanks much @stephenb
I am not solving any problem at the moment. I am just trying to understand the ELK stack.
