Performance of keyword vs fielddata

yungnvn · October 22, 2019, 11:52pm

Hi,

I had a couple of questions regarding keywords vs fielddata.

I understand that enabling fielddata on "text" fields is much more resource intensive, since the data has to be stored in JVM heap memory.

How does the keyword perform in comparison to fielddata? Is it also stored in memory?
When I run _cat/fielddata, I'm seeing multiple fields returned with .keyword at the end. The biggest ones are message.keyword. Does this mean that every keyword field is also stored in memory, since _cat/fielddata is supposed to report how much heap memory is being used?

Here are the top results that _cat/fielddata?h=field,size returns:

message.keyword                               1.2mb
message.keyword                               1.2mb
message.keyword                               668kb
message.keyword                             479.5kb
message.keyword                             436.7kb
message.keyword                             431.9kb
message.keyword                             406.1kb
message.keyword                               399kb
message.keyword                             366.2kb
message.keyword                             274.1kb
message.keyword                             238.3kb
message.keyword                             205.9kb
message.keyword                             154.2kb
message.keyword                             148.9kb
message.keyword                             141.2kb
message.keyword                             134.5kb
message.keyword                             102.5kb
message.keyword                              99.2kb
message.keyword                              90.4kb
message.keyword                              90.3kb
message.keyword                                88kb
host.name.keyword                            64.6kb
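To see how much heap each field accounts for overall, the per-node rows above can be summed by field name. This is only a rough sketch: the helper names `parse_size` and `total_by_field` are made up for illustration, and it assumes the size suffixes are 1024-based. Requesting the sizes in plain bytes through the cat API's `bytes` parameter would avoid the suffix parsing altogether.

```python
from collections import defaultdict

# Byte multipliers for the size suffixes in the output (assumed 1024-based).
UNITS = {"b": 1, "kb": 1024, "mb": 1024**2, "gb": 1024**3, "tb": 1024**4}


def parse_size(text):
    """Convert a size string such as '479.5kb' into a number of bytes."""
    text = text.strip().lower()
    # Check the two-letter suffixes first so 'kb' is not mistaken for 'b'.
    for suffix in ("tb", "gb", "mb", "kb", "b"):
        if text.endswith(suffix):
            return float(text[: -len(suffix)]) * UNITS[suffix]
    return float(text)


def total_by_field(cat_output):
    """Sum the reported sizes per field across all rows of the output."""
    totals = defaultdict(float)
    for line in cat_output.splitlines():
        parts = line.split()
        if len(parts) != 2:
            continue  # skip blank or unexpected lines
        field, size = parts
        totals[field] += parse_size(size)
    return dict(totals)


sample = """message.keyword 1.2mb
message.keyword 668kb
host.name.keyword 64.6kb"""

for field, size in total_by_field(sample).items():
    print(f"{field}: {size / 1024:.1f} kb")
```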

Why are .keyword fields returned by _cat/fielddata API calls?

Thanks

dadoonet · October 23, 2019, 1:11am

Fielddata loads data into the JVM heap. The first time you sort or aggregate on the field, the data is loaded into memory.
The keyword data type uses doc values. Doc values are stored on disk and are precomputed at index time. Any time you sort or aggregate, the data is read from disk. But remember that segments are immutable, which means that the OS will cache in its own memory the files you read often. So at the end of the day, you will have loaded doc values in RAM, and it will be as fast as fielddata but without using the JVM heap, which has a lot of advantages.
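As an illustration of the two options, here is a minimal sketch of a mapping and an aggregation request written as Python dictionaries. The field name `message` comes from the output earlier in the thread; the aggregation name `top_messages` is made up, and the `keyword` sub-field with `ignore_above: 256` is assumed to mirror what default dynamic mapping produces for strings.

```python
import json

# Sketch of a mapping: "message" is analyzed text, "message.keyword" is a
# keyword sub-field backed by doc values (on by default for keyword fields).
mappings = {
    "properties": {
        "message": {
            "type": "text",
            # Adding "fielddata": True here would allow sorting and
            # aggregating on the text field itself, at the cost of JVM heap.
            "fields": {
                "keyword": {"type": "keyword", "ignore_above": 256}
            },
        }
    }
}

# Terms aggregation that targets the doc-values-backed sub-field
# instead of enabling fielddata on the text field.
agg_request = {
    "size": 0,
    "aggs": {
        "top_messages": {"terms": {"field": "message.keyword", "size": 10}}
    },
}

print(json.dumps(agg_request, indent=2))
```

Pointing the aggregation at `message.keyword` keeps `fielddata` disabled on the text field, which is the setup the reply describes.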

HTH

system · November 20, 2019, 1:11am

This topic was automatically closed 28 days after the last reply. New replies are no longer allowed.
