Performance of keyword vs fielddata

Hi,

I had a couple of questions regarding keywords vs fielddata.

I understand that when fielddata is enabled for "text" fields, it is much more resource intensive, since the data has to be held in the JVM heap itself.

  1. How does the keyword type perform in comparison to fielddata? Is it also stored in memory?

  2. When I run _cat/fielddata I'm seeing multiple fields returned with a .keyword suffix. The biggest ones are message.keyword. Since _cat/fielddata is supposed to report how much heap memory is being used, does this mean that every keyword field is also stored in memory?

Here are the top results of what _cat/fielddata?h=field,size returns:

message.keyword                               1.2mb
message.keyword                               1.2mb
message.keyword                               668kb
message.keyword                             479.5kb
message.keyword                             436.7kb
message.keyword                             431.9kb
message.keyword                             406.1kb
message.keyword                               399kb
message.keyword                             366.2kb
message.keyword                             274.1kb
message.keyword                             238.3kb
message.keyword                             205.9kb
message.keyword                             154.2kb
message.keyword                             148.9kb
message.keyword                             141.2kb
message.keyword                             134.5kb
message.keyword                             102.5kb
message.keyword                              99.2kb
message.keyword                              90.4kb
message.keyword                              90.3kb
message.keyword                                88kb
host.name.keyword                            64.6kb
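
To see how much heap these rows add up to per field, the output above can be totalled. This is only a minimal sketch: it assumes the two-column "field size" layout shown here and that the size suffixes are 1024-based, and the helper names parse_size and total_by_field are made up for illustration.

```python
import re
from collections import defaultdict

# Multipliers for the size suffixes seen in the output (assumed to be 1024-based).
UNITS = {"b": 1, "kb": 1024, "mb": 1024**2, "gb": 1024**3, "tb": 1024**4}


def parse_size(text):
    """Convert a string such as '479.5kb' into a number of bytes."""
    match = re.fullmatch(r"([\d.]+)([a-z]+)", text.strip().lower())
    if match is None:
        raise ValueError(f"unrecognised size: {text!r}")
    value, unit = match.groups()
    return float(value) * UNITS[unit]


def total_by_field(cat_output):
    """Sum the size column per field name for 'field size' rows."""
    totals = defaultdict(float)
    for line in cat_output.splitlines():
        parts = line.split()
        if len(parts) != 2:
            continue  # skip blank or unexpected lines
        field, size = parts
        totals[field] += parse_size(size)
    return dict(totals)


# A few rows copied from the output above.
sample = """\
message.keyword                               1.2mb
message.keyword                               668kb
host.name.keyword                            64.6kb
"""
totals = total_by_field(sample)
for field, size in sorted(totals.items()):
    print(f"{field}: {size / 1024:.1f}kb")
```

Each row is reported separately, so the same field name can appear many times; summing per field gives a single figure to compare against the heap size.
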
  3. Why are .keyword fields being returned by a _cat/fielddata API call?

Thanks

Fielddata loads data into the JVM heap. The first time you sort or aggregate on a field, its data is loaded into memory.
The keyword data type uses doc values. Doc values are stored on disk and are precomputed at index time. Any time you sort or aggregate, the data is read from disk. But remember that segments are immutable, which means the OS will keep the files you read often in its filesystem cache. So at the end of the day, the doc values end up in RAM, and access is about as fast as fielddata but without using the JVM heap, which has a lot of advantages.
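
To make the distinction concrete, here is a rough sketch of a mapping where "message" is a text field with a keyword sub-field, plus an aggregation body that targets the sub-field. The mapping itself is a hypothetical example (the field names are taken from the question), and resolve_field and uses_doc_values are illustrative helpers, not part of any Elasticsearch client.

```python
import json

# Hypothetical mapping: analysed text with a keyword sub-field.
# "fielddata" is left off for the text field; the keyword sub-field keeps doc values.
mapping = {
    "mappings": {
        "properties": {
            "message": {
                "type": "text",
                "fielddata": False,
                "fields": {"keyword": {"type": "keyword", "doc_values": True}},
            }
        }
    }
}

# Request body for a terms aggregation on the doc-values-backed sub-field.
terms_agg = {
    "size": 0,
    "aggs": {"top_messages": {"terms": {"field": "message.keyword", "size": 10}}},
}


def resolve_field(mapping, path):
    """Walk 'properties' (objects) and 'fields' (multi-fields) to a field definition."""
    node = {"properties": mapping["mappings"]["properties"]}
    for part in path.split("."):
        children = {**node.get("properties", {}), **node.get("fields", {})}
        node = children[part]
    return node


def uses_doc_values(field_def):
    """keyword fields use doc values unless disabled; text fields do not support them."""
    return field_def["type"] == "keyword" and field_def.get("doc_values", True)


print(json.dumps(terms_agg, indent=2))
print(uses_doc_values(resolve_field(mapping, "message.keyword")))
print(uses_doc_values(resolve_field(mapping, "message")))
```

Sorting or aggregating on message.keyword reads doc values from disk, while doing the same on message directly would require enabling fielddata and paying for it in heap.
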

HTH
