Index space usage with text and keyword datatype

Mark_Harwood · July 23, 2019, 10:58am

I'm not too familiar with the details of ECS, but in my experience mapping large free-text fields as keyword has limited use. Lucene has a hard limit of 32766 bytes on a single indexed term, and an ignore_above setting is typically used to avoid index bloat of the type you saw. The catch is that any doc whose value is longer than ignore_above then has no indexed value for that field.
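
To make that concrete, here is a rough sketch of such a mapping written as a Python dict. The field name `message` and the 1024 threshold are assumptions picked for illustration, not values from this thread, and `would_be_indexed` is a hypothetical helper that only mimics the length check (ignore_above counts characters, not bytes):

```python
import json

# Hypothetical mapping: the field name "message" and the 1024 threshold
# are illustrative assumptions only.
MAPPING = {
    "mappings": {
        "properties": {
            "message": {"type": "keyword", "ignore_above": 1024}
        }
    }
}


def would_be_indexed(value: str, ignore_above: int = 1024) -> bool:
    """Mimic the ignore_above check: longer strings get no indexed value."""
    return len(value) <= ignore_above


if __name__ == "__main__":
    # Print the JSON body you could send when creating the index.
    print(json.dumps(MAPPING, indent=2))
```

Any log message over the threshold would still be kept in _source but would not show up in aggregations or term queries on that field.
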
Using a hash is a good idea because it:

a) Keeps a limit on the size stored in the index
b) Limits the size of values shown in Kibana histograms etc
c) Retains a value for every doc

The downside is a lack of readability in visualization results, but you can typically drill down to the raw docs to see the original message. A compromise might be to index the hash plus the first N characters of the message for readability, e.g.:

[xxMyHashxxx] Error reading file ....
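
A minimal sketch of how that combined value could be built before indexing, in Python. The function name `hashed_label`, the choice of SHA-1, the 12-character hash truncation and the 40-character prefix are all assumptions for illustration; any stable hash and any N would do:

```python
import hashlib


def hashed_label(message: str, prefix_len: int = 40, hash_len: int = 12) -> str:
    """Build '[<hash>] <first prefix_len chars> ...' as a bounded keyword value.

    SHA-1 and the default lengths are illustrative assumptions, not a
    recommendation from the thread.
    """
    # Hash the full message so docs that share a prefix still get distinct values.
    digest = hashlib.sha1(message.encode("utf-8")).hexdigest()[:hash_len]
    prefix = message[:prefix_len]
    # Mark truncation so readers know the label is not the whole message.
    suffix = " ..." if len(message) > prefix_len else ""
    return f"[{digest}] {prefix}{suffix}"


if __name__ == "__main__":
    print(hashed_label("Error reading file /var/log/app/data.bin: permission denied"))
```

The result is at most a few dozen characters, so it stays well under any ignore_above setting while every doc still gets a value. Truncating the hash makes collisions more likely, so keep more of it if you have a very large number of distinct messages.
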
