I'm not too familiar with the details of ECS, but in my experience mapping large free-text fields as keyword
has limited use. Lucene has a hard limit of 32766 bytes on any single indexed term, and an ignore_above
setting is typically used to avoid index bloat of the type you saw. The catch is that values longer than ignore_above are skipped, so a lot of docs can end up with no indexed value for the field.
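To make the trade-off concrete, here is a rough Python sketch of what ignore_above does. The field name "message", the value 8191, and the helper is_indexed are my own choices for illustration, not anything from ECS. ignore_above counts characters while the Lucene limit is in bytes, so 8191 (32766 // 4) is a conservative setting that stays under the limit even when every character takes 4 bytes in UTF-8. Python's len() is only an approximation of how Elasticsearch counts characters.

```python
# Lucene rejects any single indexed term longer than this many bytes.
LUCENE_MAX_TERM_BYTES = 32766

# Hypothetical mapping body for a "message" field (names are illustrative).
# 8191 == 32766 // 4, safe even if every character needs 4 bytes in UTF-8.
MESSAGE_MAPPING = {
    "properties": {
        "message": {"type": "keyword", "ignore_above": 8191}
    }
}


def is_indexed(value: str, ignore_above: int = 8191) -> bool:
    """Approximate check: would a keyword field with this ignore_above index the value?

    Values longer than ignore_above are kept in _source but not indexed,
    so they cannot be searched or aggregated on.
    """
    return len(value) <= ignore_above


if __name__ == "__main__":
    short_msg = "Error reading file"
    long_msg = "x" * 10_000
    print(is_indexed(short_msg))  # short value gets indexed
    print(is_indexed(long_msg))   # long value is silently skipped
```

Any doc for which is_indexed returns False is the kind of doc that drops out of terms aggregations on that field.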
Hashing the message is a good idea because it:
a) Keeps a limit on the size stored in the index
b) Limits the size of values shown in Kibana histograms etc
c) Retains a value for every doc
The downside is a lack of readability in visualization results, but you can typically drill down to the raw docs to see the original message. A compromise might be to index the hash plus the first N characters of the message for readability, e.g.:
[xxMyHashxxx] Error reading file ....
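A minimal sketch of how that combined value could be built before indexing, in Python. The function name message_key, the choice of SHA-256, the 12-character truncated digest and the 40-character prefix are all assumptions for illustration; pick whatever hash and lengths suit your data. Truncating the digest keeps the value short but raises the chance of collisions, so keep more of it if you have a very large number of distinct messages.

```python
import hashlib


def message_key(message: str, prefix_len: int = 40, hash_len: int = 12) -> str:
    """Build a short, bounded-size value: "[<hash>] <first N chars>...".

    The hash is computed over the full message, so two messages that share
    the same first N characters still get different keys.
    """
    digest = hashlib.sha256(message.encode("utf-8")).hexdigest()[:hash_len]
    prefix = message[:prefix_len]
    # Mark truncation so readers know the text was cut off.
    suffix = "..." if len(message) > prefix_len else ""
    return f"[{digest}] {prefix}{suffix}"


if __name__ == "__main__":
    msg = "Error reading file /var/log/app.log: " + "stack frame\n" * 5000
    print(message_key(msg))
```

The result is at most hash_len + prefix_len + 6 characters long, so it fits comfortably in a keyword field, every doc gets a value, and the original message is still available in the raw doc.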