Grok parser efficiency

Example portion of my grok parser:
match => { "message" => "(?&lt;field1&gt;[^ ]+) (?&lt;field2&gt;.*)" }

Problem: according to this, my string fields are going to be mapped as both "text" and "keyword". I'm pretty sure this isn't ideal for memory usage (my use case: 64 GB of memory, but I'm still afraid of crashes due to the sheer volume of data).

Suppose I want to optimize the "message" field so that it is not kept in memory. Which type should I use? Can I do this in the grok parser itself? (I'm going to have hundreds of indexes, so I want a programmatic way to do it.)


You manage mappings in Elasticsearch through index templates, so that is where you need to make the changes.
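For reference, a minimal sketch of such a template using the composable index template API (Elasticsearch 7.8+). The template name `logstash-template` and the pattern `logstash-*` are placeholders; this maps "message" as text only, without the default keyword sub-field:

```json
PUT _index_template/logstash-template
{
  "index_patterns": ["logstash-*"],
  "template": {
    "mappings": {
      "properties": {
        "message": { "type": "text" }
      }
    }
  }
}
```

Any index whose name matches the pattern picks this mapping up automatically at creation time, so it works even when Logstash creates the indexes for you.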

Thanks! I've used that before, and I'll probably have to do that as a last resort.

But I wonder if there is a way to do it through grok itself, as that would save me a lot of trouble since I have hundreds of indexes to create (currently I'm letting Logstash automatically create the index for me), among other reasons.

No, the data extracted by grok is sent to Elasticsearch as a string or number, and how it is interpreted is determined by whichever index templates apply.

If you are willing to alter field names, you can, however, create an index template that manages dynamic mappings based on field prefixes or suffixes. The mapping is then determined by how you name the fields, and you control the names when you parse the data, e.g. in grok. For example, you could map every field ending in _k as a pure keyword field and every field ending in _kt as a dual-mapped text/keyword field.
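A sketch of what the suffix-based approach described above could look like, using dynamic_templates. The template name, index pattern, and the _k/_kt suffixes are placeholders you would adapt to your own naming convention:

```json
PUT _index_template/suffix-mapped-logs
{
  "index_patterns": ["logstash-*"],
  "template": {
    "mappings": {
      "dynamic_templates": [
        {
          "keyword_only": {
            "match_mapping_type": "string",
            "match": "*_k",
            "mapping": { "type": "keyword" }
          }
        },
        {
          "text_and_keyword": {
            "match_mapping_type": "string",
            "match": "*_kt",
            "mapping": {
              "type": "text",
              "fields": {
                "keyword": { "type": "keyword", "ignore_above": 256 }
              }
            }
          }
        }
      ]
    }
  }
}
```

With this in place, a grok pattern that names a capture e.g. client_ip_k gets a keyword-only mapping, while one named request_kt gets the dual text/keyword mapping, with no per-index mapping work.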


WOW. I have no idea how to do what you're describing, but I will read up on it. This explains the issue and exactly how to solve it instead of skirting around it... very impressed.

Thank you.

This topic was automatically closed 28 days after the last reply. New replies are no longer allowed.