Example portion of my grok parser:
match => { "message" => "(?&lt;field1&gt;[^ ]*) (?&lt;field2&gt;.*)" }
(The angle brackets and quantifiers in the named captures were mangled in the original post; field names here are placeholders.)
Problem: with this, my string fields are going to be mapped as both "text" and "keyword". I'm pretty sure that isn't ideal for memory usage (my use case: 64 GB of memory, but I'm still afraid of crashes due to the sheer volume of data).
Suppose I want to optimize the "message" field so that it is not kept in memory. Which type should I use? And can I do it in the grok parser itself? (I'm going to have hundreds of indexes, so I want a programmatic way to do it.)
Thanks! I've used that before, and I'll probably have to do that as a last resort.
But I wonder if there is a way to do it through grok itself, since that would save me a lot of trouble: I have hundreds of indexes to create (currently I'm letting Logstash create each index automatically), among other reasons.
No, the data extracted by grok is sent to Elasticsearch as a string or number, and how this is interpreted is determined by the index templates that apply.
If you are willing to alter field names, you can however create an index template that manages dynamic mappings based on field prefixes or suffixes. You then determine the mapping by how you name the fields, and you can control the names when you parse the data, e.g. in grok. For example, you could map every field ending in _k as a pure keyword field and every field ending in _kt as a dual-mapped text/keyword field.
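As a sketch of the suffix approach, here is what such a composable index template could look like, using `dynamic_templates` to match on field-name suffixes. The template name, the `logs-*` index pattern, and the `_k`/`_kt` suffixes are assumptions for illustration; adjust them to your own naming scheme:

```json
PUT _index_template/suffix-based-mappings
{
  "index_patterns": ["logs-*"],
  "template": {
    "mappings": {
      "dynamic_templates": [
        {
          "keyword_only": {
            "match": "*_k",
            "mapping": { "type": "keyword" }
          }
        },
        {
          "text_and_keyword": {
            "match": "*_kt",
            "mapping": {
              "type": "text",
              "fields": {
                "keyword": { "type": "keyword", "ignore_above": 256 }
              }
            }
          }
        }
      ]
    }
  }
}
```

With this in place, a grok pattern that captures into a field named, say, `status_k` would get a pure keyword mapping, while `message_kt` would get the dual text/keyword mapping, without having to define a static mapping per index.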
WOW. I have no idea how to do what you're saying, but I will look into it. This explains the issue and exactly how to solve it instead of skirting around it... very impressed.