I am using the Grok filter plugin for Logstash (Logstash version 6.7.1). My log lines come in several formats, so I have written more than 10 grok patterns for them. Now I want to know which grok pattern matched each log line. It could be one more field in the Elasticsearch index recording the name or number of the matching grok pattern, so that I can build a visualization on top of it.
The reason: if I know which grok pattern matches the most log lines, I can put that pattern first in the list of grok patterns. This could be a performance gain for my use case, since most of my log lines would match the first pattern and skip matching against the remaining grok patterns.
To determine that, you could split your grok so that each grok filter has a single pattern and adds a tag when it matches, then just count the tags in Elasticsearch.
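A minimal sketch of that approach, assuming two hypothetical patterns (PATTERN_A and PATTERN_B, and the tag names, are placeholders for your own):

filter {
  grok {
    match => { "message" => "%{PATTERN_A}" }  # placeholder pattern
    add_tag => [ "matched_pattern_a" ]
    tag_on_failure => []                      # suppress _grokparsefailure noise
  }
  grok {
    match => { "message" => "%{PATTERN_B}" }  # placeholder pattern
    add_tag => [ "matched_pattern_b" ]
    tag_on_failure => []
  }
}

A terms aggregation on the tags field in Elasticsearch then shows how often each pattern matched.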
Thanks Badger. But if I have separate grok filters, then each log line will be tried against each of them, which will be a little overhead and a performance hit for Logstash.
It seems I cannot collect such statistics in the production environment on a continuous basis. I'll have to use the approach you mentioned before I go into production.
They currently match in order of appearance. You could split them up as @Badger suggested, adding a tag for each individual grok filter. You could then also add a "grokked" tag and run the next one conditionally:
if "grokked" not in [tags] {
  grok {
    match => [ ... ]
    add_tag => [ "this_grok_filter_id", "grokked" ]
  }
}
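Chained for two patterns, that could look like the sketch below (PATTERN_A, PATTERN_B, and the tag names are placeholders, not your actual patterns):

filter {
  grok {
    match => { "message" => "%{PATTERN_A}" }    # placeholder pattern
    add_tag => [ "pattern_a", "grokked" ]
    tag_on_failure => []
  }
  if "grokked" not in [tags] {
    grok {
      match => { "message" => "%{PATTERN_B}" }  # only tried if the first grok did not match
      add_tag => [ "pattern_b", "grokked" ]
      tag_on_failure => []
    }
  }
}

With this layout each event stops at the first matching grok filter, and the added tag identifies which pattern matched, giving you both the statistics and the early-exit behavior.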