Performance implications of multiple grok patterns for a single log file

Hello! We're discussing within our team whether

  1. it's OK to log two sets of log entries to a single file and then have the Logstash config specify the two patterns,
    (OR)
  2. it makes sense to direct each set of entries to its own log file and have the Logstash config specify one pattern per log file.

Both seem to work from a functional perspective, but is it true that with option #1 Logstash ends up trying each pattern in turn (in the order specified in the config) until one matches, resulting in processing overhead?
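
For concreteness, option #1 would be a single grok filter carrying both patterns, roughly like this (a sketch; the patterns and field names are placeholders, not our real ones):

```
filter {
  grok {
    # Grok tries the patterns in order and stops at the first match,
    # so the second pattern is only attempted when the first fails.
    match => {
      "message" => [
        "%{TIMESTAMP_ISO8601:ts} %{LOGLEVEL:level} %{GREEDYDATA:msg}",
        "%{IP:client} %{WORD:verb} %{URIPATHPARAM:request}"
      ]
    }
  }
}
```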

Yes, there is overhead to "falling through" a list of patterns, but there is also overhead in maintaining separate pipelines. Depending on your patterns, though, a lot of this can be minimised.

  1. Make each pattern fail as quickly as possible on mismatched input. This includes anchoring the pattern to the beginning of your input (^) so the regex engine fails immediately instead of retrying the pattern at every offset in the string (see the sketch after this list).
  2. If your inputs share a common prefix format, decode in phases: one grok filter extracts the common bits and saves the remainder to a temporary field (e.g., [@metadata][rest]), and a second picks up that field to parse it further. That way the engine doesn't have to start over and re-parse the common prefix with each subsequent pattern.
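
Putting both together, a sketch (the prefix format and the two payload patterns here are hypothetical):

```
filter {
  grok {
    # Phase 1: an anchored pattern extracts the shared prefix once.
    match => {
      "message" => "^%{TIMESTAMP_ISO8601:ts} %{LOGLEVEL:level} %{GREEDYDATA:[@metadata][rest]}"
    }
  }
  grok {
    # Phase 2: each alternative is also anchored, so a mismatch fails
    # immediately rather than being retried at every offset.
    match => {
      "[@metadata][rest]" => [
        "^order=%{WORD:order_id} status=%{WORD:status}",
        "^user=%{WORD:user} action=%{WORD:action}"
      ]
    }
  }
}
```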

Thank you, @yaauie. Can you please elaborate on the overhead involved in maintaining a separate pipeline?

The idea of parsing the common bits once to reduce repeated parsing is nice, but the contents of our two sets of entries have nothing in common; it's as if the entries belong in their own log files.

The overhead of a separate pipeline is relatively low, but non-zero.

  • Depending on your queue configuration (in-memory or persisted), multiple pipelines can mean more data structures in memory or on disk. If your pipelines are consistently able to "keep up" with inbound load, this should be negligible.
  • If one pipeline is significantly more complex than the others, you may need to manually tune the number of workers per pipeline to reduce resource contention (see the pipelines.yml sketch below).
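
For reference, per-pipeline workers and queue settings live in pipelines.yml. A minimal sketch, with hypothetical pipeline ids, paths, and worker counts:

```
# pipelines.yml — the ids, paths, and counts below are illustrative.
- pipeline.id: app-access
  path.config: "/etc/logstash/conf.d/app-access.conf"
  pipeline.workers: 2    # a simple pipeline needs fewer workers
- pipeline.id: app-audit
  path.config: "/etc/logstash/conf.d/app-audit.conf"
  pipeline.workers: 6    # the heavier parsing pipeline gets more
  queue.type: persisted  # persisted queues trade disk I/O for durability
```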

If they are drastically different though, I would advise separate output from your applications leading to separate pipelines. The cognitive overhead of writing a single pipeline to do two drastically different things increases the likelihood of mistakes, and in my book that far outweighs any marginal performance difference.

I would also advise starting with the Dissect filter instead of Grok, and only using Grok where your input is too complex for Dissect. It is significantly easier to get started with from a development perspective, the patterns are simpler to maintain, and as a bonus it often ends up being significantly less CPU-intensive.
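
As an illustration, a dissect mapping for a hypothetical fixed-layout line such as `2024-01-15 12:00:01 INFO [worker-1] job started` could look like this:

```
filter {
  dissect {
    # Hypothetical layout: "2024-01-15 12:00:01 INFO [worker-1] job started"
    # Dissect splits on the literal delimiters; no regex engine is involved.
    mapping => {
      "message" => "%{date} %{time} %{level} [%{thread}] %{msg}"
    }
  }
}
```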

Thank you, @yaauie. The Dissect filter seems much simpler than Grok, especially when the content is predictable!
