Matching multiple patterns in grok for a filebeat ingestion pipeline

Jim_Ivey · January 29, 2020, 6:24pm

In Logstash's grok filter, there's a break_on_match option that, when set to false, lets grok match multiple patterns. Am I correct in believing that no such thing exists for grok in Filebeat module ingestion pipelines (e.g., .../filebeat/module/my_module/my_fileset/ingest/pipeline.json)?

How can I accomplish that?

The logs I'm grokking are pretty long. Here's an example:

2019-12-28T19:14:32.848+0000: 238687.843: [GC pause (G1 Evacuation Pause) (young), 0.0572720 secs]
   [Parallel Time: 34.1 ms, GC Workers: 13]
      [GC Worker Start (ms): Min: 238687843.5, Avg: 238687843.7, Max: 238687843.8, Diff: 0.3]
      [Ext Root Scanning (ms): Min: 0.3, Avg: 1.1, Max: 7.9, Diff: 7.6, Sum: 14.7]
      [Update RS (ms): Min: 0.0, Avg: 0.6, Max: 0.8, Diff: 0.8, Sum: 7.4]
         [Processed Buffers: Min: 0, Avg: 3.5, Max: 12, Diff: 12, Sum: 45]
      [Scan RS (ms): Min: 0.0, Avg: 0.4, Max: 0.5, Diff: 0.5, Sum: 4.6]
      [Code Root Scanning (ms): Min: 0.0, Avg: 1.9, Max: 5.1, Diff: 5.1, Sum: 25.1]
      [Object Copy (ms): Min: 25.7, Avg: 29.1, Max: 31.9, Diff: 6.2, Sum: 378.5]
      [Termination (ms): Min: 0.0, Avg: 0.5, Max: 0.6, Diff: 0.5, Sum: 6.2]
         [Termination Attempts: Min: 1, Avg: 48.8, Max: 62, Diff: 61, Sum: 634]
      [GC Worker Other (ms): Min: 0.0, Avg: 0.1, Max: 0.2, Diff: 0.2, Sum: 1.1]
      [GC Worker Total (ms): Min: 33.5, Avg: 33.7, Max: 33.8, Diff: 0.2, Sum: 437.6]
      [GC Worker End (ms): Min: 238687877.3, Avg: 238687877.3, Max: 238687877.4, Diff: 0.2]
   [Code Root Fixup: 0.4 ms]
   [Code Root Purge: 0.0 ms]
   [Clear CT: 1.2 ms]
   [Other: 21.6 ms]
      [Choose CSet: 0.0 ms]
      [Ref Proc: 20.0 ms]
      [Ref Enq: 0.2 ms]
      [Redirty Cards: 0.4 ms]
      [Humongous Register: 0.0 ms]
      [Humongous Reclaim: 0.0 ms]
      [Free CSet: 0.8 ms]
   [Eden: 14496.0M(14496.0M)->0.0B(14496.0M) Survivors: 224.0M->224.0M Heap: 14745.9M(24576.0M)->240.1M(24576.0M)]
 [Times: user=0.49 sys=0.00, real=0.06 secs]

Putting all my grok patterns into a single pattern would result in the pattern being over 5,000 characters long. At the moment, I'm trying multiple grok processors, one for each pattern. We'll see how that goes.

Thanks!

Kaiyan_Sheng · January 29, 2020, 11:06pm

Hi @Jim_Ivey, I believe grok in a Filebeat ingest pipeline behaves as if break_on_match were enabled by default (there is no specific config parameter for it). The grok processor documentation also says it "Returns on the first expression in the list that matches": https://www.elastic.co/guide/en/elasticsearch/reference/master/grok-processor.html#using-grok
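
To illustrate the first-match behavior described above, here is a minimal sketch of a single grok processor with two entries in its patterns list. The field names under jvm.gc are hypothetical and not from the thread, and the sketch assumes the whole multi-line GC event arrives in the message field. Against the sample log, both expressions could match, but only the first one that matches is applied, so the [Times: ...] values would not be extracted.

```json
{
  "description": "Illustrative sketch only: jvm.gc.* field names are hypothetical",
  "processors": [
    {
      "grok": {
        "field": "message",
        "patterns": [
          "\\[Parallel Time: %{NUMBER:jvm.gc.parallel_time_ms:float} ms, GC Workers: %{NUMBER:jvm.gc.workers:int}\\]",
          "\\[Times: user=%{NUMBER:jvm.gc.cpu.user_secs:float} sys=%{NUMBER:jvm.gc.cpu.sys_secs:float}, real=%{NUMBER:jvm.gc.cpu.real_secs:float} secs\\]"
        ]
      }
    }
  ]
}
```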

Jim_Ivey · January 29, 2020, 11:12pm

Thanks for the quick response. I wanted to turn break_on_match off (set to false) to match multiple patterns in a single grok processor. That would allow me to break the 5,000-character pattern into multiple, shorter patterns.

What I've done (that works) is use a separate grok processor for each sub-pattern. I wasn't sure that I could have a dozen or so grok processors, but it worked.

Thanks again for your help.
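
For readers looking for the shape of that workaround, the following is a rough sketch of a pipeline.json with one grok processor per sub-pattern, not the actual pipeline from this thread. The field names under jvm.gc are hypothetical, only three of the many sub-patterns are shown, and it assumes Filebeat's multiline settings have already joined the whole GC event into the message field. Setting ignore_failure on each processor is one possible choice so that an event missing one section still passes through the remaining processors.

```json
{
  "description": "Illustrative sketch only: one grok processor per sub-pattern; jvm.gc.* field names are hypothetical",
  "processors": [
    {
      "grok": {
        "field": "message",
        "patterns": [
          "\\[GC pause \\(%{DATA:jvm.gc.cause}\\) \\(%{WORD:jvm.gc.collection}\\), %{NUMBER:jvm.gc.pause_secs:float} secs\\]"
        ],
        "ignore_failure": true
      }
    },
    {
      "grok": {
        "field": "message",
        "patterns": [
          "\\[Parallel Time: %{NUMBER:jvm.gc.parallel_time_ms:float} ms, GC Workers: %{NUMBER:jvm.gc.workers:int}\\]"
        ],
        "ignore_failure": true
      }
    },
    {
      "grok": {
        "field": "message",
        "patterns": [
          "\\[Times: user=%{NUMBER:jvm.gc.cpu.user_secs:float} sys=%{NUMBER:jvm.gc.cpu.sys_secs:float}, real=%{NUMBER:jvm.gc.cpu.real_secs:float} secs\\]"
        ],
        "ignore_failure": true
      }
    }
  ]
}
```

Because each processor runs independently against the same message field, every sub-pattern gets its chance to match, which gives the effect of break_on_match set to false while keeping each pattern short.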

