Stop processing events/lines after first grok match

nino · January 11, 2020, 8:45am

Hello all,

I am not able to stop Logstash from processing further events after a grok pattern has matched. I just want to index the first occurrence in the entire file. Here is what I have:

File content

first line content...
second line content...
third line should match EXTRACT_THIS
fourth line content...
fifth line not should match EXTRACT_THIS

Filter

filter {
    grok {  match => { message => "(?<extraction1>(?<=EXTRACT_THIS).*)" }  }
}

Output (stdout) - I need to retrieve just the first matching line; I don't want the second match to be output or processed

{
       "@version" => "1",
           "path" => "",
    "extraction1" => "\r",
           "host" => "",
        "message" => "third line should match EXTRACT_THIS\r",
     "@timestamp" => 2020-01-11T08:44:18.561Z
}
{
       "@version" => "1",
           "path" => "",
    "extraction1" => "\r",
           "host" => "",
        "message" => "fifth line not should match EXTRACT_THIS\r",
     "@timestamp" => 2020-01-11T08:44:18.561Z
}

Best regards

nino · January 11, 2020, 10:50am

I've tried "break_on_match", but I don't understand the behaviour of this directive. It does nothing.

break_on_match => true

filter {
   break_on_match => true
    grok {  match => { message => "(?<extraction1>(?<=EXTRACT_THIS).*)" }  }
}

stdout not changed

{
       "@version" => "1",
           "path" => "",
    "extraction1" => "\r",
           "host" => "",
        "message" => "third line should match EXTRACT_THIS\r",
     "@timestamp" => 2020-01-11T08:44:18.561Z
}
{
       "@version" => "1",
           "path" => "",
    "extraction1" => "\r",
           "host" => "",
        "message" => "fifth line not should match EXTRACT_THIS\r",
     "@timestamp" => 2020-01-11T08:44:18.561Z
}

break_on_match => false

filter {
 break_on_match => false   
 grok {  match => { message => "(?<extraction1>(?<=EXTRACT_THIS).*)" }  }
}

stdout not changed

{
       "@version" => "1",
           "path" => "",
    "extraction1" => "\r",
           "host" => "",
        "message" => "third line should match EXTRACT_THIS\r",
     "@timestamp" => 2020-01-11T08:44:18.561Z
}
{
       "@version" => "1",
           "path" => "",
    "extraction1" => "\r",
           "host" => "",
        "message" => "fifth line not should match EXTRACT_THIS\r",
     "@timestamp" => 2020-01-11T08:44:18.561Z
}

Badger · January 11, 2020, 4:23pm

break_on_match controls behaviour when you are matching a field against an array of patterns. If it is set to true (the default) then grok stops at the first entry in the array that matches and ignores subsequent patterns; if it is set to false, grok tries every pattern in the array. You only have a single pattern, so it has no effect.
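
A small plain-Ruby model may make the two settings concrete. It is only a sketch of the documented semantics (stop at the first matching pattern when the option is true, try every pattern when it is false), not grok's actual implementation; `apply_patterns`, the two patterns and the sample message are made up for illustration.

```ruby
# Simplified model of grok walking an array of patterns.
# NOT grok's real code; only the break_on_match semantics are mimicked.
def apply_patterns(message, patterns, break_on_match: true)
  fields = {}
  patterns.each do |regexp|
    match = regexp.match(message)
    next unless match

    # Copy every named capture into the result, the way grok adds fields.
    match.names.each { |name| fields[name] = match[name] }
    break if break_on_match # stop at the first pattern that matched
  end
  fields
end

# Hypothetical patterns and message, for illustration only.
PATTERNS = [
  /EXTRACT_PART_A (?<field1>\w+)/,
  /EXTRACT_PART_B (?<field2>\w+)/
].freeze

MESSAGE = "EXTRACT_PART_A aaa\nEXTRACT_PART_B bbb"

p apply_patterns(MESSAGE, PATTERNS, break_on_match: true)
p apply_patterns(MESSAGE, PATTERNS, break_on_match: false)
```

With `break_on_match: true` only `field1` is set; with `false` both `field1` and `field2` are set. With a single pattern in the array the two calls would return the same thing.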

I would suggest consuming the entire file as a single event. Then run grok against that, or you might need to use ruby to scan the [message] field and pick out the first match from the array of matches that scan returns.
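
As a rough illustration of the ruby option, the sketch below treats the whole file as one string (a stand-in for the [message] field of a single event) and keeps only the first hit that `scan` returns. The constant and method names are hypothetical, and the lines are assumed to be separated by a plain "\n".

```ruby
# Stand-in for [message] when the whole file arrives as a single event.
# Assumption: lines are separated by "\n" (no trailing "\r").
WHOLE_FILE = [
  "first line content...",
  "second line content...",
  "third line should match EXTRACT_THIS",
  "fourth line content...",
  "fifth line not should match EXTRACT_THIS"
].join("\n")

# Hypothetical helper: every line containing the marker, in file order.
def matching_lines(message)
  # ^ and $ anchor at line boundaries in Ruby and . does not cross "\n",
  # so each element of the result is one complete line.
  message.scan(/^.*EXTRACT_THIS.*$/)
end

# scan returns all matches; .first keeps only the earliest one (nil if none).
FIRST_MATCH = matching_lines(WHOLE_FILE).first
puts FIRST_MATCH
```

Here `scan` finds both the third and the fifth line, and `.first` discards everything but the third.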

nino · January 11, 2020, 6:06pm

Thank you Badger,

When you talk about an array of patterns, do you mean something like:

filter {
  grok {  
       break_on_match => false
       match => { message => "(?<extraction1>(?<=EXTRACT_THIS).*)" }
       match => { message => "(?<extraction2>(?<=EXTRACT_THIS).*)" }
       match => { message => "(?<extraction3>(?<=EXTRACT_THIS).*)" }  
  }
}

That is, the message field is matched against several patterns, and if the first pattern matches and break_on_match => true, the other two are not processed. Do I understand it correctly?

So you suggest loading the entire file into "message" and matching different parts of it with several grok patterns, something like this:

 input{ 
    codec => multiline { pattern => "^Spalanzani" negate => true what => previous   
    auto_flush_interval => 1 multiline_tag => "" } 
 }

 filter{
      grok {  
           break_on_match => false
           match => { message => "(?<field1>(?<=EXTRACT_THIS_PART_1).*)" }
           match => { message => "(?<field2>(?<=EXTRACT_THIS_PART_2).*)" }
           match => { message => "(?<field3>(?<=EXTRACT_THIS_PART_3).*)" }  
      }
 }

But what if a grok pattern matches several parts of the entire file? Wouldn't I have the same problem?

Sorry about my English

Badger · January 11, 2020, 6:42pm

nino:

grok {
    break_on_match => false
    match => { message => "(?<extraction1>(?<=EXTRACT_THIS).*)" }
    match => { message => "(?<extraction2>(?<=EXTRACT_THIS).*)" }
    match => { message => "(?<extraction3>(?<=EXTRACT_THIS).*)" }
}

The syntax is actually

grok {
    break_on_match => false
    match => {
        message => [
                "(?<extraction1>(?<=EXTRACT_THIS).*)",
                "(?<extraction2>(?<=EXTRACT_THIS).*)",
                "(?<extraction3>(?<=EXTRACT_THIS).*)"
        ]
    }
}

Each pattern will just pick up the first occurrence in the message.

nino · January 12, 2020, 3:17pm

Thank you Badger. My problem now is that sometimes I need to match the first occurrence and other times the second or third occurrence. Is it possible to achieve this using the grok pattern array?

Best regards

nino · January 12, 2020, 6:13pm

Continuing with the example...

File

first line content...
second line content...
EXTRACT_THIS 1234
fourth line content...
EXTRACT_THIS 5678
more content...
and more and more...
EXTRACT_THIS 456

Input (multiline mode)

codec => multiline {
    pattern => "^mposibleMatch"
    negate => true
    what => previous
    auto_flush_interval => 1
    multiline_tag => ""
}

Filter (grok array pattern mode)

grok {
    break_on_match => false
    match => {
        message => [
            "EXTRACT_THIS\s+(?<fieldname1>[0-9,]+)"
        ]
    }
}

This code extracts the first occurrence. How can I extract the second one? Or, if there are many occurrences, how can I select the one I want? I want to capture the values, and the values change in every file that enters the repository (a folder in this case).

Best regards

Badger · January 12, 2020, 10:56pm

In that case use ruby and scan

ruby { code => 'event.set("anArray", event.get("message").scan(/someRegexp/))' }

then pick whichever member of the array that you want.
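
To show what picking a member could look like, here is a sketch that runs outside Logstash. `FakeEvent` is a made-up stand-in that only imitates the `get`/`set` calls used in the one-liner above, and `pick_occurrence` is a hypothetical helper; inside a real ruby filter only the body of that helper would be needed. The regexp is the one from the grok filter in the question.

```ruby
# Minimal stand-in for the Logstash event API (only get/set),
# so the sketch runs outside Logstash. Not the real LogStash::Event class.
class FakeEvent
  def initialize(fields)
    @fields = fields
  end

  def get(name)
    @fields[name]
  end

  def set(name, value)
    @fields[name] = value
  end
end

SAMPLE_FILE = <<~TEXT
  first line content...
  second line content...
  EXTRACT_THIS 1234
  fourth line content...
  EXTRACT_THIS 5678
  more content...
  and more and more...
  EXTRACT_THIS 456
TEXT

# Hypothetical helper: store the occurrence at `index` (0-based) in "fieldname1".
def pick_occurrence(event, index)
  matches = event.get("message").scan(/EXTRACT_THIS\s+([0-9,]+)/)
  value = matches[index]&.first # each match is an array of capture groups
  event.set("fieldname1", value) unless value.nil?
  event
end

event = pick_occurrence(FakeEvent.new({ "message" => SAMPLE_FILE }), 1)
puts event.get("fieldname1") # second occurrence: 5678
```

If the requested occurrence does not exist, the field is simply left unset rather than set to nil.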

nino · January 12, 2020, 11:13pm

Thank you so much Badger,

Could you please provide a simple example? I would like to understand how this ruby block works.

Best regards

Badger · January 12, 2020, 11:35pm

input { generator { count => 1 lines => [ '
Foo: 21
Another Foo: 14
Final Foo: 734591' ] } }
filter {
    ruby { code => 'event.set("Foos", event.get("message").scan(/Foo: ([0-9]+)/))' }
}
output { stdout { codec => rubydebug { metadata => false } } }

will produce

      "Foos" => [
    [0] [
        [0] "21"
    ],
    [1] [
        [0] "14"
    ],
    [2] [
        [0] "734591"
    ]

scan finds the three occurrences of Foo: followed by a number. For each occurrence it returns an array of capture groups (parentheses inside the regexp). In my case there is only one capture group for each occurrence.
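
The shape of the result can be checked in plain Ruby, outside Logstash. This sketch reuses the same message and regexp; the constant names are only for illustration.

```ruby
FOO_MESSAGE = "\nFoo: 21\nAnother Foo: 14\nFinal Foo: 734591"

# One capture group: one single-element array per occurrence.
NESTED = FOO_MESSAGE.scan(/Foo: ([0-9]+)/)
# flatten collapses the inner arrays into a plain list of strings.
FLAT = NESTED.flatten
# No capture group: scan returns the full text of each match instead.
WHOLE = FOO_MESSAGE.scan(/Foo: [0-9]+/)

p NESTED # [["21"], ["14"], ["734591"]]
p FLAT   # ["21", "14", "734591"]
p WHOLE  # ["Foo: 21", "Foo: 14", "Foo: 734591"]
```

So if only the numbers are wanted as a flat list, calling `.flatten` on the result of `scan` before `event.set` is one way to get it.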

nino · January 12, 2020, 11:41pm

Thank you so much Badger, I really appreciate your support and patience.

Best regards

system · February 9, 2020, 11:41pm

This topic was automatically closed 28 days after the last reply. New replies are no longer allowed.
