Stop processing events/lines after first grok match

Hello all,

I am not able to stop Logstash from processing further events after a grok pattern has matched. I just want to index the first occurrence in the entire file. Here is what I have:

File content

first line content...
second line content...
third line should match EXTRACT_THIS
fourth line content...
fifth line not should match EXTRACT_THIS

Filter

filter {
    grok {  match => { message => "(?<extraction1>(?<=EXTRACT_THIS).*)" }  }
}

Output (stdout) - I need to retrieve just the first matching line; I don't want the second match to be output or processed

{
       "@version" => "1",
           "path" => "",
    "extraction1" => "\r",
           "host" => "",
        "message" => "third line should match EXTRACT_THIS\r",
     "@timestamp" => 2020-01-11T08:44:18.561Z
}
{
       "@version" => "1",
           "path" => "",
    "extraction1" => "\r",
           "host" => "",
        "message" => "fifth line not should match EXTRACT_THIS\r",
     "@timestamp" => 2020-01-11T08:44:18.561Z
}

Best regards

I've tried "break_on_match" but I don't understand the behaviour of this directive. It does nothing.

break_on_match => true

filter {
    grok {
        break_on_match => true
        match => { message => "(?<extraction1>(?<=EXTRACT_THIS).*)" }
    }
}

stdout not changed

{
       "@version" => "1",
           "path" => "",
    "extraction1" => "\r",
           "host" => "",
        "message" => "third line should match EXTRACT_THIS\r",
     "@timestamp" => 2020-01-11T08:44:18.561Z
}
{
       "@version" => "1",
           "path" => "",
    "extraction1" => "\r",
           "host" => "",
        "message" => "fifth line not should match EXTRACT_THIS\r",
     "@timestamp" => 2020-01-11T08:44:18.561Z
}

break_on_match => false

filter {
    grok {
        break_on_match => false
        match => { message => "(?<extraction1>(?<=EXTRACT_THIS).*)" }
    }
}

stdout not changed

{
       "@version" => "1",
           "path" => "",
    "extraction1" => "\r",
           "host" => "",
        "message" => "third line should match EXTRACT_THIS\r",
     "@timestamp" => 2020-01-11T08:44:18.561Z
}
{
       "@version" => "1",
           "path" => "",
    "extraction1" => "\r",
           "host" => "",
        "message" => "fifth line not should match EXTRACT_THIS\r",
     "@timestamp" => 2020-01-11T08:44:18.561Z
}

break_on_match controls behaviour when you are matching a field against an array of patterns. If it is set to true (the default) then grok stops at the first entry in the array that matches and ignores the subsequent patterns. If it is set to false then every pattern in the array is tried. You only have a single pattern, so it has no effect.
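A minimal plain-Ruby sketch of that behaviour may help. It is not Logstash's implementation: the apply_patterns helper, the sample message, and the field names first_field and second_field are all assumptions made up for illustration.

```ruby
# Plain-Ruby sketch of break_on_match semantics; this is NOT Logstash code.
# apply_patterns is a hypothetical helper used only for illustration.
def apply_patterns(message, patterns, break_on_match: true)
  fields = {}
  patterns.each do |pattern|
    m = pattern.match(message)
    next unless m
    # Copy every named capture into the result, much as grok adds fields.
    m.names.each { |name| fields[name] = m[name] }
    # With break_on_match enabled, stop at the first pattern that matches.
    break if break_on_match
  end
  fields
end

message  = "third line should match EXTRACT_THIS 1234"
patterns = [
  /EXTRACT_THIS\s+(?<first_field>[0-9]+)/,
  /(?<second_field>third line)/
]

# Only first_field is set: the second pattern is never tried.
p apply_patterns(message, patterns, break_on_match: true)
# Both first_field and second_field are set: every pattern is tried.
p apply_patterns(message, patterns, break_on_match: false)
```

With a single pattern in the array both calls return the same result, which is why the setting appeared to do nothing.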

I would suggest consuming the entire file as a single event. Then run grok against that, or you might need to use ruby to scan the [message] field and pick out the first match from the array of matches that scan returns.

Thank you Badger,

When you talk about an array of patterns, do you mean something like:

filter {
  grok {  
       break_on_match => false
       match => { message => "(?<extraction1>(?<=EXTRACT_THIS).*)" }
       match => { message => "(?<extraction2>(?<=EXTRACT_THIS).*)" }
       match => { message => "(?<extraction3>(?<=EXTRACT_THIS).*)" }  
  }
}

That is, the message field is matched against several patterns, and if the first pattern matches and break_on_match => true, the other two are not processed. Do I understand it correctly?

So you suggest loading the entire file into "message" and matching different parts with several grok patterns, something like this:

 input{ 
    codec => multiline { pattern => "^Spalanzani" negate => true what => previous   
    auto_flush_interval => 1 multiline_tag => "" } 
 }

 filter{
      grok {  
           break_on_match => false
           match => { message => "(?<field1>(?<=EXTRACT_THIS_PART_1).*)" }
           match => { message => "(?<field2>(?<=EXTRACT_THIS_PART_2).*)" }
           match => { message => "(?<field3>(?<=EXTRACT_THIS_PART_3).*)" }  
      }
 }

But what if a grok pattern matches several parts of the entire file? Wouldn't I have the same problem?

Sorry about my English

The syntax is actually

grok {
    break_on_match => false
    match => {
        message => [
                "(?<extraction1>(?<=EXTRACT_THIS).*)",
                "(?<extraction2>(?<=EXTRACT_THIS).*)",
                "(?<extraction3>(?<=EXTRACT_THIS).*)"
        ]
    }
}

Each pattern will just pick up its first match in the field.

Thank you Badger. My problem now is that sometimes I need to match the first occurrence and other times the second or third occurrence. Is it possible to achieve this using the grok pattern array?

Best regards

Following with the example...

File

first line content...
second line content...
EXTRACT_THIS 1234
fourth line content...
EXTRACT_THIS 5678
more content...
and more and more...
EXTRACT_THIS 456

Input (multiline mode)

codec => multiline {
    pattern => "^mposibleMatch"
    negate => true
    what => previous
    auto_flush_interval => 1
    multiline_tag => ""
}

Filter (grok array pattern mode)

grok {
    break_on_match => false
    match => {
        message => [
            "EXTRACT_THIS\s+(?<fieldname1>[0-9,]+)"
        ]
    }
}

This code extracts the first occurrence. How can I extract the second one? Or, in the case of many occurrences, how can I select the one I want? I want to capture the values, and the values change in every file that enters the repository (a folder in this case).

Best regards

In that case use ruby and scan

ruby { code => 'event.set("anArray", event.get("message").scan(/someRegexp/))' }

then pick whichever member of the array that you want.
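As a rough sketch of the "pick whichever member" step, here is plain Ruby that can be run outside Logstash. The sample text mirrors the file shown earlier in the thread; the nth_capture helper and its zero-based index argument are hypothetical names chosen for illustration.

```ruby
# Plain Ruby, runnable outside Logstash.
# nth_capture is a hypothetical helper: it returns the first capture group
# of the occurrence at the given zero-based index, or nil if there is none.
def nth_capture(text, regexp, index)
  occurrence = text.scan(regexp)[index]
  occurrence && occurrence.first
end

message = <<~TEXT
  first line content...
  second line content...
  EXTRACT_THIS 1234
  fourth line content...
  EXTRACT_THIS 5678
  more content...
  and more and more...
  EXTRACT_THIS 456
TEXT

regexp = /EXTRACT_THIS\s+([0-9,]+)/

puts nth_capture(message, regexp, 0)   # first occurrence
puts nth_capture(message, regexp, 1)   # second occurrence
puts nth_capture(message, regexp, -1)  # last occurrence
```

Inside the ruby filter the same indexing could be applied to the result of scan before calling event.set, so that only the chosen occurrence ends up in the field.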

Thank you so much Badger,

could you please provide a simple example? I would like to understand how this ruby block works.

Best regards

input { generator { count => 1 lines => [ '
Foo: 21
Another Foo: 14
Final Foo: 734591' ] } }

filter { ruby { code => 'event.set("Foos", event.get("message").scan(/Foo: ([0-9]+)/))' } }
output { stdout { codec => rubydebug { metadata => false } } }

will produce

    "Foos" => [
        [0] [
            [0] "21"
        ],
        [1] [
            [0] "14"
        ],
        [2] [
            [0] "734591"
        ]
    ]

scan finds the three occurrences of Foo: followed by a number. For each occurrence it returns an array of capture groups (parentheses inside the regexp). In my case there is only one capture group for each occurrence.
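To show the nesting more directly, here is a small plain-Ruby sketch using the same sample text. The helper names foo_values and labelled_foos, and the second regexp with two capture groups, are assumptions added for illustration and are not part of the configuration above.

```ruby
# Plain Ruby, runnable outside Logstash.
# foo_values and labelled_foos are hypothetical helper names.

# One capture group: scan returns one single-element array per occurrence,
# so flatten turns the result into a simple list of strings.
def foo_values(text)
  text.scan(/Foo: ([0-9]+)/).flatten
end

# Two capture groups: each occurrence yields a two-element array.
# The first group is optional, so it is nil when no label precedes "Foo:".
def labelled_foos(text)
  text.scan(/^(?:(\w+) )?Foo: ([0-9]+)$/)
end

text = "\nFoo: 21\nAnother Foo: 14\nFinal Foo: 734591"

p foo_values(text)
p labelled_foos(text)
```

Flattening is only convenient when the regexp has a single capture group; with several groups it is usually clearer to keep the nested arrays and index into the occurrence first.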

Thank you so much Badger, I really appreciate your support and patience.

Best regards
