_grokparsefailure using regex filter with multiline codec

Hi all, I am trying to parse this log in Logstash 6.2.2:

--- BEGIN ----- 4.3.2018 13:29:26.286 --- 
	Identification: ABC_a1262e23-74a3-4f57-bd79-ee025805bee5 (c0cc8fa7-d8ad-4cda-8a27-c1af03794ca0)
	Message: Get signed data object ended successfuly.
	Identification: ABC_a1262e23-74a3-4f57-bd79-ee025805bee5 (c0cc8fa7-d8ad-4cda-8a27-c1af03794ca0)
--- END ----- 4.3.2018 13:29:26.286 (2,4811 ms) --- 

--- BEGIN ----- 4.3.2018 13:29:26.302 --- 
	Identification: ABC_a1262e23-74a3-4f57-bd79-ee025805bee5 (c0cc8fa7-d8ad-4cda-8a27-c1af03794ca0)
	Message: Get signed data object.
	Identification: ABC_a1262e23-74a3-4f57-bd79-ee025805bee5 (c0cc8fa7-d8ad-4cda-8a27-c1af03794ca0)
--- END ----- 4.3.2018 13:29:26.302 (2,4565 ms) --- 

--- BEGIN ----- 4.3.2018 13:29:26.302 --- 
	Identification: ABC_a1262e23-74a3-4f57-bd79-ee025805bee5 (c0cc8fa7-d8ad-4cda-8a27-c1af03794ca0)
	Message: Get signed data object.
	Identification: ABC_a1262e23-74a3-4f57-bd79-ee025805bee5 (c0cc8fa7-d8ad-4cda-8a27-c1af03794ca0)
--- END ----- 4.3.2018 13:29:26.302 (2,4565 ms) --- 

--- BEGIN ----- 4.3.2018 13:28:09.330 --- 
	Identification: ABC_36db79f6-dc39-4df3-82f3-297d72316bb2 (c846e462-e755-4aae-87cf-89e556a355c2)
	Message: Initialization started.
	Message: Initializing config reader. 95,4582 ms
	Message: Initializing error log. 0,0997 ms
	Message: Initializing plugins: 6,1417 ms - 2 item(s) - msg from LoadPlugins: 
	LoadPlugins
		Search files "XYZ.*.dll": 0,2736 ms - 2 file(s).
		CurrentDomainAssemblies: 0,1078 ms - count 156.
		Processing file: XYZ.Pdf.dll.
				CheckToken for PdfPlugin: 0,0228 ms.
		Processing file end: 1,6042 ms.
		Processing file: XYZ.Png.dll.
				CheckToken for PngPlugin: 0,0213 ms.
		Processing file end: 1,2178 ms.
		GetPlugins
			Foreach gettypes: 0,0313 ms.
			Foreach types: 0,2379 ms.
		GetPlugins end: 0,2882 ms.
	LoadPlugins end: 6,1281 ms.
	Message: Create xml document: 5,6369 ms
	Message: Create namespace manager: 0,0076 ms
	Message: Check signature version: 0,0203 ms
	Message: Verify signature schema 6,367 ms - removed node count: 0 item(s).
	Message: validateSignaturePolicy: ASDSDFGDSFGSDFGDSFG
	Message: Initialization ended successfuly.
	Identification: ABC_36db79f6-dc39-4df3-82f3-297d72316bb2 (c846e462-e755-4aae-87cf-89e556a355c2)
--- END ----- 4.3.2018 13:28:09.642 (321,7862 ms) --- 

In my conf, I am first trying to parse the start datetime from the first line of each event, e.g. --- BEGIN ----- 4.3.2018 13:28:09.330 ---

My conf is:

input {
    file {
        path => ["file.log"]
        start_position => "beginning"
        sincedb_path => "/dev/null"
        codec => multiline {
            pattern => "^--- BEGIN"
            negate => true
            what => "previous"
            auto_flush_interval => 30
        }
    }
}

filter {
    grok {
          match => [ "message", "(?<start>(?<=--- BEGIN ----- ).*(?= ---))" ]
    }
}


output {
    stdout { codec => rubydebug }
    elasticsearch {
        hosts => ["localhost:9200"]
        action => "index"
        index => "log_index"
    }
    stdout { }
}

Parsing the multiline events works fine, but parsing start returns _grokparsefailure, even though my grok pattern works in grokdebug (https://grokdebug.herokuapp.com/). Interestingly, this grok works fine:

grok {
    match => [ "message", "(?<start>(?<=--- BEGIN ----- ).{21})" ]
}

but my start string does not have a fixed length.

Do you have any idea what the problem is?

Regards
Martin

No, but isn't

match => [ "message", "--- BEGIN ----- (?<start>[ 0-9:\.]+) " ]

a simpler (and cheaper) solution?

It works fine, and I find your solution better than mine.
Thank you very much.

"--- BEGIN ----- (?<start>.*) ---"     
"--- END ----- (?<end>.*) \("
"--- END ----- (?<end>.*) \((?<duration>.*) ms\)"
"Identification: (?<identification1>ABC_.*) \((?<identification2>.*)\)"

start does not work. end on its own works fine. In the pattern that combines end and duration, end is extracted but duration is not. identification1 and identification2 do not work either.

All of them work in grokdebug. It looks like Logstash has a problem with .* in some cases, and I think it is a bug.

So I am using these instead, but I think they are only a workaround:

"--- BEGIN ----- (?<start>[ 0-9:\.]+) ---"
"--- END ----- (?<end>[ 0-9:\.]+) \((?<duration>[0-9,]+) ms\)"
"Identification: (?<identification1>ABC_[a-z0-9-]+) \((?<identification2>[a-z0-9-]+)\)"

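For reference, the three workaround patterns above could be combined into one grok filter roughly as follows. This is only a sketch: it assumes all three fields should be extracted from the same multiline event, which is why break_on_match is set to false (by default grok stops after the first pattern that matches). Since each event contains two Identification lines, only the first one would be captured.

```
filter {
    grok {
        # Assumption: try every pattern instead of stopping at the first match,
        # so start, end/duration and the identifications all come from one event.
        break_on_match => false
        match => {
            "message" => [
                "--- BEGIN ----- (?<start>[ 0-9:\.]+) ---",
                "--- END ----- (?<end>[ 0-9:\.]+) \((?<duration>[0-9,]+) ms\)",
                "Identification: (?<identification1>ABC_[a-z0-9-]+) \((?<identification2>[a-z0-9-]+)\)"
            ]
        }
    }
}
```
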
I have another question: how can I get an array field, messages, whose items are the values after "Message: " up to the end of the line, one item per such line?

Thank you very much
Martin

Interesting problem!

filter {
  grok {
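    # Get every occurrence of tab followed by Message: until a line that has tab followed by not-M
    # (the whitespace before Message: in these patterns is a literal tab, and the
    # line breaks inside the quoted strings are literal newlines)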
    match => [ "message", "(?<messages> Message:.*
)       [^M]" ]
  }
  mutate {
    gsub => [
       # Remove the tab+Message:
       "messages", "    Message: ", "",
       # Delete tab, then one or more not newline followed by a newline
       "messages", "    [^
]+
", "" ]
  }
  mutate {
    # Split on newline
    split => { "messages" => "
" }
  }
}

Will get you

      "messages" => [
        [0] "Initialization started.",
        [1] "Initializing config reader. 95,4582 ms",
        [2] "Initializing error log. 0,0997 ms",
        [3] "Initializing plugins: 6,1417 ms - 2 item(s) - msg from LoadPlugins: ",
        [4] "Create xml document: 5,6369 ms",
        [5] "Create namespace manager: 0,0076 ms",
        [6] "Check signature version: 0,0203 ms",
        [7] "Verify signature schema 6,367 ms - removed node count: 0 item(s).",
        [8] "validateSignaturePolicy: ASDSDFGDSFGSDFGDSFG",
        [9] "Initialization ended successfuly."
    ]
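
An alternative that avoids literal tabs and newlines in the config might be a ruby filter. This is a hedged sketch, not something tested against the log above: it assumes every message line starts with exactly one tab followed by "Message: ", and it uses the event.get / event.set API of the ruby filter. If the file has Windows line endings, each item may keep a trailing carriage return.

```
filter {
    ruby {
        # Collect the text after each tab-indented "Message: " line.
        # String#scan with one capture group returns an array of
        # one-element arrays, hence the flatten.
        code => '
            lines = event.get("message").to_s.scan(/^\tMessage: (.*)$/).flatten
            event.set("messages", lines) unless lines.empty?
        '
    }
}
```

Because the regex only matches lines that begin with a single tab and "Message: ", the nested LoadPlugins lines are skipped without a separate gsub step.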
