Grok pattern works in Grok Debugger but _grokparsefailure in Logstash


(Makara) #1

Hi
I am trying to parse a multi-line CSV log file using a Logstash grok pattern. The pattern works in the Grok Debugger, but when I run it in Logstash it produces _grokparsefailure. The _grokparsefailure persists even when I remove the CSV header.

The log line is a bit long, so I am pasting only a single record here, together with the CSV header.

Logfile
---------

PartitionKey,RowKey,Timestamp,Component,Component@type,Created,Created@type,CustomerName,CustomerName@type,ErrorMessage,ErrorMessage@type
1,09cca1e3-5e85-4d58-adca-11f8820ba70e,2018-02-09T16:42:08.775Z,TT.Core.Api,Edm.String,2018-02-08T11:18:22.631Z,Edm.DateTime,BTPL,Edm.String,"System.InvalidOperationException: Source sequence contains more than one element.
   at System.Linq.AsyncEnumerable.<SingleOrDefault_>d__381`1.MoveNext()
--- End of stack trace from previous location where exception was thrown ---       
   at System.Runtime.CompilerServices.TaskAwaiter.HandleNonSuccessAndDebuggerNotification(Task task)
   at System.Runtime.CompilerServices.TaskAwaiter`1.GetResult()
   at Microsoft.EntityFrameworkCore.Query.Internal.AsyncLinqOperatorProvider.TaskResultAsyncEnumerable`1.Enumerator.<MoveNext>d__3.MoveNext()
--- End of stack trace from previous location where exception was thrown ---
   at System.Runtime.ExceptionServices.ExceptionDispatchInfo.Throw()
   at System.Runtime.CompilerServices.TaskAwaiter.ThrowForNonSuccess(Task task)
   at System.Runtime.CompilerServices.TaskAwaiter.HandleNonSuccessAndDebuggerNotification(Task task)
   at System.Runtime.CompilerServices.TaskAwaiter`1.GetResult()
   at Microsoft.EntityFrameworkCore.Query.Internal.AsyncLinqOperatorProvider.ExceptionInterceptor`1.EnumeratorExceptionInterceptor.<MoveNext>d__5.MoveNext()
--- End of stack trace from previous location where exception was thrown ---
   at System.Runtime.ExceptionServices.ExceptionDispatchInfo.Throw()
   at System.Runtime.CompilerServices.TaskAwaiter.ThrowForNonSuccess(Task task)
   at 
--- End of stack trace from previous location where exception was thrown ---
   at System.Runtime.ExceptionServices.ExceptionDispatchInfo.Throw()
     at Microsoft.AspNetCore.Mvc.Internal.ResourceInvoker.<InvokeNextExceptionFilterAsync>d__23.MoveNext()",Edm.String

The logstash config file

input {
  file {
    path => "/home/data/apierror.csv"
    start_position => "beginning"
    sincedb_path => "/tmp/sincedb"
  }
}

filter {
  grok {
    match => ["message", "%{INT:PartitionKey}\,%{DATA:RowKey}\,%{TIMESTAMP_ISO8601:Timestamp}\,%{DATA:Component}\,%{DATA:ComponentType}\,%{TIMESTAMP_ISO8601:Created}\,%{DATA:CreatedType}\,%{DATA:CustomerName}\,%{DATA:CustomerNameType}\,%{QUOTEDSTRING:ErrorMessage}\,%{GREEDYDATA:ErrorMessageType}"]
  }
}

output {
  stdout { codec => "rubydebug" }
}

I can see from the debug log that the grok pattern compiled OK.

[2018-02-19T04:38:06,550][DEBUG][logstash.filters.grok ] Grok compiled OK {:pattern=>"%{INT:PartitionKey}\\,%{DATA:RowKey}\\,%{TIMESTAMP_ISO8601:Timestamp}\\,%{DATA:Component}\\,%{DATA:ComponentType}\\,%{TIMESTAMP_ISO8601:Created}\\,%{DATA:CreatedType}\\,%{DATA:CustomerName}\\,%{DATA:CustomerNameType}\\,%{QUOTEDSTRING:ErrorMessage}\\,%{GREEDYDATA:ErrorMessageType}", :expanded_pattern=>"(?<INT:PartitionKey>(?:[+-]?(?:[0-9]+)))\\,(?<DATA:RowKey>.*?)\\,(?<TIMESTAMP_ISO8601:Timestamp>(?:(?>\\d\\d){1,2})-(?:(?:0?[1-9]|1[0-2]))-(?:(?:(?:0[1-9])|(?:[12][0-9])|(?:3[01])|[1-9]))[T ](?:(?:2[0123]|[01]?[0-9])):?(?:(?:[0-5][0-9]))(?::?(?:(?:(?:[0-5]?[0-9]|60)(?:[:.,][0-9]+)?)))?(?:(?:Z|[+-](?:(?:2[0123]|[01]?[0-9]))(?::?(?:(?:[0-5][0-9])))))?)\\,(?<DATA:Component>.*?)\\,(?<DATA:ComponentType>.*?)\\,(?<TIMESTAMP_ISO8601:Created>(?:(?>\\d\\d){1,2})-(?:(?:0?[1-9]|1[0-2]))-(?:(?:(?:0[1-9])|(?:[12][0-9])|(?:3[01])|[1-9]))[T ](?:(?:2[0123]|[01]?[0-9])):?(?:(?:[0-5][0-9]))(?::?(?:(?:(?:[0-5]?[0-9]|60)(?:[:.,][0-9]+)?)))?(?:(?:Z|[+-](?:(?:2[0123]|[01]?[0-9]))(?::?(?:(?:[0-5][0-9])))))?)\\,(?<DATA:CreatedType>.*?)\\,(?<DATA:CustomerName>.*?)\\,(?<DATA:CustomerNameType>.*?)\\,(?<QUOTEDSTRING:ErrorMessage>(?>(?<!\\\\)(?>\"(?>\\\\.|[^\\\\\"]+)+\"|\"\"|(?>'(?>\\\\.|[^\\\\']+)+')|''|(?>(?>\\.|[^\\]+)+)|``)))\,(?GREEDYDATA:ErrorMessageType.*)"}`

and thereafter in the log file, I can see the following lines for the last field (ErrorMessage@type), which spans multiple lines.

[2018-02-19T04:38:07,591][DEBUG][logstash.inputs.file     ] Received line {:path=>"/home/data/apierror.csv", :text=>"1,09cca1e3-5e85-4d58-adca-11f8820ba70e,2018-02-09T16:42:08.775Z,TT.Core.Api,Edm.String,2018-02-08T11:18:22.631Z,Edm.DateTime,BTPL,Edm.String,\"System.InvalidOperationException: Source sequence contains more than one element.\r"}
[2018-02-19T04:38:07,950][INFO ][logstash.agent           ] Successfully started Logstash API endpoint {:port=>9600}
[2018-02-19T04:38:08,005][DEBUG][logstash.inputs.file     ] Received line {:path=>"/home/data/apierror.csv", :text=>"   at System.Linq.AsyncEnumerable.<SingleOrDefault_>d__381`1.MoveNext()\r"}
[2018-02-19T04:38:08,015][DEBUG][logstash.inputs.file     ] Received line {:path=>"/home/data/apierror.csv", :text=>"--- End of stack trace from previous location where exception was thrown ---\r"}
[2018-02-19T04:38:08,021][DEBUG][logstash.inputs.file     ] Received line {:path=>"/home/data/apierror.csv", :text=>"   at System.Runtime.ExceptionServices.ExceptionDispatchInfo.Throw()\r"}
[2018-02-19T04:38:08,027][DEBUG][logstash.inputs.file     ] Received line {:path=>"/home/data/apierror.csv", :text=>"   at System.Runtime.CompilerServices.TaskAwaiter.ThrowForNonSuccess(Task task)\r"}
[2018-02-19T04:38:08,035][DEBUG][logstash.inputs.file     ] Received line {:path=>"/home/data/apierror.csv", :text=>"   at System.Runtime.CompilerServices.TaskAwaiter.HandleNonSuccessAndDebuggerNotification(Task task)\r"}
[2018-02-19T04:38:08,041][DEBUG][logstash.inputs.file     ] Received line {:path=>"/home/data/apierror.csv", :text=>"   at System.Runtime.CompilerServices.TaskAwaiter`1.GetResult()\r"}
[2018-02-19T04:38:08,051][DEBUG][logstash.inputs.file     ] Received line {:path=>"/home/data/apierror.csv", :text=>"   at Microsoft.EntityFrameworkCore.Query.Internal.AsyncLinqOperatorProvider.TaskResultAsyncEnumerable`1.Enumerator.<MoveNext>d__3.MoveNext()\r"}
[2018-02-19T04:38:08,061][DEBUG][logstash.inputs.file     ] Received line {:path=>"/home/data/apierror.csv", :text=>"--- End of stack trace from previous location where exception was thrown ---\r"}

I am attaching the logstash (debug) log file here.

Can anyone please point out where I am going wrong?
Regards


(Fernando Ramos-Morales) #2

Hi:

Looking at your log line and at your pattern: QUOTEDSTRING, as I understand it, requires a quote (") at the beginning and another at the end, but in your case there does not seem to be one at the end. Also, the pattern would never reach GREEDYDATA, since there are no more commas after that point.

In the Line GROK you have these:
%{INT:PartitionKey}\,
%{DATA:RowKey}\,
%{TIMESTAMP_ISO8601:Timestamp}\,
%{DATA:Component}\,
%{DATA:ComponentType}\,
%{TIMESTAMP_ISO8601:Created}\,
%{DATA:CreatedType}\,
%{DATA:CustomerName}\,
%{DATA:CustomerNameType}\,
%{QUOTEDSTRING:ErrorMessage}\,
%{GREEDYDATA:ErrorMessageType}

All separated by a comma. But your log line looks like this:
1,09cca1e3-5e85-4d58-adca-11f8820ba70e,2018-02-09T16:42:08.775Z,TT.Core.Api,Edm.String,2018-02-08T11:18:22.631Z,Edm.DateTime,BTPL,Edm.String,"System.InvalidOperationException: Source sequence contains more than one element.\r

Which in turn maps to:
INT:PartitionKey, => 1,
DATA:RowKey, => 09cca1e3-5e85-4d58-adca-11f8820ba70e,
TIMESTAMP_ISO8601:Timestamp, => 2018-02-09T16:42:08.775Z,
DATA:Component, => TT.Core.Api,
DATA:ComponentType, => Edm.String,
TIMESTAMP_ISO8601:Created, => 2018-02-08T11:18:22.631Z,
DATA:CreatedType, => Edm.DateTime,
DATA:CustomerName, => BTPL,
DATA:CustomerNameType, => Edm.String,
QUOTEDSTRING:ErrorMessage, => "System.InvalidOperationException: Source sequence contains more than one element.\r
GREEDYDATA:ErrorMessageType =>

So you would need to change the grok pattern: maybe use the ":" character as a separator between the two last fields, and avoid QUOTEDSTRING unless you are sure a closing quote will be there. You could also chain several patterns, one after the other, to match more than one line shape.
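To make this concrete, here is a sketch only (built from the field names in the original pattern; whether the event really ends right after the opening quote is an assumption): the tail of the pattern could be relaxed to match the opening quote literally and consume the rest of the line as ErrorMessage, instead of requiring a closing quote on the same line:

grok {
  # Sketch: match the opening quote literally, then take everything that
  # remains on this line as ErrorMessage. ErrorMessageType is deliberately
  # not captured here, because it only appears after the closing quote,
  # which sits on a later physical line.
  match => ["message", "%{INT:PartitionKey},%{DATA:RowKey},%{TIMESTAMP_ISO8601:Timestamp},%{DATA:Component},%{DATA:ComponentType},%{TIMESTAMP_ISO8601:Created},%{DATA:CreatedType},%{DATA:CustomerName},%{DATA:CustomerNameType},\"%{GREEDYDATA:ErrorMessage}"]
}

This would at least parse the first physical line of each record; the stack-trace continuation lines would still need to be handled separately (for example with a multiline codec).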

Best regards,
s.r./F


(Makara) #3

Hi
@emwferm Thanks for the response !

I have checked the grok pattern and the logs again. The issue was with the last two columns, which span multiple lines, so Logstash was unable to find the end of the pattern. So I added a multiline codec to the input section.

codec => multiline {
  pattern => "(^\s*)|(^\-*)"
  what => "previous"
  #negate => "true"
  max_lines => 9000
}

and it now correctly parses the first CSV record. But then it puts the entire log file into the ErrorMessage field, possibly because it cannot find a sentinel for the last field (ErrorMessage@type), and it does not parse any more records.
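One alternative worth trying (a sketch; it assumes every new CSV record starts with an integer PartitionKey followed by a comma, which matches the sample line but may not hold for every record): anchor the codec on the record start and negate the match, so that every line which does not look like the beginning of a record is appended to the previous event:

codec => multiline {
  # "^\d+," is an assumed record-start shape: an integer PartitionKey
  # followed by a comma. Lines NOT matching it (stack-trace lines,
  # "--- End of stack trace ---" separators) are joined to the previous
  # event, so each multiline record becomes a single Logstash event.
  pattern => "^\d+,"
  negate => true
  what => "previous"
}

With this shape, the record boundary is defined by the start of the next record rather than by the continuation lines themselves, which avoids swallowing the whole file into one event.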

The log file
https://imagebin.ca/v/3sGffhirRXmE
The ErrorMessage
https://imagebin.ca/v/3sGh8FP35yv6

Regards


(Makara) #4

Hi
I had to modify the multiline codec section to make it work, but still not all CSV records get parsed: out of 50 records, 35 are parsed. When I decrease max_lines, more events from the CSV file get parsed.

input {
  file {
    path => "/home/data/api.csv"
    start_position => "beginning"
    sincedb_path => "/tmp/sincedb"
    type => "api"
    sincedb_write_interval => 1

    codec => multiline {
      pattern => "(^\s*)|(^\-*)"
      what => "previous"
      negate => "false"
      max_lines => 80
      auto_flush_interval => '1'
    }
  }
}

There were no issues with grok itself. The only change I made to the grok pattern was replacing the GREEDYDATA type with DATA.
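One detail that often matters once a multiline codec joins the lines (a general note about the regex engine, not something visible in the snippets above): the joined message contains embedded newlines, and dot-based patterns such as DATA and GREEDYDATA do not match across newlines unless the expression is prefixed with the multiline flag. A sketch (the %{GREEDYDATA:rest} capture at the end is just a placeholder for the remaining fields):

grok {
  # Sketch: the leading (?m) lets "." (and therefore DATA/GREEDYDATA)
  # match newline characters inside a joined multiline event.
  match => ["message", "(?m)%{INT:PartitionKey},%{DATA:RowKey},%{TIMESTAMP_ISO8601:Timestamp},%{GREEDYDATA:rest}"]
}

Without (?m), a pattern that works on a single-line sample in the Grok Debugger can still fail against a multiline event in Logstash.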


(system) #5

This topic was automatically closed 28 days after the last reply. New replies are no longer allowed.