How to handle multiple match patterns in a Logstash filter


(Ankit) #1

I want to know the best way to handle multiple log patterns in a single file. I have created the filter below for my logs.

filter {
  if [type] == 'AppLog' {
    grok {
      break_on_match => false
      match => { "message" => [
        "%{TIMESTAMP_ISO8601:LogDate} %{LOGLEVEL:loglevel} (?<threadName>[^:]+):(?<LineNumber>[^a-z]+) - %{GREEDYDATA:Line}",
        "%{TIMESTAMP_ISO8601:LogDate} %{LOGLEVEL:loglevel} (?<threadName>[^:]+):(?<LineNumber>[^a-z]+) - %{GREEDYDATA:LoggedMessage}"
      ] }
    }

    json {
      source => "Line"
    }

    mutate {
      remove_field => [ "Line", "LineNumber" ]
      rename => { "t" => "Log_Timestamp" }
      rename => { "h" => "Hostname" }
      rename => { "l" => "LogLevel" }
      rename => { "cN" => "Class_Name" }
      rename => { "mN" => "Method_Name" }
      rename => { "m" => "Logged_Message" }
      rename => { "ecid" => "ECID" }
      rename => { "d" => "Data" }
      rename => { "eS" => "Exception_Message" }
      rename => { "stacktrace" => "Stacktrace" }
    }
  }

  if "_grokparsefailure" in [tags] {
    drop {}
  }
}

Below are the two log statements that I want to parse. One is wrapped in JSON and the other is plain text. As I have set break_on_match to false, grok goes through both patterns. For the plain-text log statement I get a _jsonparsefailure because it is not JSON, and every other field has two entries, e.g. loglevel contains INFO twice.

2017-08-08 17:34:04:527 INFO Login:? - TGT expires: Wed Aug 09 05:34:04 GMT 2017
2017-08-08 17:34:04:648 INFO ConnectionContainer:? - {"t":1502213644648,"ecid":"Unknown","h":"0ac94b160d0c","l":"INFO","cN":"com.apps.common.connection","mN":"getConnection","m":"Single Keytab Mode"}

Parsed result

{"message":"2017-08-08 17:34:05:947 INFO ClientCnxn:? - EventThread shut down","type":"WidgetRestLog","threadName":["ClientCnxn","ClientCnxn"],"tags":["WidgetRestLog","_jsonparsefailure"],"LoggedMessage":"EventThread shut down","path":"/tmp/aio_widgetrest_1.log","@timestamp":"2017-08-10T16:09:18.067Z","loglevel":["INFO","INFO"],"@version":"1","host":"dev2-dockercomps-1","LogDate":["2017-08-08 17:34:05:947","2017-08-08 17:34:05:947"]}


(Magnus Bäck) #2

What's the question? AFAICT those grok expressions are identical except for the name of the GREEDYDATA field.


(Ankit) #3

I have just edited my question. It was incomplete.


(Magnus Bäck) #4

Ankit wrote: "I have marked break_on_match as false, it goes through both pattern"

But that's not what you want to do. You want it to try two patterns and be satisfied with the first match. The first expression should match only if the log message looks like JSON, i.e. it begins and ends with braces, so your two expressions could look something like this:

(?<LineNumber>[^a-z]+) - (?=\{")%{GREEDYDATA:json}(?<=\})$
(?<LineNumber>[^a-z]+) - %{GREEDYDATA:LoggedMessage}

The (?=...) and (?<=...) are zero-length lookahead and lookbehind assertions. Note the double quote in the first expression; wrap the whole string in single quotes instead of double quotes to avoid problems.
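
As a minimal sketch of the above, assuming the start of each line is matched exactly as in your original filter, the grok block could look something like this. break_on_match is simply left out, so it keeps its default of true and grok stops at the first pattern that matches. The first pattern only matches when the text after " - " starts with {" and ends with }, and the second pattern is the fallback for plain-text lines.

```
filter {
  grok {
    # No break_on_match here: the default (true) stops at the first match.
    # Pattern 1 captures a JSON payload into "json"; pattern 2 is the plain-text fallback.
    match => {
      "message" => [
        '%{TIMESTAMP_ISO8601:LogDate} %{LOGLEVEL:loglevel} (?<threadName>[^:]+):(?<LineNumber>[^a-z]+) - (?=\{")%{GREEDYDATA:json}(?<=\})$',
        '%{TIMESTAMP_ISO8601:LogDate} %{LOGLEVEL:loglevel} (?<threadName>[^:]+):(?<LineNumber>[^a-z]+) - %{GREEDYDATA:LoggedMessage}'
      ]
    }
  }
}
```

The order of the two patterns matters: the plain-text pattern would also match JSON lines, so the more specific JSON pattern has to come first.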


(Ankit) #5

Thanks. It is working. I never knew that regular expressions had such an option. Many things to learn :slightly_smiling_face:

So here is my updated filter

filter {
  if [type] == 'WidgetRestLog' {
    grok {
      break_on_match => false
      match => { "message" => [
        '%{TIMESTAMP_ISO8601:LogDate} %{LOGLEVEL:loglevel} (?<threadName>[^:]+):(?<LineNumber>[^a-z]+) - (?=\{")%{GREEDYDATA:json}(?<=\})$',
        "%{TIMESTAMP_ISO8601:LogDate} %{LOGLEVEL:loglevel} (?<threadName>[^:]+):(?<LineNumber>[^a-z]+) - %{GREEDYDATA:LoggedMessage}"
      ] }
    }

    json {
      source => "json"
    }

    mutate {
      remove_field => [ "json", "LineNumber" ]
      rename => { "t" => "Log_Timestamp" }
      rename => { "h" => "Hostname" }
      rename => { "l" => "LogLevel" }
      rename => { "cN" => "Class_Name" }
      rename => { "mN" => "Method_Name" }
      rename => { "m" => "LoggedMessage" }
      rename => { "ecid" => "ECID" }
      rename => { "d" => "Data" }
      rename => { "eS" => "Exception_Message" }
      rename => { "stacktrace" => "Stacktrace" }
    }
  }

  if "_grokparsefailure" in [tags] {
    drop {}
  }
}

For the log statement below it works fine, except that three attributes are parsed twice:

2017-08-10 19:34:06:799 INFO MYClass:? - {"t":1502393646799,"ecid":"Unknown","h":"0ac94b160d0c","l":"INFO","cN":"org.test.common.MYClass","mN":"readFeaturesSet","m":"Enter method"}

LogDate, loglevel and threadName contain these values:

"LogDate":["2017-08-10 19:34:06:799","2017-08-10 19:34:06:799"]
"loglevel":["INFO","INFO"]
"threadName":["MYClass","MYClass"]

Not sure what is going wrong here?


(Ankit) #6

My bad. break_on_match should not be there.
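
For anyone finding this later, here is a sketch of the filter with break_on_match removed. It then defaults to true, so grok stops after the first matching pattern and LogDate, loglevel and threadName are captured only once. The if [json] guard and the single merged rename hash are my own additions for illustration and are not required for the fix.

```
filter {
  if [type] == "WidgetRestLog" {
    grok {
      # break_on_match is omitted, so the default (true) applies.
      match => {
        "message" => [
          '%{TIMESTAMP_ISO8601:LogDate} %{LOGLEVEL:loglevel} (?<threadName>[^:]+):(?<LineNumber>[^a-z]+) - (?=\{")%{GREEDYDATA:json}(?<=\})$',
          '%{TIMESTAMP_ISO8601:LogDate} %{LOGLEVEL:loglevel} (?<threadName>[^:]+):(?<LineNumber>[^a-z]+) - %{GREEDYDATA:LoggedMessage}'
        ]
      }
    }

    # Assumption: only parse JSON when the first pattern matched and set [json].
    if [json] {
      json {
        source => "json"
      }
    }

    mutate {
      remove_field => [ "json", "LineNumber" ]
      rename => {
        "t" => "Log_Timestamp"
        "h" => "Hostname"
        "l" => "LogLevel"
        "cN" => "Class_Name"
        "mN" => "Method_Name"
        "m" => "LoggedMessage"
        "ecid" => "ECID"
        "d" => "Data"
        "eS" => "Exception_Message"
        "stacktrace" => "Stacktrace"
      }
    }
  }

  if "_grokparsefailure" in [tags] {
    drop {}
  }
}
```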

