I want to know the best way to handle multiple log patterns in a single file. I have created the filter below for my logs.
filter {
  if [type] == 'AppLog' {
    grok {
      break_on_match => false
      match => { "message" => [
        "%{TIMESTAMP_ISO8601:LogDate} %{LOGLEVEL:loglevel} (?<threadName>[^:]+):(?<LineNumber>[^a-z]+) - %{GREEDYDATA:Line}",
        "%{TIMESTAMP_ISO8601:LogDate} %{LOGLEVEL:loglevel} (?<threadName>[^:]+):(?<LineNumber>[^a-z]+) - %{GREEDYDATA:LoggedMessage}"
      ] }
    }
    json {
      source => "Line"
    }
    mutate {
      remove_field => [ "Line", "LineNumber" ]
      rename => { "t" => "Log_Timestamp" }
      rename => { "h" => "Hostname" }
      rename => { "l" => "LogLevel" }
      rename => { "cN" => "Class_Name" }
      rename => { "mN" => "Method_Name" }
      rename => { "m" => "Logged_Message" }
      rename => { "ecid" => "ECID" }
      rename => { "d" => "Data" }
      rename => { "eS" => "Exception_Message" }
      rename => { "stacktrace" => "Stacktrace" }
    }
  }
  if "_grokparsefailure" in [tags] {
    drop {}
  }
}
Below are two log statements that I want to parse. One is wrapped in JSON and the other is plain text. Because I have set break_on_match to false, each event goes through both patterns. For the plain-text statement I get _jsonparsefailure, since it is not JSON, and every other field has two entries; for example, loglevel contains INFO twice.
2017-08-08 17:34:04:527 INFO Login:? - TGT expires: Wed Aug 09 05:34:04 GMT 2017
2017-08-08 17:34:04:648 INFO ConnectionContainer:? - {"t":1502213644648,"ecid":"Unknown","h":"0ac94b160d0c","l":"INFO","cN":"com.apps.common.connection","mN":"getConnection","m":"Single Keytab Mode"}
Parsed result for a plain-text line:
{"message":"2017-08-08 17:34:05:947 INFO ClientCnxn:? - EventThread shut down","type":"WidgetRestLog","threadName":["ClientCnxn","ClientCnxn"],"tags":["WidgetRestLog","_jsonparsefailure"],"LoggedMessage":"EventThread shut down","path":"/tmp/aio_widgetrest_1.log","@timestamp":"2017-08-10T16:09:18.067Z","loglevel":["INFO","INFO"],"@version":"1","host":"dev2-dockercomps-1","LogDate":["2017-08-08 17:34:05:947","2017-08-08 17:34:05:947"]}
What's the question? AFAICT those grok expressions are identical except for the name of the GREEDYDATA field.
I have just edited my question. It was incomplete.
"I have marked break_on_match as false, it goes through both pattern"
But that's not what you want. You want grok to try the two patterns in order and stop at the first match. The first expression should match only if the log message looks like JSON, i.e. begins and ends with braces, so your two expressions could look something like this:
(?<LineNumber>[^a-z]+) - (?=\{")%{GREEDYDATA:json}(?<=\})$
(?<LineNumber>[^a-z]+) - %{GREEDYDATA:LoggedMessage}
The (?=...) and (?<=...) constructs are zero-width lookahead and lookbehind assertions. Note the double quote inside the first expression; wrap that whole pattern string in single quotes instead of double quotes to avoid quoting problems.
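To see what the two assertions do in isolation, here is a small Python sketch using the standard re module, which supports the same lookahead and lookbehind syntax. Grok itself runs on a different regex engine, so treat this as an approximation. Only the tail of the first expression is modelled, and the names JSON_TAIL and json_payload are illustrative. They are not part of Logstash.

```python
import re

# Tail of the first grok expression, with %{GREEDYDATA:json} written as a named group.
# (?=\{")  lookahead: the text right after " - " must start with {"
# (?<=\})  lookbehind: the last character before the end of the line must be }
# Both assertions are zero-width: they check characters without consuming them,
# so the braces still end up inside the captured "json" group.
JSON_TAIL = re.compile(r' - (?=\{")(?P<json>.*)(?<=\})$')


def json_payload(line):
    """Return the JSON part of a log line, or None if the tail does not look like JSON."""
    m = JSON_TAIL.search(line)
    return m.group("json") if m else None


plain = "2017-08-08 17:34:04:527 INFO Login:? - TGT expires: Wed Aug 09 05:34:04 GMT 2017"
wrapped = ('2017-08-08 17:34:04:648 INFO ConnectionContainer:? - '
           '{"t":1502213644648,"ecid":"Unknown","h":"0ac94b160d0c","l":"INFO",'
           '"cN":"com.apps.common.connection","mN":"getConnection","m":"Single Keytab Mode"}')

print(json_payload(plain))    # None: the lookahead fails at "TGT"
print(json_payload(wrapped))  # the {...} payload, braces included
```

Because the plain-text line fails the lookahead, grok would fall through to the second, plain-text expression for it.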
Thanks, it is working. I never knew regular expressions had such an option. Many things to learn.
So here is my updated filter:
filter {
  if [type] == 'WidgetRestLog' {
    grok {
      break_on_match => false
      match => { "message" => [
        '%{TIMESTAMP_ISO8601:LogDate} %{LOGLEVEL:loglevel} (?<threadName>[^:]+):(?<LineNumber>[^a-z]+) - (?=\{")%{GREEDYDATA:json}(?<=\})$',
        "%{TIMESTAMP_ISO8601:LogDate} %{LOGLEVEL:loglevel} (?<threadName>[^:]+):(?<LineNumber>[^a-z]+) - %{GREEDYDATA:LoggedMessage}"
      ] }
    }
    json {
      source => "json"
    }
    mutate {
      remove_field => [ "json", "LineNumber" ]
      rename => { "t" => "Log_Timestamp" }
      rename => { "h" => "Hostname" }
      rename => { "l" => "LogLevel" }
      rename => { "cN" => "Class_Name" }
      rename => { "mN" => "Method_Name" }
      rename => { "m" => "LoggedMessage" }
      rename => { "ecid" => "ECID" }
      rename => { "d" => "Data" }
      rename => { "eS" => "Exception_Message" }
      rename => { "stacktrace" => "Stacktrace" }
    }
  }
  if "_grokparsefailure" in [tags] {
    drop {}
  }
}
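As a rough illustration of what the json and mutate steps do to the grok output, here is a Python approximation rather than Logstash itself. It assumes that the json filter, with no target set, merges the parsed keys into the top level of the event. The names post_grok and RENAMES are hypothetical, and the rename table just mirrors the filter above.

```python
import json

# Short keys written by the application, mapped to the names used in the rename lines above.
RENAMES = {
    "t": "Log_Timestamp", "h": "Hostname", "l": "LogLevel", "cN": "Class_Name",
    "mN": "Method_Name", "m": "LoggedMessage", "ecid": "ECID", "d": "Data",
    "eS": "Exception_Message", "stacktrace": "Stacktrace",
}


def post_grok(event):
    """Hypothetical helper: rough model of json { source => "json" } followed by mutate."""
    event = dict(event)
    if "json" in event:
        # No target: parsed keys land at the top level of the event
        event.update(json.loads(event["json"]))
    for old, new in RENAMES.items():
        if old in event:  # absent keys (e.g. "d", "eS") are simply skipped
            event[new] = event.pop(old)
    # Drop the raw JSON string and the line number, as remove_field does
    for field in ("json", "LineNumber"):
        event.pop(field, None)
    return event
```

Feeding it the json, LineNumber and loglevel fields grok extracts from a JSON line yields an event with Log_Timestamp, Class_Name, LoggedMessage and so on, and without the short keys.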
For the log statement below it works fine, except that three attributes are parsed twice:
2017-08-10 19:34:06:799 INFO MYClass:? - {"t":1502393646799,"ecid":"Unknown","h":"0ac94b160d0c","l":"INFO","cN":"org.test.common.MYClass","mN":"readFeaturesSet","m":"Enter method"}
LogDate, loglevel and threadName have these values:
"LogDate":["2017-08-10 19:34:06:799","2017-08-10 19:34:06:799"]
"loglevel":["INFO","INFO"]
"threadName":["MYClass","MYClass"]
I'm not sure what is going wrong here.
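My bad. The break_on_match => false line should not be there. With the default (true), grok stops at the first pattern that matches, so a JSON line never reaches the second pattern and each field is captured only once.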
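The difference can be modelled in a few lines of Python. This is a rough sketch, not grok itself. The grok macros are replaced by narrow hand-written regexes that only fit the sample lines. The list-merging rule simply reproduces the duplicated values reported above. The function grok() and the PATTERNS list are hypothetical names.

```python
import re

# Simplified stand-ins for %{TIMESTAMP_ISO8601} and %{LOGLEVEL} (assumption: narrow
# regexes that only fit the sample lines; the real grok patterns are more permissive).
PREFIX = (r"(?P<LogDate>\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}:\d{3}) "
          r"(?P<loglevel>[A-Z]+) (?P<threadName>[^:]+):(?P<LineNumber>[^a-z]+) - ")
PATTERNS = [
    re.compile(PREFIX + r'(?=\{")(?P<json>.*)(?<=\})$'),  # JSON payload
    re.compile(PREFIX + r"(?P<LoggedMessage>.*)"),          # plain text
]


def grok(line, break_on_match=True):
    """Rough model of grok: try each pattern in order; a field captured twice becomes a list."""
    event = {}
    for pattern in PATTERNS:
        m = pattern.search(line)
        if not m:
            continue
        for name, value in m.groupdict().items():
            if name in event:
                # Same field captured by a second pattern: keep both values
                previous = event[name]
                event[name] = (previous if isinstance(previous, list) else [previous]) + [value]
            else:
                event[name] = value
        if break_on_match:
            break  # stop at the first matching pattern
    return event


line = ('2017-08-10 19:34:06:799 INFO MYClass:? - '
        '{"t":1502393646799,"ecid":"Unknown","h":"0ac94b160d0c","l":"INFO",'
        '"cN":"org.test.common.MYClass","mN":"readFeaturesSet","m":"Enter method"}')

print(grok(line, break_on_match=False)["loglevel"])  # ['INFO', 'INFO']
print(grok(line)["loglevel"])                        # INFO
```

With break_on_match left at True, the JSON line stops at the first pattern, so LoggedMessage is never set and LogDate, loglevel and threadName each hold a single value. A plain-text line fails the lookahead in the first pattern and is handled by the second.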