Config file for multiple multiline patterns


#1

I have installed logstash v5.5.0. I want to parse logs which have multiple multiline formats. I have logs in the following format:

Fri 3/3/17 14:31:30: AbCdEueuq_11111111111.PqrThr-4: TRACE : xx.xxx.xxx Administrator - hold() called from blah: xx.xxx.xxx
Fri 3/3/17 14:32:43: AbCdEueuq_11111111111.PqrThr-4: java.lang.xxx.xxxException
Fri 3/3/17 14:32:43: AbCdEueuq_11111111111.PqrThr-4: 	at sun.reflect.xxx.xxx(Unknown Source)
Fri 3/3/17 14:32:43: AbCdEueuq_11111111111.PqrThr-4: 	at sun.reflect.xxx.xxx(xxx.java:11)
Fri 3/3/17 14:32:43: AbCdEueuq_11111111111.PqrThr-4: Caused by: xx.xxx.xxxx.xxxxException: xxxx.xxxxException: xxxx.xxxxException: A SQL error has occurred. Database system message follows:
Fri 3/3/17 14:32:43: AbCdEueuq_11111111111.PqrThr-4: 	xx.xxxx.xxxxException: A SQL error has occured. Database system message follows:
Fri 3/3/17 14:32:43: AbCdEueuq_11111111111.PqrThr-4: 	java.sql.xxxxxException: Closed Connection
Fri 3/3/17 14:32:55: AbCdEueuq_11111111111.PqrThr-4: Nested exception is: xx.xxx.xxxException: A SQL error has occurred. Database system message follows:
Fri 3/3/17 14:32:55: AbCdEueuq_11111111111.PqrThr-4: 	xx.xxxx.xxxxException: A SQL error has occured. Database system message follows:
Fri 3/3/17 14:32:55: AbCdEueuq_11111111111.PqrThr-4: 	java.sql.xxxxxException: Closed Connection
Fri 3/3/17 14:32:55: AbCdEueuq_11111111111.PqrThr-4: 	at xx.xxxx.xxxx.xx(xx.java:111)
Fri 3/3/17 14:32:55: AbCdEueuq_11111111111.PqrThr-4: 	at wt.xxxx.xxxxx.xxxx(xx.java:222)
Fri 3/3/17 14:32:43: AbCdEueuq_11111111111.PqrThr-4: 	... 8 more

I have my logstash configured as follows:
filter.conf ->

filter {
	grok {
		break_on_match => false
		match => {  "message" => ["%{DAY:Day} %{DATE_EU:Date} %{TIME:time}: (?<ThreadName>[^\s]*): %{LOGLEVEL:loglevel}[\s]+: %{GREEDYDATA:message}"
								 ]
		}
		add_field => { "logtype" => "DEFCom"
					   "ThreadName" => "%{ThreadName}"
		}
	}
}
filter {
	multiline {		
		pattern => "((%{DAY:Day} %{DATE_EU:Date} %{TIME:time}: (?<ThreadName>[^\s]*):[\s]+%{JAVASTACKTRACEPART:JavaStackTracePart})|
			     (%{DAY:Day} %{DATE_EU:Date} %{TIME:time}: (?<ThreadName>[^\s]*):[\s]+Caused by:%{GREEDYDATA:msg})|
			     (%{DAY:Day} %{DATE_EU:Date} %{TIME:time}: (?<ThreadName>[^\s]*):[\s]+Nested exception is:%{GREEDYDATA:msg})|
			     (%{DAY:Day} %{DATE_EU:Date} %{TIME:time}: (?<ThreadName>[^\s]*):[\s]+.*Exception.*)|
		             (%{DAY:Day} %{DATE_EU:Date} %{TIME:time}: (?<ThreadName>[^\s]*):[\s]+... %{POSINT} more)|
			     (%{DAY:Day} %{DATE_EU:Date} %{TIME:time}: (?<ThreadName>[^\s]*):%{SPACE}at%{SPACE}(?<filename>[^\(]*)%{GREEDYDATA:aftermsg}))"
		negate => false
		what => previous
		remove_field => ["Day","Date","time"]
	}
}
filter {
	mutate {
		add_field => { "logtype" => "DEFCom"
		               "ThreadName" => "%{ThreadName}"
		}
	}
}

Question 1: With the above config, I am able to get log lines which display the stacktrace part "... at ..." as multiline messages, but not the other kind of lines, like the "... Caused by ..." or "... Nested exception is ..." . Also, sometimes the multiline messages start with stacktrace part "... at ..." as multiline messages, which should not be the case, since they are a part of a multiline message, right? If I want the message fields in elasticsearch to display this content:

  • Fri 3/3/17 14:31:30: AbCdEueuq_11111111111.PqrThr-4: TRACE : xx.xxx.xxx Administrator - hold() called from blah: xx.xxx.xxx

  • Fri 3/3/17 14:32:43: AbCdEueuq_11111111111.PqrThr-4: java.lang.xxx.xxxException <br> at sun.reflect.xxx.xxx(Unknown Source) at sun.reflect.xxx.xxx(xxx.java:11) Caused by: xx.xxx.xxxx.xxxxException: xxxx.xxxxException: xxxx.xxxxException: A SQL error has occurred. Database system message follows: xx.xxxx.xxxxException: A SQL error has occured. Database system message follows: java.sql.xxxxxException: Closed Connection \n Nested exception is: xx.xxx.xxxException: A SQL error has occurred. Database system message follows: xx.xxxx.xxxxException: A SQL error has occured. Database system message follows: ... 8 more

Then what should be corrected? Is there any other way to achieve this? I would be grateful if the complete filter.conf content could be posted as an answer.

Question 2: I am getting the text "%{ThreadName}" in the ThreadName field for all the entries shown in Kibana. I have purposefully added the last filter block, since otherwise the ThreadName field did not show up in the Elasticsearch entries. I want the ThreadName to be displayed on all the messages. Also, all the entries have a "_grokparsefailure" tag. What should be corrected?


(Magnus Bäck) #2

I am getting the text “%{ThreadName}” in the ThreadName field for all the entries shown in kibana.

That means that there is no ThreadName field in the events.

I have purposefully added the last filter block, since otherwise the ThreadName field did not show up in the Elasticsearch entries.

You weren't seeing that field because the events had no such field.

Also, all the entries have a “_grokparsefailure” tag. What should be corrected?

Your grok expression doesn't match the input data.


#3

Thank you for replying.

But I have defined the ThreadName field by using `(?<ThreadName>[^\s]*)` in the first filter block as well as in the multiline block in the second filter block. I have also tested this pattern on [http://grokconstructor.appspot.com/do/match](http://grokconstructor.appspot.com/do/match), putting the multiline patterns in the field for multiline patterns, and it matches correctly.

Question 1. So what is it that I am doing wrong? I am unable to figure it out. Is there a better way to achieve what I'm trying to? Could you please elaborate?

Question 2. For my use case, where multiline messages have more than one format, where should the multiline patterns be specified? Should it be done inside the Logstash input.conf (that is, by using the multiline codec) or in filter.conf (by using the multiline filter)?


(Magnus Bäck) #4

So what is it that I am doing wrong? I am unable to figure it out. Is there a better way to achieve what I’m trying to? Could you please elaborate?

Build your grok expression gradually. Start with the simplest possible expression, ^%{DAY}, and verify that it works. Continue adding more and more tokens until you're done or things break. In the latter case you then know what caused things to break.

In the spirit of starting simple I suggest you ignore the multiline problem for now and make the grok expression work for regular oneliners first.
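
One way to follow this advice is a throwaway pipeline that reads pasted lines from stdin and prints the parsed event. This is only a sketch, and the file name mentioned below is hypothetical:

```
input { stdin { } }

filter {
  grok {
    # Step 1: the simplest possible expression.
    match => { "message" => "^%{DAY:Day}" }
    # Step 2, once step 1 works: extend one token at a time, for example
    # "^%{DAY:Day} %{DATE_EU:Date}", then add %{TIME:time}, and so on.
  }
}

output { stdout { codec => rubydebug } }
```

Starting Logstash with `-f` pointing at this file (say, `grok-test.conf`) and pasting a sample log line should show either the extracted fields or a `_grokparsefailure` tag, which tells you whether the last token you added is the one that broke the match.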

For my use case, where multiline messages have more than one format, where should the multiline patterns be specified?

The multiline filter is deprecated and you should use the multiline codec instead. What kind of input do you use?

I doubt you actually have to specify more than one multiline pattern. The multiline expression doesn't have to match the whole input string. It only needs to be able to distinguish between the first line of a logical message and a continuation line. In most cases you can just use the timestamp as the marker of the first line. In your case this pattern should do:

^%{DAY} %{DATE_EU} %{TIME}: 

You also need negate => true so that the multiline configuration reads "unless the line begins with a timestamp, join the line with the preceding line".
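
As a rough sketch of how the pattern and negate setting fit together in the multiline codec, assuming the logs were read directly by a Logstash file input (the path is a hypothetical placeholder) and that continuation lines do not themselves begin with a timestamp:

```
input {
  file {
    # Hypothetical path; replace with the real log location.
    path => "/var/log/myapp/app.log"
    codec => multiline {
      # A line starting with "<Day> <Date> <Time>: " begins a new event.
      pattern => "^%{DAY} %{DATE_EU} %{TIME}: "
      # Act on lines that do NOT match the pattern...
      negate => true
      # ...by appending them to the previous line.
      what => "previous"
    }
  }
}
```

With a different input type, the same three settings would belong wherever that input does its line joining.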


#5

I tried this, and currently I'm able to see all the log lines as separate lines in Elasticsearch through kibana. I can say that it is working correctly, since now there is no _grokparsefailure visible in the tags.
This is my logstash configuration:

input {
  beats {
    port => 5044
  }
}
filter {
  grok {
    match => { "message" => "^%{DAY:Day}%{SPACE}%{DATE_EU:Date}%{SPACE}%{TIME:time}:%{SPACE}%{NOTSPACE:ThreadName}:%{SPACE}%{GREEDYDATA:msg}" }
    add_field => { "logtype" => "DEFCom" }
  }
}
output {
  elasticsearch {
    hosts => "localhost:9200"
    manage_template => false
    index => "%{logtype}-%{+YYYY.MM.dd}"
    document_type => "%{[@metadata][type]}"
  }
}

I'm using Filebeat input

OK. So by negate => true you are saying to count "Those messages which DON'T match the pattern" as part of multiline messages (whether first line of a multiline message or the consecutive lines), right?

Also, just to confirm, please tell me the meaning of this pattern: ^%{DAY} %{DATE_EU} %{TIME}:
Does it mean this: Starting with "<Day><Space><Date_in_EU><Space><Time><Colon>" ?

Is my interpretation up until now totally correct?

I am not sure if I understand this completely. Please point me to an example, so that it can be clear.
Also, if I want the complete stacktrace part or end msg part of the log line (which is present at the end in all the messages) inside a separate field so as to make it searchable, then I will have to match it by mentioning a pattern, right?

Please tell me what to do for the multiline messages now?


(Magnus Bäck) #6

I’m using Filebeat input

Then that's where you should have your multiline processing.

So by negate => true you are saying to count “Those messages which DON’T match the pattern” as part of multiline messages (whether first line of a multiline message or the consecutive lines), right?

Yes. Perform the action (join with previous or join with next) if the message doesn't match the pattern.

Is my interpretation up until now totally correct?

Yes.

I am not sure if I understand this completely. Please point me to an example, so that it can be clear.

Okay, I'll try. Say we have this multiline log entry:

2017-08-31 12:00:00 INFO blah blah
blah blah
blah blah

To decide whether these lines are part of a multiline event and, if so, which line is the first line of that multiline event, we don't care about the "blah blah". All we need to check is whether the line begins with a timestamp and a loglevel. Lines that do begin that way start a new event and lines that don't begin that way must be continuation lines.

Also, if I want the complete stacktrace part or end msg part of the log line (which is present at the end in all the messages) inside a separate field so as to make it searchable, then I will have to match it by mentioning a pattern, right?

Sure, but in the grok filter.
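
A minimal sketch of such a grok filter, reusing the field names from the configuration posted earlier in this thread. The `(?m)` prefix is an assumption about what is needed here: it lets `GREEDYDATA` run across the line breaks of an already-joined multiline event, so that the whole stack trace ends up in `msg`:

```
filter {
  grok {
    # (?m) makes "." match newlines, so msg covers every joined line.
    match => {
      "message" => "(?m)^%{DAY:Day} %{DATE_EU:Date} %{TIME:time}: %{NOTSPACE:ThreadName}: %{GREEDYDATA:msg}"
    }
  }
}
```

For single-line events the prefix makes no difference, so the same expression can serve both kinds of message.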


#7

OK, I am trying to achieve this:
For all those log lines whose msg field does not match the patterns below, apply the multiline codec using these patterns, with negate => true. The patterns are as follows:

^(\s+ALERT\s+|\s+TRACE\s+|\s+DEBUG\s+|\s+NOTICE\s+|\s+INFO\s+|\s+WARN?(?:ING)?\s+|\s+ERROR\s+|\s+CRIT?(?:ICAL)?\s+|\s+FATAL\s+|\s+SEVERE\s+|\s+EMERG(?:ENCY)?\s+)
// ^Specifies "<Space><Log_Level><Space>"

^%{SPACE}java.*Throwable$
// ^Specifies Ending with "java<Any_Characters>Throwable"

^%{SPACE}java.*Exception$
// ^Specifies Ending with "java<Any_Characters>Exception"

These are my configurations:

input {
  beats {
    port => 5044
  }
}
filter {
  grok {
    match => { "message" => "^%{DAY:Day}%{SPACE}%{DATE_EU:Date}%{SPACE}%{TIME:time}:%{SPACE}%{NOTSPACE:ThreadName}:%{SPACE}%{GREEDYDATA:msg}" }
    add_field => { "logtype" => "DEFCom" }
  }
}
output {
	if [msg] =~ "^(\s+ALERT\s+|\s+TRACE\s+|\s+DEBUG\s+|\s+NOTICE\s+|\s+INFO\s+|\s+WARN?(?:ING)?\s+|\s+ERROR\s+|\s+CRIT?(?:ICAL)?\s+|\s+FATAL\s+|\s+SEVERE\s+|\s+EMERG(?:ENCY)?\s+)"
	{  elasticsearch {
		hosts => "localhost:9200"
		manage_template => false
		index => "%{logtype}-%{+YYYY.MM.dd}"
		document_type => "%{[@metadata][type]}"
		codec => multiline {
		  pattern => "^%{DAY:Day}%{SPACE}%{DATE_EU:Date}%{SPACE}%{TIME:time}:%{SPACE}%{NOTSPACE:ThreadName}:%{SPACE}(\s+ALERT\s+|\s+TRACE\s+|\s+DEBUG\s+|\s+NOTICE\s+|\s+INFO\s+|\s+WARN?(?:ING)?\s+|\s+ERROR\s+|\s+CRIT?(?:ICAL)?\s+|\s+FATAL\s+|\s+SEVERE\s+|\s+EMERG(?:ENCY)?\s+)%{SPACE}"
		  negate => true
		  what => previous
		 }
	   }
	}
	else if [msg] =~ "^%{SPACE}java.*Throwable$"
	{  elasticsearch {
		hosts => "localhost:9200"
		manage_template => false
		index => "%{logtype}-%{+YYYY.MM.dd}"
		document_type => "%{[@metadata][type]}"
		codec => multiline {
		  pattern => "^%{DAY:Day}%{SPACE}%{DATE_EU:Date}%{SPACE}%{TIME:time}:%{SPACE}%{NOTSPACE:ThreadName}:%{SPACE}java.*Throwable$"
		  negate => true
		  what => previous
		 }
	   }
	}
	else if [msg] =~ "^%{SPACE}java.*Exception$"
	{  elasticsearch {
		hosts => "localhost:9200"
		manage_template => false
		index => "%{logtype}-%{+YYYY.MM.dd}"
		document_type => "%{[@metadata][type]}"
		codec => multiline {
		  pattern => "^%{DAY:Day}%{SPACE}%{DATE_EU:Date}%{SPACE}%{TIME:time}:%{SPACE}%{NOTSPACE:ThreadName}:%{SPACE}java.*Exception$"
		  negate => true
		  what => previous
		 }
	   }
	}
	else
	{  elasticsearch {
		hosts => "localhost:9200"
		manage_template => false
		index => "%{logtype}-%{+YYYY.MM.dd}"
		document_type => "%{[@metadata][type]}"
		codec => multiline {
		  pattern => "%{SPACE}at%{SPACE}|%{SPACE}Caused by:%{SPACE}|%{SPACE}Nested exception is:%{SPACE}|java.*Exception:|%{SPACE}...%{SPACE}%{POSINT}%{SPACE}more$"
		  negate => true
		  what => previous
		 }
		}
	}
}

Now I can see all the lines being indexed in Elasticsearch via Kibana, but as single lines and not as multiline messages. I also tried another configuration, in which I removed the last else block under the output block (according to what you said). In that case, however, none of the log lines showed up in Elasticsearch, and I could not figure out why, since they were definitely being read (as was visible in the Logstash log file at /var/log/logstash-plain.log).

Am I doing something wrong now?


(Magnus Bäck) #8

Do not attempt to use the multiline codec with the elasticsearch output. Put your multiline processing in Filebeat.


#9

Okay. I have now put the multiline patterns in the filebeat.yml file and they are being parsed correctly, as I want. Now I want to manipulate the multiline messages coming from Filebeat. Specifically, I want to remove the repeated initial part from all multiline messages (as can be seen in the sample log lines above). How do I do that in Logstash?

filter {
  if <message_is_multiline> {
    mutate {
      ...
    }
  }
}

What should the condition <message_is_multiline> be ?
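
One possible shape for this, as an untested sketch rather than a confirmed answer: it assumes Filebeat keeps the line breaks when it joins lines, so a multiline event can be recognized by a newline inside `message`. The regular expression for the repeated prefix is an illustrative guess based on the sample lines above, not something taken from this thread.

```
filter {
  # Assumption: only joined (multiline) events contain a newline.
  if [message] =~ /\n/ {
    mutate {
      # Remove the "<Day> <Date> <Time>: <Thread>: " prefix that follows each
      # line break, leaving the prefix of the first line in place.
      gsub => [
        "message", "(?<=\n)\w{3} \d{1,2}/\d{1,2}/\d{2,4} \d{1,2}:\d{2}:\d{2}: \S+: ", ""
      ]
    }
  }
}
```

The lookbehind `(?<=\n)` keeps the line break itself, so only the repeated prefix is replaced with an empty string.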

