Grok for parsing a Java log


(san ) #1

Hi,

I've been learning ELK for the past few days for an upcoming implementation.
I have created a Logstash config which sends the data to my Elasticsearch.
With my Logstash config, I'm expecting a few fields to be added to the data for Kibana to analyse,
but none of them are visible except "message".
Any pointers to solve this issue would be helpful.

input {
  beats {
    port => 5044
    type => "sky_app_log"
    codec => multiline {
      charset => "ISO-8859-1"
      pattern => "^%{TIMESTAMP_ISO8601}"
      max_lines => 1000
      negate => true
      what => "previous"
    }
  }
}

filter {
  mutate {
    gsub => [ "message", "\r", "" ]
  }

  grok {
    match => {"message" => "%{TIMESTAMP_ISO8601:timestamp} %{SKYLOGLEVEL:loglevel} %{THREAD:thread} %{RMOTEIP:remoteipaddress} %{JAVACLASS:logclass} %{CUSTOM_TRACE_EXCEPTION:exception} %{CUSTOM_TRACE_CAUSED_BY:causedby} %{GREEDYDATA:details}"}
    match => {"exception" => "%{CUSTOM_TRACE_EXCEPTION:exception}"}
    match => {"thread" => "%{THREAD:thread}"}
    match => {"loglevel" => "%{ACMELOGLEVEL:loglevel}"}
    match => {"logclass" => "%{JAVACLASS:logclass}"}
    match => {"remoteip" => "%{RMOTEIP:remoteipaddress}"}
    break_on_match => false
  }

  date {
    match => [ "timestamp", "yyyy-MM-dd HH:mm:ss,SSS" ]
    remove_field => [ "timestamp" ]
  }
}
 
output {
  elasticsearch {
    hosts => "localhost:9200"
  }
  file {
    path => "C:\logs\output9.txt"
  }
}

(Magnus Bäck) #2

Without some example logs it's impossible to help.

		match => {"message" => "%{TIMESTAMP_ISO8601:timestamp} %{SKYLOGLEVEL:loglevel} %{THREAD:thread} %{RMOTEIP:remoteipaddress} %{JAVACLASS:logclass} %{CUSTOM_TRACE_EXCEPTION:exception} %{CUSTOM_TRACE_CAUSED_BY:causedby} %{GREEDYDATA:details}"}

You're using non-default patterns like SKYLOGLEVEL, RMOTEIP, CUSTOM_TRACE_EXCEPTION, and CUSTOM_TRACE_CAUSED_BY. Where are those defined? Have you modified the original pattern files that come with the logstash-patterns-core plugin?

		match => {"thread" => "%{THREAD:thread}"}	
		match => {"loglevel" => "%{ACMELOGLEVEL:loglevel}"}
		match => {"logclass" => "%{JAVACLASS:logclass}"}
		match => {"remoteip" => "%{RMOTEIP:remoteipaddress}"}

These serve no purpose. It seems you're misunderstanding how the grok filter works.
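The usual approach is a single expression that captures everything in one pass; once that expression has matched, the fields already exist and there is nothing left to re-match. As a sketch (using the pattern names from your config, assuming they are defined under patterns_dir):

    grok {
      patterns_dir => ["./patterns"]
      match => {
        "message" => "%{TIMESTAMP_ISO8601:timestamp} %{SKYLOGLEVEL:loglevel} %{THREAD:thread} %{RMOTEIP:remoteipaddress} %{JAVACLASS:logclass} %{GREEDYDATA:details}"
      }
    }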


(san ) #3

@magnusbaeck Yes, you are right. I'm still learning grok for my log file and it is confusing me a lot. The patterns I mentioned here are derived from the standard grok patterns and they are available in my patterns folder, and Logstash is happy with them.
Please find an excerpt from my log file:

2013-04-05 00:00:02,101 ERROR [scheduler_Worker-6          ]                 (DataProcessor.java:412 ) RemoteException > 
    AxisFault
     faultCode: {http://schemas.xmlsoap.org/soap/envelope/}Server
     faultSubcode: 
     faultString: 0005: No Data matched the criteria Specified
     faultActor: 
     faultNode: 
     faultDetail: 
    	{http://www.bea.com/wli/sb/context}fault:<con:errorCode>0005</con:errorCode><con:reason>No Data     matched the criteria Specified</con:reason><con:location><con:node>getNumber</con:node>   <con:pipeline>getNumber_response</con:pipeline><con:stage>Create Number Response</con:stage>   <con:path>response-pipeline</con:path></con:location>

0005: No Data matched the criteria Specified1
	at org.apache.axis.message.SOAPFaultBuilder.createFault(SOAPFaultBuilder.java:222)
	at org.apache.axis.message.SOAPFaultBuilder.endElement(SOAPFaultBuilder.java:129)
	at org.apache.axis.encoding.DeserializationContext.endElement(DeserializationContext.java:1087)
	at com.sun.org.apache.xerces.internal.parsers.AbstractSAXParser.endElement(Unknown Source)
	at com.sun.org.apache.xerces.internal.impl.XMLDocumentFragmentScannerImpl.scanEndElement(Unknown Source)

com.sun.org.apache.xerces.internal.impl.XMLDocumentFragmentScannerImpl$FragmentContentDriver.next(Unknown Source)
2013-04-05 00:07:36,535 INFO [TP-Processor8 ] 10.136.59.190 ( WTSDK.java:504 ) WTSDK-

Excerpt from the output generated by logstash

{"@timestamp":"2016-03-07T23:59:47.306Z","message":"2013-04-05 00:00:02,101 ERROR [scheduler_Worker-6          ]                 (DataProcessor.java:412 ) RemoteException > \nAxisFault\n faultCode: {http://schemas.xmlsoap.org/soap/envelope/}Server\n faultSubcode: \n faultString: 0005: No Data matched the criteria Specified\n faultActor: \n faultNode: \n faultDetail: \n\t{http://www.bea.com/wli/sb/context}fault:<con:errorCode>0005</con:errorCode><con:reason>No Data matched the criteria Specified</con:reason><con:location><con:node>GetNumber</con:node><con:pipeline>GetNumber_response</con:pipeline><con:stage>Create Get Trips By Flight Number Response</con:stage><con:path>response-pipeline</con:path></con:location>\n0005: No Data matched the criteria Specified1\n\tat org.apache.axis.message.SOAPFaultBuilder.createFault(SOAPFaultBuilder.java:222)\n\tat org.apache.axis.message.SOAPFaultBuilder.endElement(SOAPFaultBuilder.java:129)\n\tat org.apache.axis.encoding.DeserializationContext.endElement(DeserializationContext.java:1087)\n\tat .............
","@version":"1","tags":["multiline","beats_input_codec_multiline_applied"],"beat":{"hostname":"LVRJ8YRJX1","name":"LVRJ8YRJX1"},"count":1,"fields":null,"input_type":"log","offset":3744,"source":"C:\logs\bagassist_x - Copy.log","type":"log","host":"LVRJ8YRJX1"}

Logging pattern

<pattern>%d %-5level [%-28thread] [%-15X{remoteIpAddress}] (%35logger{0}:%-3L\) %message%n</pattern>   

logstash.conf - I changed my logstash.conf to:

input {
  beats {
    port => 5044
    type => "sky_app_log"
    codec => multiline {
      charset => "ISO-8859-1"
      pattern => "^%{TIMESTAMP_ISO8601}"
      max_lines => 1000
      negate => true
      what => "previous"
    }
  }
}

filter {
  mutate {
    gsub => [ "message", "\r", "" ]
  }
  grok {
    patterns_dir => "./patterns"
    match => { "message" => ["%{TIMESTAMP_ISO8601:timestamp}", "%{LOGLEVEL1:loglevel}", "%{SKYEXCEPTION_TYPE:exception}"] }
  }
  date {
    match => [ "timestamp", "yyyy-MM-dd HH:mm:ss,SSS" ]
    remove_field => [ "timestamp" ]
  }
}

output {
  elasticsearch {
    hosts => "localhost:9200"
  }
  file {
    path => "C:\logs\output9.txt"
  }
}

Patterns

LOGLEVEL1 (ALERT|TRACE|DEBUG|[Nn]otice|NOTICE|INFO|WARN?(?:ING)?|ERROR|CRIT?(?:ICAL)?|FATAL|[Ss]evere|SEVERE|EMERG(?:ENCY)?|[Ee]merg(?:ency)?)
SKYEXCEPTION_TYPE (?i)\b\w*\QException\E\w*\b

Kibana

I'm expecting to build some visualizations for the Kibana dashboard based on "LOGLEVEL1" and "SKYEXCEPTION_TYPE", but none of these fields are available there.
Please provide me with some pointers to solve this.

Thanks and Regards,
San


(Magnus Bäck) #4
match => { message => ["%{TIMESTAMP_ISO8601:timestamp}","%{LOGLEVEL1:loglevel}","%{SKYEXCEPTION_TYPE:exception}"]}

You don't want to use multiple expressions here. Write a single expression, something like this perhaps:

%{TIMESTAMP_ISO8601:timestamp}\s+%{LOGLEVEL1:loglevel}\s+\[(?<threadname>[^ \]]+)\s\]\s+(%{IP:ip}\s+)?...

Build up your expression piece by piece. That way you'll see right away when its last part doesn't work.
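For example, the progression could look like this (LOGLEVEL1 is the custom pattern from your config):

    # step 1: timestamp only
    %{TIMESTAMP_ISO8601:timestamp}
    # step 2: add the log level
    %{TIMESTAMP_ISO8601:timestamp}\s+%{LOGLEVEL1:loglevel}
    # step 3: add the bracketed thread name
    %{TIMESTAMP_ISO8601:timestamp}\s+%{LOGLEVEL1:loglevel}\s+\[(?<threadname>[^ \]]+)\s*\]

Test against a sample log line after each step; the moment a step stops matching, you know the part you just added is the problem.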


(san ) #5

@magnusbaeck Thanks a lot, I'm about to give it a try now.


(san ) #6

What does this error mean?

    target of repeat operator is not specified: /(?<TIMESTAMP_ISO8601:timestamp>(?:(?>\d\d){1,2})-(?:(?:0?[1-9]|1[0-2]))-(?:(?:(?:0[1-9])|(?:[12][0-9])|(?:3[01])|[1-9]))[T ](?:(?:2[0123]|[01]?[0-9])):?(?:(?:[0-5][0-9]))(?::?(?:(?:(?:[0-5]?[0-9]|60)(?:[:.,][0-9]+)?)))?(?:(?:Z|[+-](?:(?:2[0123]|[01]?[0-9]))(?::?(?:(?:[0-5][0-9])))))?)\s+(?<LOGLEVEL1:loglevel>(ALERT|TRACE|DEBUG|[Nn]otice|NOTICE|INFO|WARN?(?:ING)?|ERROR|CRIT?(?:ICAL)?|FATAL|[Ss]evere|SEVERE|EMERG(?:ENCY)?|[Ee]merg(?:ency)?))\s+(?<THREAD:thread>?<=\[)(?=[^\]\[]*(?:-|@))\s*\b(?:[0-9A-Za-z][0-9A-Za-z-_.#@\s]{0,200})(?:\.(?:[0-9*A-Za-z][0-9A-Za-z-_.#@\s]{0,200}))*(\.?|\b)\s*(?=\]))/m

This happens when I add "THREAD:thread" to the expression. I have tested this in a regex tool against my log and it was returning the thread name.

I'm not sure what I'm missing here (I'm trying to extract the value between [ ]).

Thanks,
Santhosh


(san ) #7

@magnusbaeck I have a layman question about grok expressions:

  1. While building complex grok expressions with multiple patterns, if any one of them fails for some reason (e.g. a match is not found; say for "%{TIMESTAMP_ISO8601:timestamp}\s+%{LOGLEVEL1:loglevel}\s+" the log level fails to match), I assume the line being grokked will be ignored. Am I correct? I got this impression while testing my patterns in the grok debugger.

Thanks and Regards,
Santhosh


(Magnus Bäck) #8
target of repeat operator is not specified: [...] (?<THREAD:thread>?<=\[)(?=[^\]\[]*(?:-|@))

I suspect the error message means that for one of the * or + operators there's nothing preceding it, which is a problem since those operators act on what comes before them. I don't understand your expression so I can't offer more help. I don't know why the expression I presented was insufficient (unless you have square brackets in your thread names).
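Looking at the generated regexp, (?<THREAD:thread>?<=\[) is the suspicious part: the ? right after the opening of the named group has nothing in front of it to repeat, which is exactly what the error message says. It looks like a lookbehind fragment in your pattern file lost its opening parenthesis. A lookaround-free capture for a bracketed thread name can be as simple as this sketch:

    \[\s*(?<threadname>[^ \]]+)\s*\]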

"While building complex grok expressions with multiple patterns, if any one of them fails for some reason (e.g. a match is not found; say for "%{TIMESTAMP_ISO8601:timestamp}\s+%{LOGLEVEL1:loglevel}\s+" the log level fails to match), I assume the line being grokked will be ignored. Am I correct? I got this impression while testing my patterns in the grok debugger."

Depends on what you mean by "ignore a line". The event won't be dropped, but the grok filter will fail and not extract any fields. The whole expression needs to match for any fields to be extracted.
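One practical consequence: when the expression fails, the grok filter adds a _grokparsefailure tag to the event (its default failure tag), so you can spot unparsed lines in the output. A sketch:

    grok {
      match => { "message" => "%{TIMESTAMP_ISO8601:timestamp}\s+%{LOGLEVEL1:loglevel}" }
    }
    # Events that failed to match carry the default failure tag.
    if "_grokparsefailure" in [tags] {
      mutate { add_tag => ["needs_pattern_work"] }  # example tag name
    }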


(Magnus Bäck) #10

@Kiranmai_Reddy, please start your own thread for your unrelated question instead of resurrecting old unrelated threads.

