Help on creating GROK patterns


(Mulligan) #1

Hi, I am fairly new to ELK and to grok patterns. Below is the log line for which I want to create a grok pattern.

The fields that I need are the names before each '=', as parsed by Splunk:
[{applicationSystemCode=appname, clientIP=10.x.x.x, clusterId=Cluster-Id-NA, containerId=Container-Id-NA, correlationId=536bacc1-1b50-3866-5c8c-8d0efa037f8f, domainName=defaultDomain, hostName=ip-x-x-x.domain.com, messageId=10.x.x.23-e2250a0e-b706-4e95-8e11-5b9bf310eabd, userId=ANONYMOUS, webAnalyticsCorrelationId=66D276FF1489DFF845056FD915664268|F90B27374FD5E26D2566CEE3AFDA3AB0}]: class com.provider.base.v1.HomeBaseApiConsumer.searchTasks execution time: 15 ms

I also want to capture the execution time shown at the end of the line, i.e. 15 ms in this example.

I came up with this grok pattern, which is obviously not working:

%{MONTHDAY} %{MONTH} %{YEAR} %{TIME},%{NUMBER:duration} %{WORD:loglevel} %{WORD:Activity} [{%{("applicationSystemCode"= \w)}


(Mulligan) #2

According to the documentation for custom patterns, the syntax is (?<field_name>the pattern here).

My updated grok pattern is:

%{MONTHDAY} %{MONTH} %{YEAR} %{TIME},%{NUMBER:duration} %{WORD:loglevel} %{WORD:Activity} [{(?) \W\w+\W\w+)

I tested the regex on regex101.com and it works there, but it doesn't work in the Grok Debugger.

Can anybody help?


(Kofi) #3

I don't understand where "monthday" or "year" is in your log file. The first field is "applicationSystemCode=appname" right?


(Mulligan) #4

Sorry Seanziee,

The logs start with a date-time stamp. The whole log line looks like this:

01 Aug 2017 17:58:19,048 INFO ProfileAspect [{applicationSystemCode=appname, clientIP=10.x.x.x, clusterId=Cluster-Id-NA, containerId=Container-Id-NA, correlationId=536bacc1-1b50-3866-5c8c-8d0efa037f8f, domainName=defaultDomain, hostName=ip-x-x-x.domain.com, messageId=10.x.x.23-e2250a0e-b706-4e95-8e11-5b9bf310eabd, userId=ANONYMOUS, webAnalyticsCorrelationId=66D276FF1489DFF845056FD915664268|F90B27374FD5E26D2566CEE3AFDA3AB0}]: class com.provider.base.v1.HomeBaseApiConsumer.searchTasks execution time: 15 ms

As you can see in the line above, I need to capture all the field/value pairs separated by the = sign. There is also an execution time at the end which I need to capture.


(Mulligan) #5

Still looking for some help.


(Brandon Hatch) #6

Using your log example I would try something like this.

%{MONTHDAY} %{MONTH} %{YEAR} %{TIME},%{NUMBER:duration} %{WORD:loglevel} %{WORD:Activity} \[\{%{DATA:foo1}\}\]:(.*) execution time: %{NUMBER:executionTime} ms

Using that I get this result in the grok tester website.

Once you have your key-value pairs in their own field, you can use the kv filter to split them up dynamically. Something like this:

kv {
   source => "foo1"
   field_split => ","
   prefix => "foo_"
}
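
As an aside on the (?<field_name>...) custom-pattern syntax asked about earlier: the bracketed section could also be captured with an inline named group instead of %{DATA:foo1}. This is only a sketch. The field name kv_pairs is made up for illustration and I have not run it against your logs.

```
filter {
   grok {
      # (?<kv_pairs>...) is an inline named capture; [^}]* takes everything up to the closing brace.
      match => {"message" => "\[\{(?<kv_pairs>[^}]*)\}\]:(.*) execution time: %{NUMBER:executionTime} ms"}
   }
}
```

Grok patterns are not anchored by default, so this version simply skips over the timestamp and log level instead of naming them.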

(Mulligan) #7

Thanks for the help. Is there any way I can test this field splitting on foo1 before putting it into Logstash?


(Brandon Hatch) #8

It got a bit more complicated with the 5.x versions of Logstash, but I often test my filters by using the stdin input and running Logstash from the command line rather than as a service.

First create a configuration like below:

input {
   stdin {
      codec => json_lines
   }
}
filter {
   grok {
      match => {"message" => "%{MONTHDAY} %{MONTH} %{YEAR} %{TIME},%{NUMBER:duration} %{WORD:loglevel} %{WORD:Activity} \[\{%{DATA:foo1}\}\]:(.*) execution time: %{NUMBER:executionTime} ms"}
   }
   kv {
      source => "foo1"
      field_split => ","
      prefix => "foo_"
   }
}
output {
   stdout { codec => rubydebug}
}

Then run a standalone instance of Logstash with your configuration. (Logstash 5.x requires you to also point to a settings file, so this exact command may not work anymore.)
sudo /opt/logstash/bin/logstash -f ./logstash_groktest.conf

Take your message and turn it into JSON. I just put yours into {"message":"your log message text here"}

Paste the JSON into your now running logstash instance.

Now look at what it spits out.

{
	"message" => "01 Aug 2017 17:58:19,048 INFO ProfileAspect [{applicationSystemCode=appname, clientIP=10.x.x.x, clusterId=Cluster-Id-NA, containerId=Container-Id-NA, correlationId=536bacc1-1b50-3866-5c8c-8d0efa037f8f, domainName=defaultDomain, hostName=ip-x-x-x.domain.com, messageId=10.x.x.23-e2250a0e-b706-4e95-8e11-5b9bf310eabd, userId=ANONYMOUS, webAnalyticsCorrelationId=66D276FF1489DFF845056FD915664268|F90B27374FD5E26D2566CEE3AFDA3AB0}]: class com.provider.base.v1.HomeBaseApiConsumer.searchTasks execution time: 15 ms",
	"@version" => "1",
	"@timestamp" => "2017-08-02T19:16:53.996Z",
	"host" => "SLCLOGSTASH01",
	"duration" => "048",
	"loglevel" => "INFO",
	"Activity" => "ProfileAspect",
	"foo1" => "applicationSystemCode=appname, clientIP=10.x.x.x, clusterId=Cluster-Id-NA, containerId=Container-Id-NA, correlationId=536bacc1-1b50-3866-5c8c-8d0efa037f8f, domainName=defaultDomain, hostName=ip-x-x-x.domain.com, messageId=10.x.x.23-e2250a0e-b706-4e95-8e11-5b9bf310eabd, userId=ANONYMOUS, webAnalyticsCorrelationId=66D276FF1489DFF845056FD915664268|F90B27374FD5E26D2566CEE3AFDA3AB0",
	"executionTime" => "15",
	"foo_applicationSystemCode" => "appname",
	"foo_ clientIP" => "10.x.x.x",
	"foo_ clusterId" => "Cluster-Id-NA",
	"foo_ containerId" => "Container-Id-NA",
	"foo_ correlationId" => "536bacc1-1b50-3866-5c8c-8d0efa037f8f",
	"foo_ domainName" => "defaultDomain",
	"foo_ hostName" => "ip-x-x-x.domain.com",
	"foo_ messageId" => "10.x.x.23-e2250a0e-b706-4e95-8e11-5b9bf310eabd",
	"foo_ userId" => "ANONYMOUS",
	"foo_ webAnalyticsCorrelationId" => "66D276FF1489DFF845056FD915664268|F90B27374FD5E26D2566CEE3AFDA3AB0"
}

It looks like there is a space after each comma in your key value pairs. You would probably want to use the gsub mutate filter to get rid of those prior to putting it into KV. Using ", " instead of "," may also work.
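
A rough sketch of the gsub approach, assuming the field is still called foo1 and that this runs after the grok filter. I have not tested it against your data.

```
filter {
   mutate {
      # Replace every ", " in foo1 with "," so the keys no longer start with a space.
      gsub => ["foo1", ", ", ","]
   }
   kv {
      source => "foo1"
      field_split => ","
      prefix => "foo_"
   }
}
```

On the ", " alternative: field_split is treated as a set of single-character delimiters, so ", " splits on commas and on spaces. That should be fine for this sample because none of the values contain spaces, but it would break values that do.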


(Mulligan) #10

Thanks Brandon,

When I run the test I get a _grokparsefailure:

{
"@timestamp" => 2017-08-02T20:04:30.293Z,
"@version" => "1",
"host" => "ip-10-x-x-x.domain.com",
"message" => "{.message.:."01 Aug 2017 17:58:19,048 INFO ProfileAspect [{applicationSystemCode=appname, clientIP=10.x.x.x, clusterId=Cluster-Id-NA, containerId=Container-Id-NA, correlationId=536bacc1-1b50-3866-5c8c-8d0efa037f8f, domainName=defaultDomain, hostName=ip-x-x-x.domain.com, messageId=10.x.x.23-e2250a0e-b706-4e95-8e11-5b9bf310eabd, userId=ANONYMOUS, webAnalyticsCorrelationId=66D276FF1489DFF845056FD915664268|F90B27374FD5E26D2566CEE3AFDA3AB0}]: class com.provider.base.v1.HomeBaseApiConsumer.searchTasks execution time: 15 ms",
"tags" => [
[0] "_jsonparsefailure",
[1] "_grokparsefailure"
]
}


(Mulligan) #11

Here is my Logstash config file:

input {
   stdin {
      codec => json_lines
   }
}
filter {
   grok {
      match => {"message" => "%{MONTHDAY} %{MONTH} %{YEAR} %{TIME},%{NUMBER:duration} %{WORD:loglevel} %{WORD:Activity} \[\{%{DATA:foo1}\}\]:(.*) execution time: %{NUMBER:executionTime} ms"}
   }
   kv {
      source => "foo1"
      field_split => ","
      prefix => "foo_"
   }
}
output {
   stdout { codec => rubydebug}
}

Here is the JSON that I am passing on stdin:

{message:01 Aug 2017 17:58:19,048 INFO ProfileAspect [{applicationSystemCode=appname, clientIP=10.x.x.x, clusterId=Cluster-Id-NA, containerId=Container-Id-NA, correlationId=536bacc1-1b50-3866-5c8c-8d0efa037f8f, domainName=defaultDomain, hostName=ip-x-x-x.domain.com, messageId=10.x.x.23-e2250a0e-b706-4e95-8e11-5b9bf310eabd, userId=ANONYMOUS, webAnalyticsCorrelationId=66D276FF1489DFF845056FD915664268|F90B27374FD5E26D2566CEE3AFDA3AB0}]: class com.provider.base.v1.HomeBaseApiConsumer.searchTasks execution time: 15 ms"}


(Brandon Hatch) #12

Looks like you missed a double quote or two in the message you passed in.

Try this format.

{"message":"01 Aug 2017 17:58:19,048 INFO ProfileAspect [{applicationSystemCode=appname, clientIP=10.x.x.x, clusterId=Cluster-Id-NA, containerId=Container-Id-NA, correlationId=536bacc1-1b50-3866-5c8c-8d0efa037f8f, domainName=defaultDomain, hostName=ip-x-x-x.domain.com, messageId=10.x.x.23-e2250a0e-b706-4e95-8e11-5b9bf310eabd, userId=ANONYMOUS, webAnalyticsCorrelationId=66D276FF1489DFF845056FD915664268|F90B27374FD5E26D2566CEE3AFDA3AB0}]: class com.provider.base.v1.HomeBaseApiConsumer.searchTasks execution time: 15 ms"}
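
If building the JSON by hand keeps causing trouble, one alternative is to skip JSON altogether. A minimal sketch, assuming you only need the raw line in the message field: leave the codec off the stdin input, so it falls back to its default line codec and each pasted line becomes the message as-is. The filter and output sections stay as they are.

```
input {
   # No codec set: stdin defaults to the line codec, so paste the raw log line, not JSON.
   stdin { }
}
```

This also matches more closely what the file input will hand to your filters later, since it reads plain lines too.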


(Mulligan) #13

Thanks Brandon, it worked :)


(Mulligan) #14

But Kibana doesn't actually show me the complete log line.

This is what it shows: the last part, i.e. "execution time: 15 ms", is cut off, although the value does appear in the fields.


(Mulligan) #16

Hi Brandon, since this one worked: is it possible to have a nested kv filter? After the first set of key-value pairs there is a second set in the same log line that also needs to be split. Below is an example log:

{"message":"01 Aug 2017 17:58:19,048 INFO ProfileAspect [{applicationSystemCode=appname, clientIP=10.x.x.x, clusterId=Cluster-Id-NA, containerId=Container-Id-NA, correlationId=536bacc1-1b50-3866-5c8c-8d0efa037f8f, domainName=defaultDomain, hostName=ip-x-x-x.domain.com, messageId=10.x.x.23-e2250a0e-b706-4e95-8e11-5b9bf310eabd, userId=ANONYMOUS, webAnalyticsCorrelationId=66D276FF1489DFF845056FD915664268|F90B27374FD5E26D2566CEE3AFDA3AB0}]: class com.provider.base.v1.HomeBaseApiConsumer.searchTasks execution time: 15 ms"}: KpiMetric="NONE" TransactionName="Notes" TransactionStatus="Success" User="Associate(firstName=Fuller, lastName=Martin, role=Consultant, email=fuller.martin@domain.com, electronicId=FMA422)"

As you can see, the fields after the first set of key-value pairs are also in key=value form, i.e. KpiMetric, TransactionName, TransactionStatus and User at the end of the line.

Another way of doing this is to use the simple %{WORD:field} format, but I was keen to use the kv filter itself if that is possible at all.


(Mulligan) #17

Would the kv filter not work with the file input?

My Logstash config is below:

input {
   file {
      path => "/prod/onic_app*.log"
      start_position => beginning
      source => onic_tomcat_app
   }
   file {
      path => "/prod/onic__perf*.log"
      start_position => beginning
      source => onic_tomcat_perf
   }
   file {
      path => "/prod/onic__sys*.log"
      start_position => beginning
      source => onic_tomcat_sys
   }
   beats {
      port => "5044"
   }
}

filter {
   grok {
      match => {"message" => "%{MONTHDAY} %{MONTH} %{YEAR} %{TIME},%{NUMBER:duration} %{WORD:loglevel} %{WORD:Activity} \[\{%{DATA:foo1}\}\]:(.*) execution time: %{NUMBER:executionTime} ms"}
   }
   kv {
      source => "foo1"
      field_split => ", "
   }
}
output {
   elasticsearch {
      hosts => "localhost:9200"
      #manage_template => false
      index => onic
      user => elastic
      password => elasticpassword
   }
   #stdout { codec => rubydebug}
}

And it gives me this error

ERROR logstash.inputs.file - Unknown setting 'source' for file


(Brandon Hatch) #18

"ERROR logstash.inputs.file - Unknown setting 'source' for file" means the problem is in your file input configuration, not in the kv filter.

source => onic_tomcat_app

That line right there is your problem. source isn't a valid option for the file input. https://www.elastic.co/guide/en/logstash/master/plugins-inputs-file.html
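
If the point of source was to label events by the file they came from, one possible replacement is the common type option (add_field would also do). This is a sketch only, reusing one of your paths, and I'm assuming labelling is what you were after:

```
input {
   file {
      path => "/prod/onic_app*.log"
      start_position => "beginning"
      # type is a common input option; it sets the "type" field on every event from this input.
      type => "onic_tomcat_app"
   }
}
```

You could then wrap filters in a conditional such as if [type] == "onic_tomcat_app" { ... } to apply them only to events from that file.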

I haven't tried it, but the kv filter has a recursive option; just set recursive => true. The documentation describes it as follows:

A boolean specifying whether to drill down into values and recursively get more key-value pairs from it. The extra key-value pairs will be stored as subkeys of the root key.

Default is not to recursive values.
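
A sketch of what that might look like. The field name kpi_pairs is hypothetical: it assumes an earlier grok has already captured the trailing KpiMetric="NONE" TransactionName="Notes" ... part of the line into its own field. I have not verified how recursive copes with the parentheses and commas inside the User value.

```
filter {
   kv {
      # kpi_pairs is a made-up field name for the trailing key="value" section.
      source => "kpi_pairs"
      # Ask kv to look for further key=value pairs inside each extracted value.
      recursive => true
   }
}
```

With the default settings kv splits fields on spaces and keys from values on "=", which matches the layout of that trailing section.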


(system) #19

This topic was automatically closed 28 days after the last reply. New replies are no longer allowed.