Need support in parsing Syslog messages

I want to parse the syslog message below.

Jul 23 11:39:11 cheerwine CEF: 0|OWASP|appsensor|1.0|IE1|Input
Validation|7|cat=attack_detection deviceExternalId=localhostme suser=bob
cn1Label=thresholdCount cn1=3 cn2Label=intervalDuration cn2=5
cs1Label=intervalUnit cs1=minutes

I want to extract the following:
"Detection_point_Category" = "Input Validation", "Detection_point_label" = "IE1", and
the values of "cat", "deviceExternalId", and "suser".

I know I need to use the kv filter for parsing. I have put a lot of effort into making it work, but without success. I am looking forward to help from the Elastic community.

The code I am currently working on is below:

input {
  tcp {
    port => 5000
    type => syslog
  }
}

filter {
  if [type] == "syslog" {
    grok {
      match => { "message" => "%{SYSLOGBASE}" }
      add_field => [ "received_at", "%{@timestamp}" ]
      add_field => [ "received_from", "%{host}" ]
    }

    syslog_pri { }
    date {
      match => [ "syslog_timestamp", "MMM d HH:mm:ss", "MMM dd HH:mm:ss" ]
    }
  }
}

output {
  elasticsearch {
    host => localhost
    index => "dashboard"
  }
  stdout { codec => rubydebug }
}

Start with a grok filter to separate the pieces that are suitable for parsing with the kv filter, then apply kv to the rest. Something like this (untested) should get you started:

filter {
  grok {
    match => [
      "message",
      "... 0\|OWASP\|appsensor\|1\.0\|%{WORD:Detection_point_label}\|%{WORDDetection_point_Category}\|7\|%{GREEDYDATA:keyvalues}"
    ]
  }
  kv {
    source => "keyvalues"
    remove_field => ["keyvalues"]
  }
}

I've obviously omitted the beginning of the string in the grok expression (you should be able to use %{SYSLOGBASE} there) and you'll want to replace the hardcoded field values ("appsensor", "7", ...) with something that matches the kind of data you might get there. Or, you could make it match any character that isn't a pipe character with [^|]+.
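To make the grok-then-kv idea concrete outside Logstash, here is a minimal Python sketch of the same two steps. It is only an illustration: it uses Python's `re` module rather than grok, it assumes the message arrives on a single line, and the group names other than `Detection_point_label` and `Detection_point_Category` (`cef_version`, `vendor`, `product`, `product_version`, `severity`) are hypothetical names picked for the example.

```python
import re

# Sample message from the thread, joined onto one line (an assumption:
# the original paste wraps it across several lines).
MESSAGE = (
    "Jul 23 11:39:11 cheerwine CEF: 0|OWASP|appsensor|1.0|IE1|Input Validation|7|"
    "cat=attack_detection deviceExternalId=localhostme suser=bob "
    "cn1Label=thresholdCount cn1=3 cn2Label=intervalDuration cn2=5 "
    "cs1Label=intervalUnit cs1=minutes"
)

# Literal pipes are escaped; every header field is "anything but a pipe",
# so no vendor, product, or severity value is hardcoded.
HEADER_RE = re.compile(
    r"CEF: (?P<cef_version>\d+)\|(?P<vendor>[^|]+)\|(?P<product>[^|]+)\|"
    r"(?P<product_version>[^|]+)\|(?P<Detection_point_label>[^|]+)\|"
    r"(?P<Detection_point_Category>[^|]+)\|(?P<severity>[^|]+)\|(?P<keyvalues>.*)"
)


def parse(message):
    """Split off the pipe-separated header, then parse the key=value tail."""
    match = HEADER_RE.search(message)
    if match is None:
        return None
    fields = match.groupdict()
    keyvalues = fields.pop("keyvalues")  # mirrors remove_field => ["keyvalues"]
    # kv-style step: whitespace separates pairs, "=" separates key from value.
    for pair in keyvalues.split():
        key, sep, value = pair.partition("=")
        if sep:
            fields[key] = value
    return fields


event = parse(MESSAGE)
print(event["Detection_point_Category"], event["cat"], event["suser"])
```

The split on whitespace only works because none of the values in this sample contain spaces; values with embedded spaces would need a more careful tokenizer.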

You could also use a csv filter to split the pipe-separated pieces. That might actually be more elegant.
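The idea behind the csv approach can be sketched in a few lines of Python: split the CEF part on the pipe character and assign column names by position. This is an illustration of the technique, not the csv filter itself. The column names other than the two asked for are hypothetical, and the sketch assumes the syslog prefix up to "CEF: " has already been stripped.

```python
# CEF part of the sample message, shortened for the example.
CEF_PAYLOAD = (
    "0|OWASP|appsensor|1.0|IE1|Input Validation|7|"
    "cat=attack_detection deviceExternalId=localhostme suser=bob"
)

# Positional column names; all but the two Detection_point_* names are
# made up for this sketch.
COLUMNS = [
    "cef_version",
    "vendor",
    "product",
    "product_version",
    "Detection_point_label",
    "Detection_point_Category",
    "severity",
    "keyvalues",
]


def split_cef(payload):
    """Split on '|' at most seven times so the key=value tail stays whole."""
    parts = payload.split("|", len(COLUMNS) - 1)
    return dict(zip(COLUMNS, parts))


row = split_cef(CEF_PAYLOAD)
print(row["Detection_point_label"], "/", row["Detection_point_Category"])
```

Limiting the number of splits matters: if a value in the key=value tail ever contained a pipe, an unlimited split would spill it into extra columns.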

I have tried the two variants below, but both resulted in a grok parse failure.

Code-1:
match => [
"message",
"... 0|OWASP|appsensor|1.0|%{WORD:Detection_point_label}|%{WORDDetection_point_Category}|7|%{GREEDYDATA:keyvalues}"
]

Code-2:
match => [
"message",
"%{SYSLOGBASE}|OWASP|appsensor|1.0|%{WORD:Detection_point_label}|%{WORDDetection_point_Category}|7|%{GREEDYDATA:keyvalues}"
]

Well, the first one obviously failed since my example was pseudo code. In the second attempt you've introduced SYSLOGBASE, which is defined like this:

SYSLOGBASE %{SYSLOGTIMESTAMP:timestamp} (?:%{SYSLOGFACILITY} )?%{SYSLOGHOST:logsource} %{SYSLOGPROG}:

Assuming that this matches your log (I think it does but haven't looked closely) then at the very least you're missing a space and the initial zero after SYSLOGBASE. This should be closer to working:

%{SYSLOGBASE} 0\|OWASP\|...
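Note the backslashes before the pipes. Whether they were actually missing from the configs posted earlier or just lost when the post was rendered is not clear from the thread, but the difference matters: an unescaped `|` is the regex alternation operator. The Python sketch below shows the effect with the `re` module instead of grok (alternation behaves the same way for this case); the group names are shortened for the example.

```python
import re

LINE = (
    "Jul 23 11:39:11 cheerwine CEF: 0|OWASP|appsensor|1.0|IE1|"
    "Input Validation|7|cat=attack_detection"
)

# Unescaped pipes: the pattern is a list of alternatives, any one of
# which is enough for a "match".
unescaped = re.compile(
    r"CEF: 0|OWASP|appsensor|1.0|(?P<label>\w+)|(?P<category>[^|]+)|7|(?P<keyvalues>.*)"
)

# Escaped pipes: one sequence that has to match from left to right.
escaped = re.compile(
    r"CEF: 0\|OWASP\|appsensor\|1\.0\|(?P<label>\w+)\|(?P<category>[^|]+)\|7\|(?P<keyvalues>.*)"
)

loose = unescaped.search(LINE)
strict = escaped.search(LINE)

# The unescaped pattern "succeeds" on the first word of the line and
# captures nothing useful.
print(repr(loose.group(0)), loose.group("label"), loose.group("category"))

# The escaped pattern captures the fields that were asked for.
print(strict.group("label"), "/", strict.group("category"))
```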

Below is the complete configuration I am using to extract the required data from the given syslog message. I made the changes you suggested, but it still results in a grok parse failure.

Syslog:
Jul 23 11:39:11 cheerwine CEF: 0|OWASP|appsensor|1.0|IE1|Input
Validation|7|cat=attack_detection deviceExternalId=localhostme suser=bob
cn1Label=thresholdCount cn1=3 cn2Label=intervalDuration cn2=5
cs1Label=intervalUnit cs1=minutes

Updated logstash config file:

input {
  tcp {
    port => 5000
    type => syslog
  }
}

filter {
  if [type] == "syslog" {
    grok {
      match => [
        "message",
        "%{SYSLOGBASE} 0|OWASP|appsensor|1.0|%{WORD:Detection_point_label}|%{WORDDetection_point_Category}|7|%{GREEDYDATA:keyvalues}"
      ]
    }
    kv {
      source => "keyvalues"
      remove_field => ["keyvalues"]
    }
  }
}

output {
  elasticsearch {
    host => localhost
    index => "dashboard"
  }
  stdout { codec => rubydebug }
}

... WORDDetection_point_Category ...

This should be WORD:Detection_point_Category. That was a typo from what I wrote last night, sorry.

I had already corrected that and run it some time ago, but the error still persists.

Replace %{WORD:Detection_point_Category} with (?<Detection_point_Category>[^|]+).
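The reason for this change: grok's WORD pattern is defined as `\b\w+\b`, so it cannot match a value containing a space such as "Input Validation", while `[^|]+` accepts anything up to the next pipe. A small Python sketch of the difference, using the `re` module as a stand-in for grok and shortened group names:

```python
import re

SEGMENT = "IE1|Input Validation|7|"

# Both fields as WORD (\b\w+\b): the space in "Input Validation"
# cannot be matched, so the whole pattern fails.
word_version = re.compile(r"(?P<label>\b\w+\b)\|(?P<category>\b\w+\b)\|7\|")

# Category as "anything but a pipe": spaces are allowed.
pipe_version = re.compile(r"(?P<label>\b\w+\b)\|(?P<category>[^|]+)\|7\|")

word_match = word_version.search(SEGMENT)
pipe_match = pipe_version.search(SEGMENT)

print(word_match)
print(pipe_match.group("label"), "/", pipe_match.group("category"))
```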

Output after making suggested changes:

Syslog used:
Jul 23 11:39:11 cheerwine CEF: 0|OWASP|appsensor|1.0|IE1|Input
Validation|7|cat=attack_detection deviceExternalId=localhostme suser=bob
cn1Label=thresholdCount cn1=3 cn2Label=intervalDuration cn2=5
cs1Label=intervalUnit cs1=minutes

Output generated:
{
  "message" => "Jul 23 11:39:11 cheerwine CEF: 0|OWASP|appsensor|1.0|IE1|Input\r",
  "@version" => "1",
  "@timestamp" => "2015-08-11T13:10:11.723Z",
  "host" => "127.0.0.1",
  "type" => "syslog",
  "tags" => [
    [0] "_grokparsefailure"
  ]
}
{
  "message" => "Validation|7|cat=attack_detection deviceExternalId=localhostme suser=bob\r",
  "@version" => "1",
  "@timestamp" => "2015-08-11T13:10:11.726Z",
  "host" => "127.0.0.1",
  "type" => "syslog",
  "tags" => [
    [0] "_grokparsefailure"
  ]
}
{
  "message" => "cn1Label=thresholdCount cn1=3 cn2Label=intervalDuration cn2=5\r",
  "@version" => "1",
  "@timestamp" => "2015-08-11T13:10:11.726Z",
  "host" => "127.0.0.1",
  "type" => "syslog",
  "tags" => [
    [0] "_grokparsefailure"
  ]
}

Filter block code:
filter {
  grok {
    match => [ "message","%{SYSLOGBASE} 0|OWASP|appsensor|1.0|%{WORD:Detection_point_label}|(?<Detection_point_Category>[^|]+)|7|%{GREEDYDATA:keyvalues}"]
  }

  kv {
    source => "keyvalues"
    remove_field => ["keyvalues"]
  }
}

What, do your syslog messages span multiple lines? Are you still using a TCP input?
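The output above shows one event per physical line, each ending in `\r`, so no single event ever contains the whole message and no pattern written for the full message can match. If the sender really does wrap messages, the lines have to be joined before grok runs (Logstash has a multiline codec for that kind of job). The Python sketch below only illustrates the joining logic under one assumption: a line starts a new event only if it begins with a syslog-style timestamp. `join_continuations` is a hypothetical helper written for this example.

```python
import re

# Assumption: a new event always begins with a timestamp such as
# "Jul 23 11:39:11 "; anything else is a continuation line.
STARTS_EVENT = re.compile(r"^[A-Z][a-z]{2} +\d{1,2} \d{2}:\d{2}:\d{2} ")


def join_continuations(lines):
    """Merge continuation lines into the event that precedes them."""
    events = []
    for raw in lines:
        line = raw.rstrip("\r\n")  # drop the trailing \r seen in the output
        if STARTS_EVENT.match(line) or not events:
            events.append(line)
        else:
            events[-1] += " " + line
    return events


# The physical lines as they appear in the sample message.
received = [
    "Jul 23 11:39:11 cheerwine CEF: 0|OWASP|appsensor|1.0|IE1|Input\r",
    "Validation|7|cat=attack_detection deviceExternalId=localhostme suser=bob\r",
    "cn1Label=thresholdCount cn1=3 cn2Label=intervalDuration cn2=5\r",
    "cs1Label=intervalUnit cs1=minutes\r",
]

events = join_continuations(received)
print(len(events))
print(events[0])
```

Joining with a single space is a choice made for this sample, where the line break falls between "Input" and "Validation"; a break in the middle of a token would need different handling.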

I got it, Magnus. Thanks for your support. It means a lot to me :smile: