Parsing DataPower syslog using GROK for latency calculation for a given URI

Hi,

I'm new to GROK and Logstash.

I'm trying to calculate the response time from the latency field of a DataPower syslog entry.
The entry looks like the following:

20180308T211418.088Z [0x80e00073][mpgw][debug] mpgw(MyServices_MPGW): tid(12380973)[102.29.16.10] gtid(5ae25f9f5a986d770257aec3): Latency: 0 138 0 56 56 56 0 138 138 138 138 138 138 138 56 138 [http://10.40.59.102:8070/rephierarchy/savedlist/2798]

Each field of the above entry is as follows:
20180308T211418.088Z - timestamp
[0x80e00073] - error code
[mpgw] - object type
[debug] - log level
mpgw(MyServices_MPGW): - service name
tid(12380973)[102.29.16.10] - transaction ID
gtid(5ae25f9f5a986d770257aec3): - gtid
Latency: 0 138 0 56 56 56 0 138 138 138 138 138 138 138 56 138 - latency
[http://10.40.59.102:8070/rephierarchy/savedlist/2798] - URL

I'm trying to write a GROK filter in the Logstash conf file to decode the above entry and get the timestamp, latency, and URI. I'll calculate the response time from the latency values (response time = the 12th-position field subtracted from the 6th-position field; in the sample above, the 6th field is 56 and the 12th is 138).

My Logstash conf entry looks like this:

input {
  tcp {
    port => 5514
    type => "syslog"
  }
}

filter {
  syslog_pri {
    add_field => { "[@metadata][type]" => "syslog" }
    add_field => { "[@metadata][beat]" => "syslog" }
  }
  if [type] == "syslog" {
    grok {
      match => { "message" => "%{SYSLOGLINE}" }
    }
    date {
      match => [ "timestamp", "MMM  d HH:mm:ss", "MMM dd HH:mm:ss" ]
    }
  }
}

output {
  if [type] == "syslog" {
    elasticsearch {
      hosts => ["localhost:9200"]
      sniffing => true
      manage_template => false
      user => "elastic"
      password => "defaultpassword"
      index => "datapower-%{+YYYY.MM.dd}"
    }
  }
}

A sample JSON entry in Kibana for this index (note the _grokparsefailure tag: %{SYSLOGLINE} is not matching this message format):

{
  "_index": "datapower-2018.03.08",
  "_type": "syslog",
  "_id": "AWIGW4Ud24FGqAt6qLue",
  "_version": 1,
  "_score": null,
  "_source": {
    "@timestamp": "2018-03-08T16:04:43.926Z",
    "syslog_severity_code": 5,
    "port": 33670,
    "syslog_facility": "user-level",
    "@version": "1",
    "host": "10.40.137.99",
    "syslog_facility_code": 1,
    "message": "<14>Mar 08 11:03:11 DVI_MyServices [0x80e00073][latency][info] mpgw(MyServices_MPGW): trans(12362109)[102.29.16.10] gtid(5ae25f9f5aa15f0704322723): Latency: 0 138 0 56 56 56 0 138 138 138 138 138 138 138 56 138 [http://10.40.59.102:8070/rephierarchy/savedlist/2798]",
    "type": "syslog",
    "syslog_severity": "notice",
    "tags": [
      "_grokparsefailure",
      "_jsonparsefailure"
    ]
  },
  "fields": {
    "@timestamp": [
      1520525083926
    ]
  },
  "sort": [
    1520525083926
  ]
}

I tried to use the Grok Debugger but I'm stuck on the timestamp entry itself. I tried a few options like:
%{DATESTAMP:timestamp}
%{SYSLOGTIMESTAMP:timestamp}
%{TIMESTAMP_ISO8601:timestamp}
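
None of these match. I suspect it is because the compact form 20180308T211418.088Z has no date separators, so %{TIMESTAMP_ISO8601} (which expects something like 2018-03-08T21:14:18.088Z) cannot match, and a custom pattern is needed. An untested sketch against the bare entry from the first post, with field names of my own choosing (the syslog-wrapped message shown in Kibana would need its <14>Mar 08 ... prefix handled first):

  grok {
    match => {
      "message" => "(?<timestamp>\d{8}T\d{6}\.\d{3}Z) \[%{DATA:errorcode}\]\[%{DATA:objecttype}\]\[%{LOGLEVEL:loglevel}\] %{DATA:servicename}: tid\(%{NUMBER:tid}\)\[%{IP:clientip}\] gtid\(%{DATA:gtid}\): Latency: %{DATA:latency} \[%{DATA:url}\]"
    }
  }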

Personally I would use dissect rather than grok.

  dissect {
    mapping => {
      message => "<%{level}>%{ts} %{+ts} %{+ts} %{unnamed} [%{errorcode}][%{loglevel}] %{servicename}: %{transactiondid} %{gid} Latency: %{latency} [%{url}]"
    }
  }
  ruby {
    # My mind is going blank on how to do this with non-ruby filter
    code => 'event.set("latencystuff", event.get("latency").split())'
  }
  mutate { convert => { "latencystuff" => "integer" } }
  date { match => [ "ts", "MMM dd HH:mm:ss" ] target => "@timestamp" }

Is it possible to ignore the fields other than timestamp, latency and URL? I'll try your suggestion.

Yes, if you do not supply a name then dissect will not store the result.

message => "<%{}>%{ts} %{+ts} %{+ts} %{} [%{}][%{}] %{}: %{} %{} Latency: %{latency} [%{url}]"

What do the <%{level}> and %{unnamed} stand for in your previous reply?

<%{level}>%{ts} %{+ts} %{+ts} %{unnamed} [%{errorcode}][%{loglevel}] %{servicename}: %{transactionid} %{gid} Latency: %{latency} [%{url}]

Placeholders to be replaced with better field names if you wanted to keep those fields 🙂

I was wondering how <%{level}> and %{unnamed} relate to the log event below:
20180308T211418.088Z [0x80e00073][mpgw][debug] mpgw(MyServices_MPGW): tid(12380973)[102.29.16.10] gtid(5ae25f9f5a986d770257aec3): Latency: 0 138 0 56 56 56 0 138 138 138 138 138 138 138 56 138 [http://10.40.59.102:8070/rephierarchy/savedlist/2798]

As long as the filter works I'm good. I made the changes you recommended, but I won't be able to test them until I restart the ELK stack. I'm not sure if it is expected behavior of the ELK stack that I have to restart everything (Logstash, Elasticsearch, and Kibana). I have a dependency on another team that is testing other functionality, so I'll reboot once they have finished testing their changes. After installing X-Pack, I have noticed that rebooting the ELK stack doesn't pick up Logstash config changes.

"message": "<14>Mar 08 11:03:11 DVI_MyServices [0x80e00073][latency][info] mpgw(MyServices_MPGW): trans(12362109)[102.29.16.10] gtid(5ae25f9f5aa15f0704322723): Latency: 0 138 0 56 56 56 0 138 138 138 138 138 138 138 56 138 [http://10.40.59.102:8070/rephierarchy/savedlist/2798]"

In that message, level would pick up the 14 at the beginning, and unnamed would have been set to DVI_MyServices.

If you restart logstash it will pick up the new configuration, no need to restart elasticsearch or kibana.

Now I understand what you meant by <%{level}> and %{unnamed}.

Question: do you still want me to use GROK along with dissect, or can we skip GROK in this scenario?
Below is the output I got using GROK:
--- GROK OUTPUT ---
"<14>Mar 08 11:03:11 DVI_MyServices [0x80e00073][latency][info] mpgw(MyServices_MPGW): trans(12362109)[102.29.16.10] gtid(5ae25f9f5aa15f0704322723): Latency: 0 138 0 56 56 56 0 138 138 138 138 138 138 138 56 138 [http://10.40.59.102:8070/rephierarchy/savedlist/2798]"

Without using GROK, the input from TCP to Logstash would look like the following:
---- TCP input ----
20180308T211418.088Z [0x80e00073][mpgw][debug] mpgw(MyServices_MPGW): tid(12380973)[102.29.16.10] gtid(5ae25f9f5a986d770257aec3): Latency: 0 138 0 56 56 56 0 138 138 138 138 138 138 138 56 138 [http://10.40.59.102:8070/rephierarchy/savedlist/2798]

Based on your latest response, maybe you want me to use a filter config like the following:

filter {
  syslog_pri {
    add_field => { "[@metadata][type]" => "syslog" }
    add_field => { "[@metadata][beat]" => "syslog" }
  }
  if [type] == "syslog" {
    grok {
      match => { "message" => "%{SYSLOGLINE}" }
    }
    date {
      match => [ "timestamp", "MMM  d HH:mm:ss", "MMM dd HH:mm:ss" ]
    }
    dissect {
      mapping => {
        message => "<%{}>%{ts} %{+ts} %{+ts} %{} [%{}][%{}] %{}: %{} %{} Latency: %{latency} [%{url}]"
      }
    }
    ruby {
      code => 'event.set("latencystuff", event.get("latency").split())'
    }
    mutate { convert => { "latencystuff" => "integer" } }
    date { match => [ "ts", "MMM dd HH:mm:ss" ] target => "@timestamp" }
  }
}

I'll test the above changes once a request comes in over TCP (I cannot manually trigger a TCP request).

Also, is it possible to get the URI without query parameters from the URL using dissect?
For example:
dissect {
  mapping => {
    message => "<%{}>%{ts} %{+ts} %{+ts} %{} [%{}][%{}] %{}: %{} %{} Latency: %{latency} [%{url}]"
    "url" => "http://%{}:%{}/%{uri}?%{}"
  }
}

Yes, but the pattern you gave only works if there is a query string. So you would have to test for the presence of ? to decide whether to include ?%{} in the mapping.
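
Untested, but assuming [url] has already been dissected out of the message, the test could look like this:

  if "?" in [url] {
    dissect { mapping => { "url" => "%{}://%{}:%{}/%{uri}?%{}" } }
  } else {
    dissect { mapping => { "url" => "%{}://%{}:%{}/%{uri}" } }
  }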

I made some progress and removed GROK. I'm getting the URL with "]" appended to the end. Below is my latest Logstash config for this message:
"message": "<14>Mar 08 11:03:11 DVI_MyServices [0x80e00073][latency][info] mpgw(MyServices_MPGW): trans(12362109)[102.29.16.10] gtid(5ae25f9f5aa15f0704322723): Latency: 0 186 0 65 65 65 0 186 187 186 187 187 186 186 65 186 [http://10.40.59.102:8070/appview/details?appID=125635]"

------- logstash config -----
input {
  tcp {
    port => 5514
    type => "syslog"
  }
}

filter {
  syslog_pri {
    add_field => { "[@metadata][type]" => "syslog" }
    add_field => { "[@metadata][beat]" => "syslog" }
  }
  if [type] == "syslog" {
    dissect {
      mapping => {
        message => "<%{}>%{ts} %{+ts} %{+ts} %{} [%{}][%{}][%{loglevel}] %{}: %{} %{} Latency: %{latency} [%{url}]"
        "url" => "%{}://%{}:%{}/%{uri}"
        "latency" => "%{?lf1->} %{?lf2->} %{?lf3->} %{?lf4->} %{?lf5->} %{lfConnectionAttempted->} %{?lf7->} %{?lf8->} %{?lf9->} %{?lf10->} %{?lf11->} %{lfparsingcomplete->} %{?lf13->} %{?lf14->} %{?lf15->} %{?lf16}"
      }
      convert_datatype => {
        "lfConnectionAttempted" => "int"
        "lfparsingcomplete" => "int"
      }
    }
    ruby {
      code => 'event.set("latencystuff", event.get("latency").split())'
    }
    mutate { convert => { "latencystuff" => "integer" } }
    date { match => [ "ts", "MMM dd HH:mm:ss" ] target => "@timestamp" }
  }
}

output {
  if [type] == "syslog" {
    elasticsearch {
      hosts => ["localhost:9200"]
      sniffing => true
      manage_template => false
      user => "elastic"
      password => "defaultpassword"
      index => "test11-%{+YYYY.MM.dd}"
    }
  }
}

Output in Kibana:

"uri": "accountview/details?accountID=125635]",
"url": "http://10.40.59.102:8070/appview/details?appID=125635]",
"latencystuff": [
0,
186,
0,
65,
65,
65,
0,
186,
187,
186,
187,
187,
186,
186,
65,
186
],
"loglevel": "info",

How do I get rid of the "]" from the URL? Do I need a regex to get rid of the "]" and to parse the URI when there are query parameters?

Add a trailing %{} to the dissect mapping as shown here
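
For example, with your current message mapping that would be (the trailing %{} is an empty skip field, so the closing ] becomes a delimiter instead of being swallowed into url):

  message => "<%{}>%{ts} %{+ts} %{+ts} %{} [%{}][%{}][%{loglevel}] %{}: %{} %{} Latency: %{latency} [%{url}]%{}"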

That worked. Now I need to figure out how to parse the URL to get the URI without query parameters. Thank you Badger.

I haven't tested this, but one approach would be to do two dissects with each of

"url" => "http://%{}:%{}/%{uri1}?%{}"
"url" => "http://%{}:%{}/%{uri2}"

Then pick one or the other based on whether [uri1] == [uri2]

After adding the query-parameter URL line alongside the no-query-parameter URL line in the Logstash config, Logstash is not even listening on port 5514 after restarting. Removing the query-parameter URL line from the config makes it work as expected.
Is there a way to check in some log file where the problem is?

"logstash-plain.log" file size is empty.

/var/log/logstash/logstash-plain.log
-rw-r--r--. 1 logstash logstash 0 Mar 13 09:36 logstash-plain.log

Did you add both patterns to a single dissect? That results in an incomprehensible error message 🙂 Try this:

  dissect { mapping => { "url" => "%{}://%{}:%{}/%{uri1}?%{}" } }
  dissect { mapping => { "url" => "%{}://%{}:%{}/%{uri2}" } }
  if [uri1] != "" {
    mutate { add_field => { "uri" => "%{uri1}" } }
  } else {
    mutate { add_field => { "uri" => "%{uri2}" } }
  }

Thank you Sir. It worked.
