Parse logs


(Silviupanaite) #1

I'm trying to parse:
{"log":"86.121.176.201 - - [09/Sep/2015:14:14:06 +0000] "GET /css/style.less HTTP/1.1" 304 0 "http://developer.unificationengine.com/login" "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_10_5) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/45.0.2454.85 Safari/537.36" "-"\r\n","stream":"stdout","time":"2015-09-09T14:14:06.112442077Z"}

if [type] == "docker-logs" {
grok {
match => ["message", "%{QS}:%{QS:message_log},%{QS}:%{QS:log_type},%{QS}:%{QS:time}"]
match => ["message", "\A{"log":"%{IP} - - [%{HTTPDATE:time}] \"%{WORD} %{PATH} %{PROG}%{NOTSPACE}%{SPACE}%{NUMBER} 0 \"%{URI}%{NOTSPACE}%{SPACE}%{GREEDYDATA}"]
}
mutate {
rename => ["log", "message"]
}
date {
match => ["time", "ISO8601"]
remove_field => ["time"]
}
}

But I'm not able to, the message output is still the same. Can you help me out, I don't know what I'm doing wrong?


(Magnus Bäck) #2

Don't use grok to parse JSON. If possible use the json codec for the input. Failing that, use the json filter. Once that's in place, see if you can use the COMBINEDAPACHELOG pattern instead of writing your own pattern from scratch.


(Silviupanaite) #3

Tried what you suggested... nothing changed. Tried with input codec and also filter.
Here is the json pulled from kibana4
{
"_index": "logstash-2015.09.15",
"_type": "docker-logs",
"_id": "AU_QgbrYMc0WVE2CwNCU",
"_score": null,
"_source": {
"message": "{"log":"sendmail\r\n","stream":"stdout","time":"2015-09-15T10:16:38.353745638Z"}",
"@version": "1",
"@timestamp": "2015-09-15T10:16:46.144Z",
"type": "docker-logs",
"file": "/var/lib/docker/containers/bfc1026c1e00a25cd145d449f031e00c26abb120e8160c20a12198e0822ec365/bfc1026c1e00a25cd145d449f031e00c26abb120e8160c20a12198e0822ec365-json.log",
"host": "bs1.uib.co",
"offset": "10824",
"tags": "unifiedinbox/ue-developer",
"serivce": "map[io.rancher.project.name:unificationengine-developer-portal io.rancher.project_service.name:unificationengine-developer-portal/ue-developer io.rancher.scheduler.affinity:container_label_soft:io.rancher.service.deployment.unit=12495afc-716d-41ec-9152-d652c390d09e io.rancher.service.deployment.unit:12495afc-716d-41ec-9152-d652c390d09e io.rancher.stack.name:unificationengine-developer-portal io.rancher.container.ip:10.42.137.137/16 io.rancher.container.uuid:b7fc233f-cabf-4a97-8102-005ded7feb93 io.rancher.service.launch.config:io.rancher.service.primary.launch.config io.rancher.stack_service.name:unificationengine-developer-portal/ue-developer]"
},
"fields": {
"@timestamp": [
1442312206144
]
},
"sort": [
1442312206144
]
}

and here's the filter used:
filter {
if "unifiedinbox/ue-developer" in [tags] {
json {
source => "message"
date {
match => ["time", "ISO8601"]
remove_field => ["time"]
}
}
}


(Magnus Bäck) #4

Try isolating things. Skip the if "unifiedinbox/ue-developer" in [tags] conditional, for example. I can't reproduce what you're seeing with either of these minimal examples:

$ cat data
{"log":"sendmail\r\n","stream":"stdout","time":"2015-09-15T10:16:38.353745638Z"}
$ cat json-codec.config 
input { stdin { codec => json } }
output { stdout { codec => rubydebug } }
$ /opt/logstash/bin/logstash -f json-codec.config < data 
Logstash startup completed
{
           "log" => "sendmail\r\n",
        "stream" => "stdout",
          "time" => "2015-09-15T10:16:38.353745638Z",
      "@version" => "1",
    "@timestamp" => "2015-09-15T11:20:13.965Z",
          "host" => "lnxmagnusbk"
}
Logstash shutdown completed
$ cat json-filter.config 
input { stdin { } }
output { stdout { codec => rubydebug } }
filter {
  json {
    source => "message"
  }
}
$ /opt/logstash/bin/logstash -f json-filter.config < data
Logstash startup completed
{
       "message" => "{\"log\":\"sendmail\\r\\n\",\"stream\":\"stdout\",\"time\":\"2015-09-15T10:16:38.353745638Z\"}",
      "@version" => "1",
    "@timestamp" => "2015-09-15T11:20:39.075Z",
          "host" => "lnxmagnusbk",
           "log" => "sendmail\r\n",
        "stream" => "stdout",
          "time" => "2015-09-15T10:16:38.353745638Z"
}
Logstash shutdown completed

(Silviupanaite) #5

The full description:
I use docker-gen to generate json files that I send using logstash forwarder to logstash server. I wasn't able to add the new field, with or without the if condition.

The docker gen template is:
{{/* this is a docker-gen template. See docker-gen for more details */}}
{

The network section covers network configuration :slight_smile:

"network": {
# A list of downstream servers listening for our messages.
# logstash-forwarder will pick one at random and only switch if
# the selected one appears to be dead or unresponsive
"servers": [ "ec2-52-16-45-21.eu-west-1.compute.amazonaws.com:5000" ],

# The path to your trusted ssl CA file. This is used
# to authenticate your downstream server.
"ssl ca": "/etc/pki/tls/certs/logstash-forwarder.crt",

# Network timeout in seconds. This is most important for
# logstash-forwarder determining whether to stop waiting for an
# acknowledgement from the downstream server. If an timeout is reached,
# logstash-forwarder will assume the connection or server is bad and
# will connect to a server chosen at random from the servers list.
"timeout": 15

},

The list of files configurations

"files": [
# An array of hashes. Each hash tells what paths to watch and
# what fields to annotate on events from those paths.
{{ range $key, $value := . }}
{
"paths": [
"/var/lib/docker/containers/{{ $value.ID }}/{{ $value.ID }}-json.log"
],
"fields": { "type": "docker-logs", "tags":"{{ $value.Image.Repository }}", "serivce": "{{ $value.Labels }}" }
},
{{ end }}
{
"paths": [
"/var/log/syslog"
],
"fields": { "type": "syslog" }
}
]
}

logstash filter:
filter{
if [type] == "docker-logs" {
json {
source => "message"
grok {
match => [ "message", "%{COMBINEDAPACHELOG}" ]
}
add_field => {
"new_field" => "new_static_value"
}
}
}
}


(Magnus Bäck) #6

Your Logstash configuration is broken and I'm not sure what it's supposed to mean. You're starting the grok filter before closing the json filter, and the add_field parameter doesn't belong to any filter at all. Is this what you mean?

filter{
  if [type] == "docker-logs" {
    json {
      source => "message"
    }
    grok {
      match => [ "message", "%{COMBINEDAPACHELOG}" ]
      add_field => {
        "new_field" => "new_static_value"
      }
    } 
  }
}

(Silviupanaite) #7

Yes, the config you gave back is the one that I'm using.
Looking into the stdout there is no grok parse error or any indication that the filter works.

I don't know how to go about this.


(Magnus Bäck) #8

Yes, the config you gave back is the one that I'm using.

So... you're manually typing in your configuration instead of copying & pasting it? Because the configuration you posted earlier wouldn't be accepted by Logstash.

Looking into the stdout there is no grok parse error or any indication that the filter works.

What do you get to stdout? Please use the rubydebug codec to make things more readable. Either comment out all filters or copy/paste them.


(Silviupanaite) #9

I was cleaning up my config as there were some clutter.
This is the stdout:
{
"message" => "{"log":"2015-09-17 08:55:02,975 INFO spawned: 'cassandra' with pid 5954\r\n","stream":"stdout","time":"2015-09-17T08:55:02.975891882Z"}",
"@version" => "1",
"@timestamp" => "2015-09-17T08:55:08.122Z",
"type" => "docker-logs",
"file" => "/var/lib/docker/containers/9c0c7cf5f0cfb88dcd84144db6f216c95e74f7a24be0c560848ad9ff8e4f8fe4/9c0c7cf5f0cfb88dcd84144db6f216c95e74f7a24be0c560848ad9ff8e4f8fe4-json.log",
"host" => "xxx",
"offset" => "3666273",
"tags" => "xxx",
"serivce" => "xxxx"
}


(Magnus Bäck) #10

And this is without any filter? What if you enable the json and grok filters posted above?


(Silviupanaite) #11

Well they work...
It turn out I was editing another file, not actual config (did update logstash several times).

I feel like an :tomato:.

Thank you Magnus for helping me out.


(Chandra) #12

Hi ,
I saw the kibana parsed Json response of yours .
{
"_index": "logstash-2015.09.15",
"_type": "docker-logs",
"id": "AUQgbrYMc0WVE2CwNCU",
"_score": null,
"_source": {
"message": "{"log":"sendmail\r\n","stream":"stdout","time":"2015-09-15T10:16:38.353745638Z"}",
"@version": "1",
"@timestamp": "2015-09-15T10:16:46.144Z",
"type": "docker-logs",
"file": "/var/lib/docker/containers/bfc1026c1e00a25cd145d449f031e00c26abb120e8160c20a12198e0822ec365/bfc1026c1e00a25cd145d449f031e00c26abb120e8160c20a12198e0822ec365-json.log",
"host": "bs1.uib.co",
"offset": "10824",
"tags": "unifiedinbox/ue-developer",
"serivce": "map[io.rancher.project.name:unificationengine-developer-portal io.rancher.project_service.name:unificationengine-developer-portal/ue-developer io.rancher.scheduler.affinity:container_label_soft:io.rancher.service.deployment.unit=12495afc-716d-41ec-9152-d652c390d09e io.rancher.service.deployment.unit:12495afc-716d-41ec-9152-d652c390d09e io.rancher.stack.name:unificationengine-developer-portal io.rancher.container.ip:10.42.137.137/16 io.rancher.container.uuid:b7fc233f-cabf-4a97-8102-005ded7feb93 io.rancher.service.launch.config:io.rancher.service.primary.launch.config io.rancher.stack_service.name:unificationengine-developer-portal/ue-developer]"
},
"fields": {
"@timestamp": [
1442312206144
]
},
"sort": [
1442312206144
]
}

Questions is how did you get the rancher metadata in your reponse .What config changes you had to make to get the rancher metadata like
"serivce": "map[io.rancher.project.name:unificationengine-developer-portal io.rancher.project_service.name:unificationengine-developer-portal/ue-developer io.rancher.scheduler.affinity:container_label_soft:io.rancher.service.deployment.unit=12495afc-716d-41ec-9152-d652c390d09e io.rancher.service.deployment.unit:12495afc-716d-41ec-9152-d652c390d09e io.rancher.stack.name:unificationengine-developer-portal io.rancher.container.ip:10.42.137.137/16 io.rancher.container.uuid:b7fc233f-cabf-4a97-8102-005ded7feb93 io.rancher.service.launch.config:io.rancher.service.primary.launch.config io.rancher.stack_service.name:unificationengine-developer-portal/ue-developer]"


(system) #13