Logstash grok filter for apache customized logs

Hi guys,

I'm trying to use the ELK stack to generate dashboards from Apache access logs.
However, I can't split the message field up into separate fields.
My Apache access logs are slightly customized because I use the AJP protocol and need some additional information in the log.
The stock COMBINEDAPACHELOG pattern doesn't work for me because the log format is customized.

I'm generating the logs in Apache this way:
LogFormat "%v %h \"%{BALANCER_WORKER_NAME}e\" %l %u %t \"%r\" %>s %b \"%{Referer}i\" \"%{User-Agent}i\" %D %I %O" vhost_ajp_worker_name
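For reference, the directives in that LogFormat are (per the mod_log_config and mod_logio docs):

```
# %v  - canonical ServerName of the virtual host serving the request
# %h  - remote client address
# %{BALANCER_WORKER_NAME}e - environment variable set by mod_proxy_balancer (the AJP worker)
# %l  - remote logname (usually "-")
# %u  - remote user (usually "-" unless authentication is used)
# %t  - time the request was received
# %r  - first line of the request ("POST /path HTTP/1.1")
# %>s - final HTTP status code
# %b  - response body size in bytes ("-" when zero, e.g. on a 304)
# %{Referer}i / %{User-Agent}i - request headers
# %D  - time taken to serve the request, in microseconds
# %I / %O - bytes received / sent including headers (requires mod_logio)
```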

The log lines look like this:
vhost.domain.com 172.28.146.75 "ajp://internalserver.domain.local" - - [11/Jul/2017:23:03:58 -0300] "POST /1g/Processo/ConsultaProcesso/Detalhe/detalheProcessoVisualizacao.seam HTTP/1.1" 200 722 "https://internalserver.domain.local/1g/Processo/ConsultaProcesso/Detalhe/detalheProcessoVisualizacao.seam?id=978617&ca=ba8f7f19f16ac79696cb5ba871212278c97529e33e0d5e29ec93d9c2b7eedafa2bab2c8b14ba63bc22bae4dd465f99927d7b339e4eaafcf4&idTaskInstance=290637995" "Mozilla/5.0 (Windows NT 6.1; WOW64; rv:27.0) Gecko/20100101 Firefox/27.0" 87720 1588 1481

I've tried changing the filter in various ways, but I can't extract the fields inside "message" into separate fields.

Example of a grok pattern I tried:

"message" => '%{WORD:VirtualHost} %{IPORHOST:clientip} %{QS:BALANCER_WORKER_NAME} %{WORD:remote_log_name} %{WORD:user} \[%{HTTPDATE:timestamp}\] \"%{WORD:Method} %{DATA:request} HTTP/%{NUMBER:httpversion}\" %{NUMBER:response:int} %{NUMBER:Response_size:int} %{QS:referrer} %{QS:agent} %{NUMBER:Time_taken:int} %{NUMBER:bytes_received:int} %{NUMBER:bytes_sents:int}'

My file apache.conf:

input { 
  stdin { }
}

filter {
  if [type] == "apache" {
    grok {
      match => { "message" => '%{WORD:VirtualHost} %{IPORHOST:clientip} \"%{WORD:balancer_worker}\" %{WORD:remote_log_name} %{WORD:user} \[%{HTTPDATE:timestamp}\] \"%{WORD:Method} %{DATA:request} HTTP/%{NUMBER:httpversion}\" %{NUMBER:response} %{NUMBER:Response_size} \"%{WORD:referrer}\" \"%{WORD:agent}\" %{NUMBER:Time_taken} %{NUMBER:bytes_received} %{NUMBER:bytes_sents}'
      }
    }

    date {
      match => [ "timestamp", "dd/MMM/YYYY:HH:mm:ss Z" ]
    }

    geoip {
      source => "clientip"
    }
  }
}
output {
  elasticsearch { hosts => ["localhost:9200"]}
  stdout { codec => rubydebug}
}

The logs are displayed in Kibana with the message field still not split into separate fields.

I think the use of WORD to capture remote_log_name and user isn't working because WORD doesn't match "-".

As always, start simple (with ^%{WORD:VirtualHost}) and verify that it works. Continue by adding the next token. Does it still work? Always build grok expressions gradually.
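The WORD-versus-hyphen point above is easy to verify with plain Python regexes, since grok's stock WORD pattern is just \b\w+\b and NOTSPACE is \S+:

```python
import re

# Grok's stock patterns are plain regexes underneath:
# WORD is \b\w+\b, NOTSPACE is \S+ (see the grok-patterns file).
WORD = re.compile(r'\b\w+\b')
NOTSPACE = re.compile(r'\S+')

print(WORD.fullmatch('-'))      # None: WORD cannot match a lone hyphen
print(NOTSPACE.fullmatch('-'))  # matches: NOTSPACE happily accepts "-"
print(WORD.fullmatch('POST'))   # matches: WORD is fine for real words
```

So any log field that can be "-" (remote logname, remote user) needs NOTSPACE rather than WORD.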

Magnus,

Thanks for the answer.

To separate the fields, do I need add_field?

grok {
  add_field => { "VirtualHost" => "%{WORD:VirtualHost}" }
  break_on_match => false
  match => { "message" => '^%{WORD:VirtualHost}' }
}

Or is just the match enough?

grok {
  match => { "message" => '^%{WORD:VirtualHost}' }
}

I'm trying various simple grok expressions, with and without "^":

grok {
  match => { "message" => '^%{DATA:VirtualHost}' }
}

and:

grok {
  match => { "message" => '^%{HOSTNAME:VirtualHost}' }
}

Regardless of what I modify, the fields always stay together in message, as in this example of the JSON returned:

{
  "_index": "logstash-2017.07.13",
  "_type": "log",
  "_id": "AV075WSyrEzNUGjyeTsC",
  "_score": null,
  "_source": {
    "@timestamp": "2017-07-13T12:21:14.439Z",
    "geoip": {},
    "offset": 162672610,
    "@version": "1",
    "input_type": "log",
    "beat": {
      "hostname": "internalserver.domain.local",
      "name": "internalserver.domain.local",
      "version": "5.1.1"
    },
    "host": "internalserver.domain.local",
    "source": "/var/log/httpd/sistema/sistema_ssl_access_2017.07.13.log",
    "message": "vhost.domain.com 172.28.1.68 \"ajp://internalserver.domain.local\" - - [13/Jul/2017:09:21:13 -0300] \"GET /1g/stylesheet/dropzone/pje-dropzone.css HTTP/1.1\" 304 - \"https://internalserver.domain.local/1g/Processo/ConsultaProcesso/Detalhe/detalheProcessoVisualizacao.seam?id=1122299&ca=a4c5254d003967da37a43305b852b8300042b923f8356dc9be9f97eb18c5997de36ab582dde3aa125f959d471a3d1136a691b7cdbf23542439b484d172d84d8e&tab=form&idTaskInstance=303393732\" \"Mozilla/5.0 (Windows NT 6.1; WOW64; rv:41.0) Gecko/20100101 Firefox/41.0\" 3994 838 196",
    "type": "log",
    "tags": [
      "beats_input_codec_plain_applied",
      "_grokparsefailure",
      "_geoip_lookup_failure"
    ]
  },
  "fields": {
    "@timestamp": [
      1499948474439
    ]
  },
  "sort": [
    1499948474439
  ]
}

To match the virtualhost I suggest you use NOTSPACE, i.e. do this:

match => { "message" => '^%{NOTSPACE:VirtualHost}' }

Magnus,

Still not working:

My apache conf:

input { stdin { }}

filter {
  if [type] == "apache" {
    grok {
      match => { "message" => '^%{NOTSPACE:VirtualHost}'}
    }

    date {
      match => [ "timestamp", "dd/MMM/YYYY:HH:mm:ss Z" ]
    }

  }
}
output {
  elasticsearch { hosts => ["localhost:9200"]}
  stdout { codec => rubydebug}
}

JSON:

{
  "_index": "filebeat-2017.07.13",
  "_type": "log",
  "_id": "AV08hzUprEzNUGjy1jwR",
  "_score": null,
  "_source": {
    "@timestamp": "2017-07-13T15:17:58.522Z",
    "geoip": {},
    "offset": 1001954625,
    "@version": "1",
    "input_type": "log",
    "beat": {
      "hostname": "internalproxy.domain.local",
      "name": "internalproxy.domain.local",
      "version": "5.1.1"
    },
    "host": "internalproxy.domain.local",
    "source": "/var/log/httpd/sistema/sistema_ssl_access_2017.07.13.log",
    "message": "vhost.domain.com 179.199.3.195 \"ajp://internalserver.domain.local\" - - [13/Jul/2017:12:17:57 -0300] \"GET /1g/a4j/g/3_3_3.Finalscripts/tiny_mce/themes/advanced/skins/richfaces/ui.xcss/DATB/eAF7XL3mcujyGdIAFzgEjQ__.seam HTTP/1.1\" 200 18515 \"https://vhost.domain.com/1g/Processo/ConsultaProcesso/Detalhe/listProcessoCompletoAdvogado.seam?id=1110375&ca=e1b13c5b2fc3529f8ce16c5496926e4443aecdf44a4e343200379a611f5329c5cf64dc3de7fa48c65f959d471a3d11365843eb8fa54ffce839b484d172d84d8e\" \"Mozilla/5.0 (Windows NT 6.1; WOW64; rv:33.0) Gecko/20100101 Firefox/33.0\" 4817 1356 19168",
    "fields": {
      "apache": true
    },
    "type": "log",
    "tags": [
      "beats_input_codec_plain_applied",
      "_grokparsefailure",
      "_geoip_lookup_failure"
    ]
  },
  "fields": {
    "@timestamp": [
      1499959078522
    ]
  },
  "sort": [
    1499959078522
  ]
}

The complete grok pattern for your log is:

^%{NOTSPACE:VirtualHost} %{IPORHOST:clientip} %{QS:BALANCER_WORKER_NAME} - - \[%{HTTPDATE:ts}\] "%{WORD:Method} %{GREEDYDATA:request} HTTP/%{NUMBER:httpversion}" %{NUMBER:response} %{NUMBER:Response_size} %{QS:referrer} %{QS:agent} %{NUMBER:Time_taken} %{NUMBER:bytes_received} %{NUMBER:bytes_sents}

Output in https://grokdebug.herokuapp.com/

{
  "VirtualHost": [
    [
      "vhost.domain.com"
    ]
  ],
  "clientip": [
    [
      "172.28.146.75"
    ]
  ],
  "BALANCER_WORKER_NAME": [
    [
      ""ajp://internalserver.domain.local""
    ]
  ],
  "ts": [
    [
      "11/Jul/2017:23:03:58 -0300"
    ]
  ],
  "Method": [
    [
      "POST"
    ]
  ],
  "request": [
    [
      "/1g/Processo/ConsultaProcesso/Detalhe/detalheProcessoVisualizacao.seam"
    ]
  ],
  "httpversion": [
    [
      "1.1"
    ]
  ],
  "response": [
    [
      "200"
    ]
  ],
  "Response_size": [
    [
      "722"
    ]
  ],
  "referrer": [
    [
      ""https://internalserver.domain.local/1g/Processo/ConsultaProcesso/Detalhe/detalheProcessoVisualizacao.seam?id=978617&ca=ba8f7f19f16ac79696cb5ba871212278c97529e33e0d5e29ec93d9c2b7eedafa2bab2c8b14ba63bc22bae4dd465f99927d7b339e4eaafcf4&idTaskInstance=290637995""
    ]
  ],
  "agent": [
    [
      ""Mozilla/5.0 (Windows NT 6.1; WOW64; rv:27.0) Gecko/20100101 Firefox/27.0""
    ]
  ],
  "Time_taken": [
    [
      "87720"
    ]
  ],
  "bytes_received": [
    [
      "1588"
    ]
  ],
  "bytes_sents": [
    [
      "1481"
    ]
  ]
}

Makra,

Thanks for your help.
I used the debugger page and the pattern parses correctly, but in my Logstash I still have the same problem.

Using - https://grokdebug.herokuapp.com/

INPUT:

vhost.domain.com 172.28.146.75 "ajp://internalserver.domain.local" - - [11/Jul/2017:23:03:58 -0300] "POST /1g/Processo/ConsultaProcesso/Detalhe/detalheProcessoVisualizacao.seam HTTP/1.1" 200 722 "https://internalserver.domain.local/1g/Processo/ConsultaProcesso/Detalhe/detalheProcessoVisualizacao.seam?id=978617&ca=ba8f7f19f16ac79696cb5ba871212278c97529e33e0d5e29ec93d9c2b7eedafa2bab2c8b14ba63bc22bae4dd465f99927d7b339e4eaafcf4&idTaskInstance=290637995" "Mozilla/5.0 (Windows NT 6.1; WOW64; rv:27.0) Gecko/20100101 Firefox/27.0" 87720 1588 1481

Pattern:

^%{HOSTNAME:VirtualHost} %{IPV4:clientip} "%{NOTSPACE:balancer_worker_name}" %{NOTSPACE:remote_log_name} %{NOTSPACE:user} \[%{HTTPDATE:timestamp}\] "%{WORD:Method} %{DATA:request} HTTP/%{NUMBER:httpversion}" %{NUMBER:response} %{NUMBER:Response_size} "%{NOTSPACE:referrer}" %{QUOTEDSTRING:agent} %{NUMBER:Time_taken} %{NUMBER:bytes_received} %{NUMBER:bytes_sents}

Result:

 {
  "VirtualHost": [
    [
      "vhost.domain.com"
    ]
  ],
  "clientip": [
    [
      "172.28.146.75"
    ]
  ],
  "balancer_worker_name": [
    [
      "ajp://internalserver.domain.local"
    ]
  ],
  "remote_log_name": [
    [
      "-"
    ]
  ],
  "user": [
    [
      "-"
    ]
  ],
  "timestamp": [
    [
      "11/Jul/2017:23:03:58 -0300"
    ]
  ],
  "MONTHDAY": [
    [
      "11"
    ]
  ],
  "MONTH": [
    [
      "Jul"
    ]
  ],
  "YEAR": [
    [
      "2017"
    ]
  ],
  "TIME": [
    [
      "23:03:58"
    ]
  ],
  "HOUR": [
    [
      "23"
    ]
  ],
  "MINUTE": [
    [
      "03"
    ]
  ],
  "SECOND": [
    [
      "58"
    ]
  ],
  "INT": [
    [
      "-0300"
    ]
  ],
  "Method": [
    [
      "POST"
    ]
  ],
  "request": [
    [
      "/1g/Processo/ConsultaProcesso/Detalhe/detalheProcessoVisualizacao.seam"
    ]
  ],
  "httpversion": [
    [
      "1.1"
    ]
  ],
  "BASE10NUM": [
    [
      "1.1",
      "200",
      "722",
      "87720",
      "1588",
      "1481"
    ]
  ],
  "response": [
    [
      "200"
    ]
  ],
  "Response_size": [
    [
      "722"
    ]
  ],
  "referrer": [
    [
      "https://internalserver.domain.local/1g/Processo/ConsultaProcesso/Detalhe/detalheProcessoVisualizacao.seam?id=978617&ca=ba8f7f19f16ac79696cb5ba871212278c97529e33e0d5e29ec93d9c2b7eedafa2bab2c8b14ba63bc22bae4dd465f99927d7b339e4eaafcf4&idTaskInstance=290637995"
    ]
  ],
  "agent": [
    [
      ""Mozilla/5.0 (Windows NT 6.1; WOW64; rv:27.0) Gecko/20100101 Firefox/27.0""
    ]
  ],
  "Time_taken": [
    [
      "87720"
    ]
  ],
  "bytes_received": [
    [
      "1588"
    ]
  ],
  "bytes_sents": [
    [
      "1481"
    ]
  ]
}

As I did not succeed with the full grok instruction, I followed Magnus's recommendation to start with a few fields, using https://grokdebug.herokuapp.com/

INPUT:

vhost.domain.com 172.28.146.75 "ajp://internalserver.domain.local" - - [11/Jul/2017:23:03:58 -0300] "POST /1g/Processo/ConsultaProcesso/Detalhe/detalheProcessoVisualizacao.seam HTTP/1.1" 200 722 "https://internalserver.domain.local/1g/Processo/ConsultaProcesso/Detalhe/detalheProcessoVisualizacao.seam?id=978617&ca=ba8f7f19f16ac79696cb5ba871212278c97529e33e0d5e29ec93d9c2b7eedafa2bab2c8b14ba63bc22bae4dd465f99927d7b339e4eaafcf4&idTaskInstance=290637995" "Mozilla/5.0 (Windows NT 6.1; WOW64; rv:27.0) Gecko/20100101 Firefox/27.0" 87720 1588 1481

PATTERN:

^%{HOSTNAME:VirtualHost}

RESULT:

{
  "VirtualHost": [
    [
      "vhost.domain.com"
    ]
  ]
}

I change my /etc/logstash/conf.d/apache.logstash.conf file to:

input { stdin { }}

filter {
  if [type] == "apache" {
    grok {
      match => { "message" => '^%{HOSTNAME:VirtualHost}'}
    }

    date {
      match => [ "timestamp", "dd/MMM/YYYY:HH:mm:ss Z" ]
    }

  }
}
output {
  elasticsearch { hosts => ["localhost:9200"]}
  stdout { codec => rubydebug}
}

But I'm still receiving the complete message in Kibana, and the VirtualHost field doesn't appear:

{
  "_index": "logstash-2017.07.13",
  "_type": "log",
  "_id": "AV09Y0UqrEzNUGjySO03",
  "_score": null,
  "_source": {
    "@timestamp": "2017-07-13T19:18:20.520Z",
    "geoip": {},
    "offset": 936898297,
    "@version": "1",
    "beat": {
      "hostname": "internalproxy.domain.local",
      "name": "internalproxy.domain.local",
      "version": "5.1.1"
    },
    "input_type": "log",
    "host": "internalproxy.domain.local",
    "source": "/var/log/httpd/sistema/sistema_ssl_access_2017.07.13.log",
    "message": "vhost.domain.com 172.21.24.101 \"ajp://internalserver.domain.local\" - - [13/Jul/2017:16:18:19 -0300] \"POST /1g/Processo/update.seam HTTP/1.1\" 200 4345 \"https://vhost.domain.com/1g/Processo/update.seam?tab=assunto&idProcesso=1140754\" \"Mozilla/5.0 (Windows NT 6.1; WOW64; rv:37.0) Gecko/20100101 Firefox/37.0\" 997735 1019 5026",
    "type": "log",
    "tags": [
      "beats_input_codec_plain_applied",
      "_grokparsefailure",
      "_geoip_lookup_failure"
    ]
  },
  "fields": {
    "@timestamp": [
      1499973500520
    ]
  },
  "sort": [
    1499973500520
  ]
}

After adding the grok filter, do I need to change something in Kibana?

You only apply the grok filter to logs of type "apache", but this log clearly has the type "log".

I don't know where the _grokparsefailure tag comes from. You should check whether you have any additional grok filters lying around, e.g. in another file in /etc/logstash/conf.d that you've forgotten about.
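If you'd rather keep the `if [type] == "apache"` conditional, you can set the type on the Filebeat side instead. A sketch using the Filebeat 5.x prospector syntax (the path is adapted from the log above; adjust the glob as needed):

```yaml
filebeat.prospectors:
  - input_type: log
    paths:
      - /var/log/httpd/sistema/*.log
    # document_type sets the event's "type" field, so the
    # `if [type] == "apache"` conditional in Logstash will match
    document_type: apache
```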

Magnus,

I removed the other logs from filebeat.yml and changed:

filter {
  if [type] == "apache"

to

filter {
  if [type] == "log"

and the filter works.

Thank you very much.

Hi guys,

After receiving the data, some integer fields appear as strings.
I had to change my patterns because some fields are sometimes filled with the character "-" in the log.
To solve this I used https://www.debuggex.com/r/NQDc-7aOsI9yeSoa
and created my own INT2 pattern: ((?:[+-]?(?:[0-9]+))|-)

In my grok filter I set the field to INT2, but I keep getting it as a string in Kibana.
I'm not sure the regex is correct to accept the character "-" as an integer, and I don't know how to replace the "-" with 0.

grok {
  match => { "message" => '%{INT2:Response_size:int}'}
}

I also tried using mutate, and it didn't work either:

mutate {
  convert => {"Response_size" => "integer" }
}

Why not just omit the field if there is no value? Replacing it with zero just because e.g. the response size is unavailable will lead to weird results if you attempt any numerical operations. This is what the stock patterns do:

(?:%{NUMBER:bytes}|-)

Either match a number and store it in bytes, or match a lone hyphen. (Of course, you'll want %{NUMBER:bytes:int} so it's converted to an integer.)
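In plain Python, the same alternation looks like this (a rough equivalent; grok's NUMBER also allows decimals):

```python
import re

# Rough Python equivalent of grok's (?:%{NUMBER:bytes}|-):
# either capture a number into a "bytes" group, or match a lone hyphen.
pattern = re.compile(r'(?:(?P<bytes>\d+(?:\.\d+)?)|-)')

for token in ('722', '-'):
    m = pattern.fullmatch(token)
    size = int(m.group('bytes')) if m.group('bytes') else None
    print(token, '->', size)  # 722 -> 722, then - -> None (field absent)
```

When the hyphen branch matches, the capture group is empty, so no bytes field is created at all, which is exactly what you want instead of a fake zero.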

Magnus,

It works!
Thank you very much.
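For anyone landing here later, a sketch of the complete filter combining the fixes from this thread (the type "log" conditional, NOTSPACE for fields that can be "-", and the (?:%{NUMBER}|-) trick for the optional response size). Untested as a whole, so treat it as a starting point:

```
filter {
  if [type] == "log" {
    grok {
      match => { "message" => '^%{NOTSPACE:VirtualHost} %{IPORHOST:clientip} %{QS:balancer_worker_name} %{NOTSPACE:remote_log_name} %{NOTSPACE:user} \[%{HTTPDATE:timestamp}\] \"%{WORD:Method} %{DATA:request} HTTP/%{NUMBER:httpversion}\" %{NUMBER:response:int} (?:%{NUMBER:Response_size:int}|-) %{QS:referrer} %{QS:agent} %{NUMBER:Time_taken:int} %{NUMBER:bytes_received:int} %{NUMBER:bytes_sents:int}' }
    }
    date {
      match => [ "timestamp", "dd/MMM/YYYY:HH:mm:ss Z" ]
    }
  }
}
```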

This topic was automatically closed 28 days after the last reply. New replies are no longer allowed.