Logstash syslog input: malformed haproxy grok lines

Hello,
I'll first explain the symptoms I'm seeing, and then describe the attempts I've made to remedy them.

I'm using haproxy and have configured it to send its logs over UDP in syslog format (I know that's a loosely defined protocol) to Logstash 5.5.0.
Logstash is configured to receive the UDP messages, and it does; however, when the messages are passed through the included grok filters, specifically the haproxy grok, it fails to parse the lines correctly.

On closer inspection (outputting the events to both Elasticsearch and stdout), I found why I'm seeing '_grokparsefailure' in Elasticsearch: the events come through as plain messages, not the complete HTTP request breakdown I want.

I'm just using the following settings*:
input { udp { port => 6514 type => "syslog" } }

filter { if [type] == "syslog" { grok { match => { "message" => "%{HAPROXYHTTP}" } } } }

output { stdout {} elasticsearch { hosts => "elastic:9200" } }

*Note that these settings have been tweaked many times; the variants include using the syslog input alone, tcp & udp inputs without the syslog type, and with & without the format parameter.
Also, haproxy is correctly configured to send to the non-standard syslog port 6514.
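For completeness, the haproxy side is set up roughly like this (just a sketch; the Logstash address and the local1 facility are placeholders rather than my exact values):

global
    # ship the logs as syslog datagrams over UDP to the logstash host (placeholder address)
    log 1.2.3.5:6514 local1

defaults
    log global
    # httplog produces the full HTTP log format that the HAPROXYHTTP pattern expects
    option httplog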

Valid haproxy 1.5.18 syslog output lines look like this (captured with rsyslog and syslog-ng):

Apr 18 06:34:52 1.2.3.4 haproxy[320]: 112.1.2.3:58109 [18/Apr/2018:06:39:01.024] public be_http_cluster-health:router-checker/pod:router-checker-damenobset-c4d0x:router-checker-demoanset:10.130.2.71.8080 0/0/0/0/0/ 200 127 - - --NI 2/2/2/0/1/0 0/0 {router-checker-cluster-health.os-test.domain.com||} "GET / HTTP/1.1"

Logstash 5.5.0 stdout:

2018-04-18T07:45:34.749Z 1.2.3.4 <142>Apr 18 06:34:52 haproxy[320]: 112.1.2.3:58109 [18/Apr/2018:06:39:01.024] public be_http_cluster-health:router-checker/pod:router-checker-damenobset-c4d0x:router-checker-demoanset:10.130.2.71.8080 0/0/0/0/0/ 200 127 - - --NI 2/2/2/0/1/0 0/0 {router-checker-cluster-health.os-test.domain.com||} "GET / HTTP/1.1"

There's one main thing to note in the Logstash output:
the IP/host of the syslog sender is missing from inside the message.
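To double-check which fields the udp input actually sets, I also looked at the events with the rubydebug codec; a minimal sketch of that output section (not my exact config):

output {
  # prints every field of the event separately; [host] carries the sender's
  # address while [message] still starts with the <142> priority
  stdout { codec => rubydebug }
}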

Instead of the following format reaching the grok filter (pseudo):

%date %host %program[%pid]: %http_client:%http_port [%date]

This is what actually reaches it:

%timestamp %host <%syslog_priority>%date %program[%pid]: %http_client:%http_port [%date]

The hostname or IP of the syslog sender moves from being part of the message (between the date and the syslog program) to outside the message, right after Logstash's own event timestamp.

The rest of the changes do not affect the grok filter, as they are simply not matched.

This causes the grok filter to fail on the following pattern (line 36 of the bundled haproxy patterns):

HAPROXYHTTP (?:%{SYSLOGTIMESTAMP:syslog_timestamp}|%{TIMESTAMP_ISO8601:timestamp8601}) %{IPORHOST:syslog_server} %{SYSLOGPROG}: %{HAPROXYHTTPBASE}

When removing the %{IPORHOST:syslog_server} section from the pattern, the message parses correctly, but that's a hacky solution, as it modifies the built-in pattern. Is there a way to tell Logstash not to modify the message in this way (or to keep the hostname inside the message as well as copy it to a metadata field)?
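One thing I'm considering instead of editing the shipped patterns file is matching the message as it actually arrives from the udp input, i.e. with the <PRI> prefix and without a hostname, by reusing the bundled sub-patterns. A rough sketch (the field names are just my own choices):

filter {
  if [type] == "syslog" {
    grok {
      # match the raw datagram: <pri>timestamp program[pid]: haproxy http log line
      match => { "message" => "<%{NONNEGINT:syslog_pri}>%{SYSLOGTIMESTAMP:syslog_timestamp} %{SYSLOGPROG}: %{HAPROXYHTTPBASE}" }
    }
  }
}

The hostname wouldn't be lost this way either, since the udp input already puts the sender's address in the host field.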

You could use the lines I provided here on the grok debuggers:
https://grokdebug.herokuapp.com/
http://grokconstructor.appspot.com/do/match

and play with the hostname in the broken line.
Hopefully I've made my problem clear.
