Haproxy 1.6 + logstash 2.2; grok vs. online validators

I'm having a bit of a head banging against the wall issue... I have haproxy (v. 1.6) config'ed to syslog to logstash (v. 2.2.2). Grok's packaged haproxy filters are for v. 1.4- so i've had to hack them up a bit, plus add some matches on the front end to take into account the syslog bits.

What I have is pretty simple. The on-line grok validators are happy with my creation and parse everything I throw at them just fine. Logstash... not so much. I consistently get _grokparsefailure errors.

Here's an example log line:

<134>Mar 2 23:39:47 haproxy[1]: [02/Mar/2016:23:39:46.380] haproxy-container~ server_notes/noteswiki 280/0/0/355/635 302 453 - - ---- 1/1/0/1/0 0/0 "GET /w/index.php?title=AWS_KMS&diff=695&oldid=694&mobileaction=toggle_view_mobile HTTP/1.1"

And my grok match:

%{SYSLOG5424PRI}%{SYSLOGTIMESTAMP:syslog_timestamp} %{DATA:syslog_program}(?:[%{POSINT:syslog_pid}])?: %{IP:client}:%{INT:port} [%{HAPROXYDATE:accept_date}] %{NOTSPACE:frontend_name} %{NOTSPACE:backend_name}/%{NOTSPACE:server_name} %{INT:time_request}/%{INT:time_queue}/%{INT:time_backend_connect}/%{INT:time_backend_response}/%{NOTSPACE:time_duration} %{INT:http_status_code} %{NOTSPACE:bytes_read} %{DATA:captured_request_cookie} %{DATA:captured_response_cookie} %{NOTSPACE:termination_state} %{INT:actconn}/%{INT:feconn}/%{INT:beconn}/%{INT:srvconn}/%{NOTSPACE:retries} %{INT:srv_queue}/%{INT:backend_queue} \"(|(%{WORD:http_verb} (%{URIPROTO:http_proto}://)?(?:%{USER:http_user}(?::[^@]*)?@)?(?:%{URIHOST:http_host})?(?:%{URIPATHPARAM:http_request})?( HTTP/%{NUMBER:http_version})?))?\"

What am I missing?!


Do you really have backslash-escaped double quotes in the input log strings? Like in the very last couple of characters in the logs message.

Always post regular expressions, configuration snippets etc formatted as code. Otherwise the forum software will ignore leading whitespace and strip certain things, leading to confusion and time-wasting "don't you actually have a backslash before your square brackets?" questions.

Yes, unfortunately the incoming string really does have quotes, which subsequently need to be escaped. I've not run through a tcp cap to see where the trailing double quote are coming from (they are not in haproxy's httplog format). Perhaps they are being injected someplace else... I'm not sure.

However, after some deeper consultation with myself at the gym, I think I could tackle this problem better and write a custom log format in haproxy to output in JSON. I think that may work better.