According to the HAProxy manual, the standard httplog format truncates the final field when the log line does not fit in the 1024-character syslog buffer:
"http_request" is the complete HTTP request line, including the method,
request and HTTP version string. Non-printable characters are encoded (see
below the section "Non-printable characters"). This is always the last
field, and it is always delimited by quotes and is the only one which can
contain quotes. If new fields are added to the log format, they will be
added before this field. This field might be truncated if the request is
huge and does not fit in the standard syslog buffer (1024 characters). This
is the reason why this field must always remain the last one.
My default filter:
filter {
  if [type] == "haproxy-access" {
    grok { match => ["message", "%{HAPROXYHTTP}"] }
  }
}
fails with a _grokparsefailure on any truncated line. It works fine on non-truncated input. Any suggestions on how to customize the HAPROXYHTTP pattern to fix this?
Just as stated in the HAProxy documentation I quoted above, when the request does not fit in the 1024-character buffer, HAProxy truncates the last field and doesn't even put the closing quote around it; it just stops printing. I'm looking into custom logging on the HAProxy side, but this will affect everyone who has request strings longer than 1024 characters.
Even IE8 supports URI strings of up to 2000 characters, so requests this long are legitimate, yet HAProxy logs only the first 1024 characters.
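To make the failure mode concrete, here is a minimal Python sketch of the matching idea. It is not the HAPROXYHTTP pattern itself: the regex, the name REQUEST_TAIL, and the function parse_request_tail are all made up for illustration. It only looks at the quoted request at the end of a line and treats both the HTTP version and the closing quote as optional, which is the relaxation a customized grok pattern would presumably need as well.

```python
import re

# Hypothetical pattern for the quoted request at the END of a log line.
# It is an illustration only, not a copy of the stock HAPROXYHTTP grok pattern.
REQUEST_TAIL = re.compile(
    r'"(?P<verb>[A-Z]+) (?P<request>\S+)'      # opening quote, method, URI
    r'(?: HTTP/(?P<version>\d+(?:\.\d+)?))?'   # version, missing when truncated
    r'(?P<closing>")?$'                        # closing quote, missing when truncated
)


def parse_request_tail(line):
    """Return the request fields from the end of a log line, or None.

    The 'truncated' flag is True when the closing quote is missing,
    which is how a cut-off request line shows up in the log.
    """
    match = REQUEST_TAIL.search(line)
    if match is None:
        return None
    return {
        "verb": match.group("verb"),
        "request": match.group("request"),
        "version": match.group("version"),
        "truncated": match.group("closing") is None,
    }
```

Calling parse_request_tail on a complete line returns the version and truncated set to False; on a line that stops partway through the URI it returns version None and truncated set to True. A line cut off in the middle of the " HTTP/x.y" token is not handled by this sketch and returns None.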