Filebeat / HAProxy: Grok pattern in the pipeline needs both request and response headers captured to parse them

Hi

I've been testing Filebeat (8.6.2) to collect logs generated by HAProxy 2.2. The logs are sent directly to Elasticsearch and processed by the haproxy ingest pipeline set up by Filebeat.

In my HAProxy setup, I only capture one request header (the HTTP Host). No response headers are captured.

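Sadly, it seems the Grok pattern in that pipeline requires headers to be captured in both the request and the response for them to be parsed, as both are in the same ()? group. When only one {...} block is present, it is matched by the unnamed \{%{DATA}\} alternative, so the captured value is not stored in any field.

The original Grok rule deployed by Filebeat: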

(%{NOTSPACE:process.name}\[%{NUMBER:process.pid:long}\]: )?(%{IP:source.address}|-):%{NUMBER:source.port:long} \[%{NOTSPACE:haproxy.request_date}\] %{NOTSPACE:haproxy.frontend_name} %{NOTSPACE:haproxy.backend_name}/%{NOTSPACE:haproxy.server_name} (%{IPORHOST:destination.address} )?%{NUMBER:haproxy.http.request.time_wait_ms:long}/%{NUMBER:haproxy.total_waiting_time_ms:long}/%{NUMBER:haproxy.connection_wait_time_ms:long}/%{NUMBER:haproxy.http.request.time_wait_without_data_ms:long}/%{NUMBER:temp.duration:long} %{NUMBER:http.response.status_code:long} %{NUMBER:haproxy.bytes_read:long} %{NOTSPACE:haproxy.http.request.captured_cookie} %{NOTSPACE:haproxy.http.response.captured_cookie} %{NOTSPACE:haproxy.termination_state} %{NUMBER:haproxy.connections.active:long}/%{NUMBER:haproxy.connections.frontend:long}/%{NUMBER:haproxy.connections.backend:long}/%{NUMBER:haproxy.connections.server:long}/%{NUMBER:haproxy.connections.retries:long} %{NUMBER:haproxy.server_queue:long}/%{NUMBER:haproxy.backend_queue:long} (\{%{DATA:haproxy.http.request.captured_headers}\} \{%{DATA:haproxy.http.response.captured_headers}\} |\{%{DATA}\} )?"%{GREEDYDATA:haproxy.http.request.raw_request_line}"

Example HAProxy log line:

Mar 27 15:50:56 hostname haproxy[26830]: 192.0.2.42:50938 [27/Mar/2023:15:50:56.108] external my-backend/my-server 0/0/8/3/11 200 419 - - ---- 1/1/0/0/0 0/0 {example.net} "GET / HTTP/1.1"
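To illustrate what happens to the header block in this line, here is a minimal Python sketch. It is not the Filebeat pipeline itself: it only mimics the last part of the Grok rule with Python's `re` module, treating DATA as `.*?` and GREEDYDATA as `.*`. The group names (`request_headers`, `response_headers`, `raw_request_line`) and the function `parse_tail` are stand-ins I made up for the example.

```python
import re

# Simplified stand-in for the end of the Grok rule:
#   (\{DATA:request\} \{DATA:response\} |\{DATA\} )?"GREEDYDATA:raw_request_line"
# DATA is written as .*? and GREEDYDATA as .* (assumed equivalents for this sketch).
TAIL = re.compile(
    r'(?:\{(?P<request_headers>.*?)\} \{(?P<response_headers>.*?)\} |\{.*?\} )?'
    r'"(?P<raw_request_line>.*)"$'
)

def parse_tail(line):
    """Return the named groups found at the end of a log line, or None."""
    m = TAIL.search(line)
    return m.groupdict() if m else None

# The example line from this post: only one {...} block is present.
one_block = ('Mar 27 15:50:56 hostname haproxy[26830]: 192.0.2.42:50938 '
             '[27/Mar/2023:15:50:56.108] external my-backend/my-server '
             '0/0/8/3/11 200 419 - - ---- 1/1/0/0/0 0/0 '
             '{example.net} "GET / HTTP/1.1"')

# A hypothetical variant with a second block, to show the contrast.
two_blocks = '0/0 {example.net} {nginx} "GET / HTTP/1.1"'

print(parse_tail(one_block))
print(parse_tail(two_blocks))
```

With a single block, the line still matches, but only through the unnamed alternative, so both header groups come back empty. With two blocks, both groups are filled.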

Would there be a way to make this work when headers are captured in only the request xor the response?

Maybe we could split that ()? group in two. But in that case, if you chose to capture only some response headers, they would be labeled as request headers, as we seem to have no way of knowing whether a lone block holds request or response headers. So that's not a good solution.

An alternative would be:

  • If headers are captured in both request and response, we can properly identify and label them.
  • If headers are captured in request XOR response, we could label them in a neutral way, as we can't know whether they are request or response headers.
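The two cases above could be sketched roughly like this in Python. Again, this is only an illustration of the idea and not a proposed pipeline change: the regex is a simplified stand-in for the end of the Grok rule, and the function `label_headers` and the neutral field name `captured_headers` are hypothetical names chosen for the example.

```python
import re

# Up to two {...} blocks followed by the quoted request line.
# Simplified stand-in for the end of the Grok rule (an assumption of this sketch).
TAIL = re.compile(
    r'(?:\{(?P<first>[^{}]*)\} )?(?:\{(?P<second>[^{}]*)\} )?'
    r'"(?P<raw_request_line>.*)"$'
)

def label_headers(line):
    """Label captured header blocks based on how many blocks are present."""
    m = TAIL.search(line)
    if m is None:
        return None
    first, second = m.group("first"), m.group("second")
    fields = {"raw_request_line": m.group("raw_request_line")}
    if first is not None and second is not None:
        # Both blocks present: the order tells us which is which.
        fields["request_headers"] = first
        fields["response_headers"] = second
    elif first is not None:
        # Single block: request or response is unknown, so use a neutral label.
        fields["captured_headers"] = first
    return fields

print(label_headers('0/0 {example.net} {nginx} "GET / HTTP/1.1"'))
print(label_headers('0/0 {example.net} "GET / HTTP/1.1"'))
print(label_headers('0/0 "GET / HTTP/1.1"'))
```

This keeps the single-block value instead of discarding it, at the cost of one extra field that does not say which side the headers came from.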

I don't have a good solution for that... maybe you have better ideas.

But it would be nice if headers captured in only the request or only the response were parsed and not thrown away.

Good day
