Using Multiline Codec and Grok to collect similar lines

Hi,

I'm trying to parse HTTP logs, and I want to collect all of the request headers into a single field. I've used the multiline codec to put all of the lines of a single HTTP request/response into a single message.

I have the following custom grok patterns:

RESPONSE_VALUE [^\n]*
HTTP_REQUEST_HEADER HTTP request header *%{WORD:header_directive}: %{RESPONSE_VALUE:header_value}
HTTP_RESPONSE_HEADER HTTP response header *%{WORD:header_directive}: %{RESPONSE_VALUE:header_value}

I then have the following grok filter:

grok {
  match => { "message" => "%{HTTP_REQUEST_HEADER:request_header}" }
  break_on_match => false
}

I get the first header as I want it to come out. All of the others are dropped.

Is there a way to collect all of these lines either into one field or into an ordered list?

What do the input messages look like?

"message":"HTTP GET (544.47ms) http://myserver.com:80/Services/WS/audiences\nHTTP request header Accept: application/json\nHTTP request header Accept-Encoding: gzip;q=1.0,deflate;q=0.6,identity;q=0.3\nHTTP request header User-Agent: Ruby\nHTTP request header Connection: close\nHTTP request header Host: myserver.com\nResponse status Net::HTTPOK (200)\nHTTP response header Date: Wed, 13 Jan 2016 01:36:52 GMT\nHTTP response header Set-Cookie: JSESSIONID=6B301E213D3B3D7A0938C67F6E54CD98; Path=/Services; HttpOnly\nHTTP response header Content-Type: application/json\nHTTP response header Connection: close\nHTTP response header Transfer-Encoding: chunked\nResponse body {\n "key"\ : "value" \n}"

I'm actually surprised you get a match since grok usually stops at the first newline. Have you tried matching multiple HTTP_REQUEST_HEADER values?

%{HTTP_REQUEST_HEADER:request_header}\n%{HTTP_REQUEST_HEADER:request_header}

Your example above produces the results I want for the first two headers:

"request_header":["HTTP request header  Connection: close","HTTP request header  Host: myserver.com"]

It even breaks the subparsing the way I'd like it:

"header_directive":["Connection","Host"],"header_value":["close","myserver.com"]

Of course, my goal is to capture zero or more headers. I've tried:

(%{HTTP_REQUEST_HEADER:request_header}\n)*

I've also tried setting the grok pattern to:

HTTP_REQUEST_HEADER HTTP request header *%{WORD:header_directive}: %{RESPONSE_VALUE:header_value}\n

And then trying:

%{HTTP_REQUEST_HEADER:request_header}*

These don't pull any results at all.

How about this?

%{HTTP_REQUEST_HEADER:request_header}(\n%{HTTP_REQUEST_HEADER:request_header})*

That only returns the first result.

I went back to your previous test because I noticed something odd.

Here's the header portion of the message:

\nHTTP request header  Accept: application/json\nHTTP request header  Accept-Encoding: gzip;q=1.0,deflate;q=0.6,identity;q=0.3\nHTTP request header  User-Agent: Ruby\nHTTP request header  Connection: close\nHTTP request header  Host: myserver.com\n

When I used:

%{HTTP_REQUEST_HEADER:request_header}\n%{HTTP_REQUEST_HEADER:request_header}

I got the Connection: close and Host: myserver.com headers, which are the last two headers. But any pattern that returned a single result returned Accept: application/json, the first header.

I went back and looked at my regex for the header directive and realized it was wrong. I was using:

%{WORD}

WORD doesn't accept hyphens, so it can't match Accept-Encoding or User-Agent. I redefined the patterns:

 HEADER [A-Za-z0-9_-]*
 RESPONSE_VALUE [^\n]*
 HTTP_REQUEST_HEADER HTTP request header *%{HEADER:header_directive}: %{RESPONSE_VALUE:header_value}

This time your previous example returned Accept: application/json and Accept-Encoding: gzip;q=1.0,deflate;q=0.6,identity;q=0.3, which are the first two headers.
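The WORD-versus-hyphen behaviour can be reproduced outside Logstash. The sketch below uses Python's re module rather than the Oniguruma engine that grok uses, and it assumes grok's WORD expands to \b\w+\b; the helper names (header_regex, first_pair) are made up for illustration. It shows that with WORD the first place two consecutive headers can match is Connection/Host, while the hyphen-aware HEADER pattern matches Accept/Accept-Encoding.

```python
import re

# Python stand-ins for the grok patterns.
# Assumption: grok's WORD is equivalent to \b\w+\b.
WORD = r"\b\w+\b"
HEADER = r"[A-Za-z0-9_-]*"
VALUE = r"[^\n]*"


def header_regex(name_pattern):
    """Build a regex for one 'HTTP request header' line."""
    return f"HTTP request header *({name_pattern}): ({VALUE})"


message = (
    "HTTP GET (544.47ms) http://myserver.com:80/Services/WS/audiences\n"
    "HTTP request header  Accept: application/json\n"
    "HTTP request header  Accept-Encoding: gzip;q=1.0,deflate;q=0.6,identity;q=0.3\n"
    "HTTP request header  User-Agent: Ruby\n"
    "HTTP request header  Connection: close\n"
    "HTTP request header  Host: myserver.com\n"
)


def first_pair(name_pattern):
    """Return the header names of the first two consecutive matching lines."""
    one = header_regex(name_pattern)
    m = re.search(one + r"\n" + one, message)
    return (m.group(1), m.group(3)) if m else None


print(first_pair(WORD))    # ('Connection', 'Host')
print(first_pair(HEADER))  # ('Accept', 'Accept-Encoding')
```

With WORD, the match cannot start at Accept because the following line's name, Accept-Encoding, contains a hyphen, so the engine moves on until it finds two adjacent hyphen-free names.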

I went back excitedly to your most recent example:

%{HTTP_REQUEST_HEADER:request_header}(\n%{HTTP_REQUEST_HEADER:request_header})*

and now I get only the first and the last headers: Accept: application/json and Host: myserver.com.
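A likely explanation for "first and last" is that a capture group inside a repetition keeps only the text from its final iteration. The sketch below demonstrates this with Python's re module; it is an assumption that grok's Oniguruma engine behaves the same way, although that would be consistent with the result reported here. The names HEADER_LINE and repeated are illustrative only.

```python
import re

# Python stand-in for the redefined HTTP_REQUEST_HEADER pattern.
HEADER_LINE = r"HTTP request header *([A-Za-z0-9_-]*): ([^\n]*)"

# One header, followed by zero or more "\n + header" repetitions.
# Groups 1-2 belong to the first header, groups 3-4 to the repeated part.
repeated = re.compile(HEADER_LINE + r"(?:\n" + HEADER_LINE + r")*")

headers = (
    "HTTP request header  Accept: application/json\n"
    "HTTP request header  Accept-Encoding: gzip;q=1.0,deflate;q=0.6,identity;q=0.3\n"
    "HTTP request header  User-Agent: Ruby\n"
    "HTTP request header  Connection: close\n"
    "HTTP request header  Host: myserver.com\n"
)

m = repeated.search(headers)

# The overall match spans all five lines...
matched_lines = m.group(0).count("\n") + 1
# ...but the repeated group only remembers its last iteration.
names = (m.group(1), m.group(3))

print(matched_lines)  # 5
print(names)          # ('Accept', 'Host')
```

So the pattern itself does consume every header; the intermediate captures are overwritten on each pass through the repetition, which is why only the brute-force pattern with five separate captures returns all five values.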

Just to keep testing, I brute-forced this particular request:

%{HTTP_REQUEST_HEADER:request_header}\n%{HTTP_REQUEST_HEADER:request_header}\n%{HTTP_REQUEST_HEADER:request_header}\n%{HTTP_REQUEST_HEADER:request_header}\n%{HTTP_REQUEST_HEADER:request_header}

and I successfully get all 5 headers: Accept: application/json, Accept-Encoding: gzip;q=1.0,deflate;q=0.6,identity;q=0.3, User-Agent: Ruby, Connection: close, Host: myserver.com

That verifies that %{HTTP_REQUEST_HEADER:request_header} successfully captures each individual header. Something about combining it with \n and * is not working. I tried putting the \n in the pattern definition, but that breaks everything.
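One way around the repetition problem is to stop asking a single match to return every header and instead scan the message for all non-overlapping matches. The sketch below shows the idea in Python; it is not a grok configuration, and collect_request_headers is a hypothetical helper name. In Logstash, a similar scan could presumably be done in a ruby filter with String#scan, but that is an assumption and not something tested in this thread.

```python
import re

# Match one request-header line anywhere in the multiline message.
# re.MULTILINE lets ^ and $ anchor at each line, not just the whole string.
HEADER_LINE = re.compile(
    r"^HTTP request header *([A-Za-z0-9_-]+): ([^\n]*)$",
    re.MULTILINE,
)


def collect_request_headers(message):
    """Return an ordered list of (directive, value) pairs for request headers."""
    return [(m.group(1), m.group(2)) for m in HEADER_LINE.finditer(message)]


message = (
    "HTTP GET (544.47ms) http://myserver.com:80/Services/WS/audiences\n"
    "HTTP request header  Accept: application/json\n"
    "HTTP request header  Accept-Encoding: gzip;q=1.0,deflate;q=0.6,identity;q=0.3\n"
    "HTTP request header  User-Agent: Ruby\n"
    "HTTP request header  Connection: close\n"
    "HTTP request header  Host: myserver.com\n"
    "Response status Net::HTTPOK (200)\n"
    "HTTP response header Content-Type: application/json\n"
)

request_headers = collect_request_headers(message)
for directive, value in request_headers:
    print(f"{directive}: {value}")
```

Because each match is independent, the number of headers does not need to be known in advance, and the response headers are skipped since they do not start with "HTTP request header".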