Reading multiline data from third line

falafel · June 11, 2015, 7:32pm

I have a log file which contains lines which begin with a timestamp. Optional lines (an uncertain number of them) might follow each such timestamped line:

SOMETIMESTAMP some data
optional more data 1 2
optional other information 3 4

I can parse the optional lines into variables if I know how many of them there are. For example, if I know there are two optional lines, the grok filter below will work. But what should I do if I don't know how many optional lines will exist? Say I want to get the 1 and the 2, and the 3 and the 4, and any other such examples that might occur, from an uncertain number of optional lines...

Second question: Also, even if I know I will only have 2 optional lines, is this filter below the best way to access them?

filter {
        multiline {
            pattern => "^%{SOMETIMESTAMP}"
            negate => "true"
            what => "previous"
        }
        
        if "multiline" in [tags] {
            grok {
                match => { "message" => "(?m)^%{SOMETIMESTAMP} %{DATA:firstline}(?<newline>[\r\n]+)%{DATA:secondline}(?<newline>[\r\n]+)%{DATA:thirdline}$" }
            }
        }
        # After this would be grok filters to process the contents of 'firstline', 'secondline', and 'thirdline'. I would then remove these three temporary fields from the final output.
}

michaellizhou · June 12, 2015, 3:47pm

I am not sure whether it is optimal, but I usually handle multiline in the input rather than in the filter, and it works well for me. That way, by the time you get to the filter stage you can start doing the real work.

Now for the optional lines:
Why do you want to first treat the multiline as a single message then parse them into separate lines? Why not keep them as single messages and put them through your filter?

falafel · June 12, 2015, 6:34pm

I separated the lines since it seemed to me that if I separated the lines into different variables, I could do additional pattern matching on the contents of the lines separately, without having to refer to the entire pattern all over again. (For example, based on the contents of the first line, I might want to present branching behavior for the other lines.)

Of course, that wouldn't be the desired behavior when I could have a theoretically unlimited number of optional lines, so I would ideally want some mechanism to parse each line one-by-one and add the extracted information (like the numbers 1, 2, 3, and 4 above) to some kind of list. However, right now, I can only get the contents of the lines as a single combined 'message' field. Is there some way to get the lines one-by-one while still keeping them as part of a single event?

michaellizhou · June 15, 2015, 1:54pm

This is a tricky problem you're trying to solve. If you really want to read the lines one by one, try setting up the filter so that the newline character "\n" marks the end of a line. That way you still have your large combined 'message' as well as the single lines you wanted to filter further. As for keeping them as part of a single event: can you explain that some more? I do not fully understand what you mean. Could you not just add tags to the messages, so that you have a way to reference them?
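
A minimal sketch of the newline-splitting idea, written in Python only to illustrate the logic (it is not Logstash configuration). The function name `split_event` and the `lines` key are assumptions made up for this example; `message` mirrors the combined field mentioned above.

```python
def split_event(message):
    """Return the combined message together with its individual lines."""
    # splitlines() treats "\n", "\r\n" and "\r" as line endings.
    # Blank lines are dropped so only lines with content remain.
    lines = [line for line in message.splitlines() if line.strip()]
    return {"message": message, "lines": lines}


combined = (
    "SOMETIMESTAMP some data\n"
    "optional more data 1 2\n"
    "optional other information 3 4"
)
event = split_event(combined)

# The combined text is still available, alongside one entry per line.
for line in event["lines"]:
    print(line)
```

The point of keeping both keys is that later steps can match against either the whole entry or a single line, without the event being split into separate messages.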

falafel · June 15, 2015, 2:13pm

By keeping them as part of a single event, I mean storing all the information parsed from a single log entry in a single JSON payload (which is sent via HTTP). So the data obtained from my log entry (the data being the values 1, 2, 3, and 4) would be sent as a single JSON message, rather than being split up into separate JSON messages. Anyway, the multiline filter is supposed to take care of this stuff, and it is doing so. But it's also making it hard to parse individual lines for data.

I haven't had any luck trying to deal with a variable number of supplementary log lines. For simplicity's sake, say that these supplementary lines all follow the same format. So if I had two such lines, my log file would read:

SOMETIMESTAMP intro message
optional line 1 2
optional line 3 4

Given that, would there be an effective way to somehow get access to the 1, 2, 3, and 4, and store them in an array or something?
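
One possible way to think about this, sketched in Python rather than as Logstash configuration: split the combined message on newlines, match every supplementary line against the same pattern, and append the captured numbers to a single list so the whole entry stays one JSON payload. The names `parse_entry`, `OPTIONAL_LINE`, `header`, and `values` are hypothetical, and the sketch assumes each supplementary line ends in exactly two integers, as in the sample above.

```python
import json
import re

# Assumed format of a supplementary line: some text followed by two integers.
OPTIONAL_LINE = re.compile(r"^(?P<text>.*?)\s+(?P<a>\d+)\s+(?P<b>\d+)\s*$")


def parse_entry(message):
    """Collect the numbers from every supplementary line into one list."""
    lines = message.splitlines()
    if not lines:
        return {"header": "", "values": []}
    header, rest = lines[0], lines[1:]
    values = []
    for line in rest:
        match = OPTIONAL_LINE.match(line)
        if match:  # lines that do not fit the assumed format are skipped
            values.extend([int(match.group("a")), int(match.group("b"))])
    return {"header": header, "values": values}


entry = parse_entry(
    "SOMETIMESTAMP intro message\n"
    "optional line 1 2\n"
    "optional line 3 4"
)
# One JSON document per log entry, however many optional lines it had.
print(json.dumps(entry))
# {"header": "SOMETIMESTAMP intro message", "values": [1, 2, 3, 4]}
```

Because the loop runs once per line, the number of supplementary lines does not need to be known in advance. Inside Logstash, comparable logic would have to be expressed with the filters available there (for instance a ruby filter); that translation is not shown here.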