Multiline

Hi
Sorry to disturb you again, but I'm totally out of ideas and I really need advice from the community...

I'm trying to parse a multiline log like this one from a file.

[Tue Apr 19 10:31:27 2016] [debug] ssl_engine_io.c(1929): OpenSSL: read 48/48 bytes from BIO#7f4d88078d70 [mem: 7f4d880a4bb8] (BIO dump follows)
[Tue Apr 19 10:31:27 2016] [debug] ssl_engine_io.c(1862): +-------------------------------------------------------------------------+
[Tue Apr 19 10:31:27 2016] [debug] ssl_engine_io.c(1901): | 0000: f5 50 2d 1f b3 70 ae 82-c4 61 b1 07 be dc 30 ad .P-..p...a....0. |
[Tue Apr 19 10:31:27 2016] [debug] ssl_engine_io.c(1901): | 0010: df fc f4 b4 93 4e 32 52-4b f6 d2 be 66 6b 77 08 .....N2RK...fkw. |
[Tue Apr 19 10:31:27 2016] [debug] ssl_engine_io.c(1901): | 0020: 7c 27 c1 df 0b 1c eb f1-3b 54 3a 52 96 d6 1e 11 |'......;T:R.... |
[Tue Apr 19 10:31:27 2016] [debug] ssl_engine_io.c(1907): +-------------------------------------------------------------------------+

The number of lines in this type of log is not constant (from 4 to 120), and the file also contains many other, different single-line logs (which I have already filtered).

So my idea (but I'm not sure if this is possible) is to match the line code first (all of these logs start with the same one, 1929) and use it as the pattern for the multiline codec.
Then I need to loop a filter over each 1901 line until I find the end line (line code 1907).

So, the final question is: can I create something like that using Logstash?

These line numbers could change in each version of OpenSSL so you should pick some other way of recognizing the start and end of the interesting message. Not sure what that would be, though.

Thank you for your reply and for the info about the line numbers.
As I said in a previous post, I have just started this job, so sometimes I have no idea what I'm doing...

Thanks again.

The text (BIO dump follows) might be usable.
We don't know what the rest of the data looks like, so we can't advise on a suitable pattern that will match both the single-line and the multiline sections.
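Untested, but if the dump rows are the only lines in that file whose payload after the "ssl_engine_io.c(NNNN):" part starts with "|" or "+", a continuation-style multiline codec along these lines might be a starting point (the path is just a placeholder):

input {
  file {
    path => "/var/log/apache2/error_log"   # placeholder path, adjust to your file
    codec => multiline {
      # Glue every dump row (payload starting with "|" or "+") onto the
      # previous line; every other line, including the "(BIO dump follows)"
      # header, starts a new event.
      pattern => "ssl_engine_io\.c\(\d+\): [+|]"
      negate => false
      what => "previous"
    }
  }
}

This keys on the shape of the dump rows rather than on the 1901/1907 numbers, so it shouldn't break if OpenSSL moves those lines around.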

Thank you for your reply.

I managed to create a pattern that (more or less) works.
What I receive back is this (a small one as an example):

[Tue Apr 19 10:27:30 2016] [debug] ssl_engine_io.c(1929): OpenSSL: read 160/160 bytes from BIO#7f4d34009760 [mem: 7f4d340110c8] (BIO dump follows)\n[Tue Apr 19 10:27:30 2016] [debug] ssl_engine_io.c(1901): | 0000: 38 7a 22 46 5b 0c 32 af-65 45 5b 5a e3 a1 ed 13 8z"F[.2.eE[Z.... |\n[Tue Apr 19 10:27:30 2016] [debug] ssl_engine_io.c(1901): | 0010: 99 ba d8 99 23 c3 a4 b3-b8 77 61 4c ae 79 08 c6 ....#....waL.y.. |\n[Tue Apr 19 10:27:30 2016] [debug] ssl_engine_io.c(1901): | 0020: 22 a7 c2 e5 bc 1a 02 a9-6a b8 7a d0 1c 1a 5e ca ".......j.z...^. |\n[Tue Apr 19 10:27:30 2016] [debug] ssl_engine_io.c(1907): +-------------------------------------------------------------------------+\n

Original message :

[Tue Apr 19 10:27:30 2016] [debug] ssl_engine_io.c(1929): OpenSSL: read 160/160 bytes from BIO#7f4d34009760 [mem: 7f4d340110c8] (BIO dump follows)
[Tue Apr 19 10:27:30 2016] [debug] ssl_engine_io.c(1901): | 0000: 38 7a 22 46 5b 0c 32 af-65 45 5b 5a e3 a1 ed 13 8z"F[.2.eE[Z.... |
[Tue Apr 19 10:27:30 2016] [debug] ssl_engine_io.c(1901): | 0010: 99 ba d8 99 23 c3 a4 b3-b8 77 61 4c ae 79 08 c6 ....#....waL.y.. |
[Tue Apr 19 10:27:30 2016] [debug] ssl_engine_io.c(1901): | 0020: 22 a7 c2 e5 bc 1a 02 a9-6a b8 7a d0 1c 1a 5e ca ".......j.z...^. |
[Tue Apr 19 10:27:30 2016] [debug] ssl_engine_io.c(1907): +-------------------------------------------------------------------------+

And now my problem is... how do I filter something like that?
For the first line there are no problems.
For the others... I have no idea how to proceed.
As I said, there isn't a regular number of lines, and to create something well structured I need to save the elements into two fields: (hexadecimal = 99 ba d8 99 23 c3 a4 b3-b8 77 61 4c ae 79 08 c6) and (ascii = ....#....waL.y..)

Is there a way to do that?

Many thanks again

It is doable, I think. The biggest concern is that there are multiple hex and ASCII sets in each multiline message field. It will take quite a bit of trial and error.

You need to add the fields to the event before the grok filter. In the file input add (I don't know whether this works):

add_field => { "hex" => [] "ascii" => [] }

In your grok filter add a custom pattern.

filter {
  grok {
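    # Custom pattern: <hex> captures sixteen two-character hex bytes (each
    # followed by a space or dash), <ascii> the 16-character text column
    # before the closing "|".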
    match => { "message" => ":\s(?<hex>(([0-9a-fA-F]{2})(\s|\-)){16})((?<ascii>(.{16}))\s\|)" }
    break_on_match => false
  }
}

How I think it works...
We add two fields that have empty arrays, hex and ascii.
As grok extracts the matched text, it sees that the hex or ascii field is an array and appends the match to it.
So...

hex -> ["38 7a 22 46 5b 0c 32 af-65 45 5b 5a e3 a1 ed 13", "99 ba d8 99 23 c3 a4 b3-b8 77 61 4c ae 79 08 c6", "22 a7 c2 e5 bc 1a 02 a9-6a b8 7a d0 1c 1a 5e ca"]

ascii -> ["8z\"F[.2.eE[Z....", "....#....waL.y..", "\".......j.z...^."]
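
If it turns out that grok only keeps the first hex/ascii pair per event (I haven't verified the append behaviour above), a ruby filter that scans the joined message should be able to collect every row instead. Rough sketch, assuming the 2.x event syntax and the same hex/ascii field names:

filter {
  ruby {
    code => "
      hex = []
      ascii = []
      # each dump row looks like:
      # | 0010: 99 ba d8 99 23 c3 a4 b3-b8 77 61 4c ae 79 08 c6 ....#....waL.y.. |
      event['message'].to_s.scan(/\|\s\h{4}:\s((?:\h{2}[\s-]){1,16})(.{1,16})\s\|/) do |h, a|
        hex << h.strip
        ascii << a
      end
      event['hex'] = hex unless hex.empty?
      event['ascii'] = ascii unless ascii.empty?
    "
  }
}

The {1,16} ranges are a guess at coping with a short final row; worth checking against a real dump.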

Really, many thanks for your reply, it helped me a lot! My main trouble was exactly how to apply a filter to a multiline log without a regular number of lines.

Now I'm still working on the multiline pattern, trying to create one that reads the log from the beginning to the end (I suppose that's the only way to proceed, because there are many types of logs inside the same file).
So I created this one (it's a test, it's obviously not complete):

"^([Tue Apr .(BIO dump follows)\n)([Tue Apr .+-+\n)([Tue Apr .|.|\n){1,}([Tue Apr .+-*+\n)"

It still doesn't work properly, but I'm working on it.

Thanks again for your help

Sorry to bother you with another question, but I still don't get how to create a proper setup for a case like this... I don't know, maybe it's impossible to filter this particular case, or maybe I'm just being dense...

I need to set a begin and an end point for the pattern, because in the same file there are many types of logs: if I only set a begin pattern, Logstash simply doesn't know when it must stop, and vice versa, if I only give it the pattern of the last line, it doesn't know where the multiline log begins!

So I created a regex (built and tested using RegexBuddy with the Ruby flavour) for the whole pattern:

^(\[Tue\sApr\s.*\(BIO.*)\r\n(\[Tue\sApr .*\+--*\+)\r\n(\[Tue\sApr\s.*\s\|.*\|?\r\n){1,}(\[Tue\sApr .*\+--*\+)$

But this way Logstash is not able to filter anything and keeps adding lines until it reaches the line limit...

Any advice for a poor noob like me?

And as always, many thanks.

Please post more lines.
I need to see a lot of lines before and after the lines you posted.

Sadly, the multiline codec does not support begin and end patterns yet; this may arrive in the next major LS release - no promises.

This makes me sad and happy at the same time...
Sad because I can't finish parsing the whole file... Happy because I'm not so dense after all! :smiley:

Thank you so much again for all your help.

About the other log lines, don't worry; mine was mostly a test to check whether Logstash could parse something this complex from a single file/stream.
I have already created full filters for system logs, Apache, message logs and log4j... that was my main purpose.

And anyway, I can also filter all the debug logs from Apache (except for the multiline ones, obviously), and that is a huge step forward compared to analysing them as raw messages :slight_smile:

Thanks again