Multiline message with repeated header


I have a rather tricky situation. Multiline messages are sent through a transport service that adds its own header to every line. Due to policy (and technical) issues, it's not possible to install Filebeat on the originating system.

What Filebeat reads looks like this:

12:34, service1, 124, This is a single line message, extradata123
12:36, service1, 138, This is a, extradata145
12:36, service1, 138, multline message, extradata145

So the message text is packed between other columns, which means I can't use the usual multiline options in Filebeat. Note that the other columns are identical across all lines of such a multiline message. There is some sort of ID or hash (represented by 124 and 138 in my example).
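Because the first three columns and the last one are fixed, each line can be split from both ends, leaving whatever sits in the middle as the message text. This is a minimal Python sketch. It assumes the separator is always ", " (comma plus space, as in the sample) and that the time, service, ID and extradata columns never contain that separator. `split_record` is a hypothetical helper name.

```python
def split_record(line):
    """Split one transport line into (key, message).

    Assumption: time, service, id and extradata never contain ", ".
    Everything between the third and the last column is treated as the
    message, so commas inside the message text survive.
    """
    head = line.rstrip("\n").split(", ", 3)  # time, service, id, rest
    if len(head) < 4:
        raise ValueError(f"unexpected line: {line!r}")
    time_, service, msg_id, rest = head
    message, sep, extra = rest.rpartition(", ")  # peel extradata off the end
    if not sep:
        raise ValueError(f"no extradata column: {line!r}")
    return (time_, service, msg_id, extra), message
```

Both fragment lines of message 138 produce the same key, ("12:36", "service1", "138", "extradata145"), and that shared key is what makes it possible to glue the fragments back together.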

Before I discovered this "special" form of multiline message, I just used the csv filter in Logstash and all was fine and cozy. Now that I've found these multiline messages, I honestly don't know what to do.

The data is written to a file, which I collect with a bash script that pipes the data into Filebeat. (I know there are other ways, but there are other reasons why I have to use the script.) I'm only mentioning this because I wouldn't mind solving the problem with a bash/python/ruby/assembler tool before piping the data into Filebeat.
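If the fragments of one message always arrive on consecutive lines, a small stdin-to-stdout filter placed between the bash script and Filebeat could merge them back into a single line, so the existing csv filter in Logstash keeps working. Here is a Python sketch under that assumption. The helper names and the choice of joining fragments with a single space are hypothetical.

```python
import sys
from itertools import groupby

JOINER = " "  # hypothetical choice: how fragments are glued back together


def split_record(line):
    """Return (key, message) where key = (time, service, id, extradata).

    Assumes the non-message columns never contain ", "; malformed lines
    raise ValueError.
    """
    time_, service, msg_id, rest = line.rstrip("\n").split(", ", 3)
    message, _, extra = rest.rpartition(", ")
    return (time_, service, msg_id, extra), message


def merge_lines(lines):
    """Yield one output line per logical message.

    Assumes all fragments of a message arrive on consecutive lines and
    share identical time, service, id and extradata columns.
    """
    records = (split_record(line) for line in lines if line.strip())
    for key, group in groupby(records, key=lambda record: record[0]):
        time_, service, msg_id, extra = key
        message = JOINER.join(fragment for _, fragment in group)
        yield f"{time_}, {service}, {msg_id}, {message}, {extra}"


if __name__ == "__main__":
    # Read from the bash script's pipe and write merged lines for Filebeat.
    for merged in merge_lines(sys.stdin):
        print(merged, flush=True)
```

A message is only written once the next line with a different key arrives (or the input ends), because `groupby` has to see the key change. Two distinct single-line messages with identical time, service, ID and extradata would also be merged, which the ID/hash column should make unlikely.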

Could you give me a hint on how to solve this? (Hint: beating the developers of the originating tool until they send proper multiline messages might be fun, but it is not an option.)


I wonder if you could solve this in Logstash using the aggregate filter:
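The aggregate filter itself is configured inside Logstash, so what follows is not its configuration. It is a Python illustration of the same idea: collect fragments in a map keyed by the shared columns and emit the message after a period of inactivity. This also copes with fragments of different messages being interleaved. The class name, the 2-second timeout and the `now` parameter (added so the logic can be tested without real waiting) are all assumptions.

```python
import time


def split_record(line):
    """Return (key, message) where key = (time, service, id, extradata)."""
    time_, service, msg_id, rest = line.rstrip("\n").split(", ", 3)
    message, _, extra = rest.rpartition(", ")
    return (time_, service, msg_id, extra), message


class FragmentAggregator:
    """Buffer fragments per key and emit a message once its key has been
    quiet for `timeout` seconds, so interleaved messages stay apart."""

    def __init__(self, timeout=2.0):
        self.timeout = timeout
        self.pending = {}  # key -> (last_seen, list of fragments)

    def feed(self, line, now=None):
        """Add one line; return any messages whose timeout has expired."""
        now = time.monotonic() if now is None else now
        key, fragment = split_record(line)
        _, fragments = self.pending.get(key, (now, []))
        fragments.append(fragment)
        self.pending[key] = (now, fragments)
        return self.expire(now)

    def expire(self, now=None):
        """Emit and forget every key not seen for at least `timeout`."""
        now = time.monotonic() if now is None else now
        done = []
        for key, (last_seen, fragments) in list(self.pending.items()):
            if now - last_seen >= self.timeout:
                done.append(self._render(key, fragments))
                del self.pending[key]
        return done

    def flush_all(self):
        """Emit everything still buffered, e.g. at end of input."""
        done = [self._render(k, f) for k, (_, f) in self.pending.items()]
        self.pending.clear()
        return done

    @staticmethod
    def _render(key, fragments):
        time_, service, msg_id, extra = key
        return f"{time_}, {service}, {msg_id}, {' '.join(fragments)}, {extra}"
```

The timeout is the trade-off. If it is too short, a late fragment is emitted as a separate message. If it is too long, every message is delayed. In a blocking `for line in sys.stdin` loop, `expire` only runs when new input arrives, so `flush_all()` should be called once the input ends.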