Config file for multiple multi-line patterns?

It looks like the configs described here no longer work: Config file for multiple multiline patterns

There is now a codec for multiline inputs: Multiline codec plugin | Logstash Reference [7.12] | Elastic

input {
  stdin {
    codec => multiline {
      # lines starting with whitespace get appended to the previous entry
      pattern => "^\s"
      what => "previous"
    }
  }
}

However, I need to add more pattern matches. In particular, I need to also parse a section of the log file (the tail end) that looks like this:

------------------------------------------------------------
Sender: LSF System
Subject: Job .... 

Job ...

Your job looked like:

------------------------------------------------------------
# LSBATCH: User input
cwltoil ...
------------------------------------------------------------

Exited with exit code 1.

Resource usage summary:

    CPU time :                                   13363.00 sec.
    Max Memory :                                 -
    Average Memory :                             -
    Total Requested Memory :                     2.00 GB
    Delta Memory :                               -
    Max Swap :                                   -
    Max Processes :                              13
    Max Threads :                                28
    Run time :                                   13887 sec.
    Turnaround time :                            13889 sec.

The output (if any) is above this job summary.

Any ideas on how to set this up? I do not even need to grok it much; just saving it as a single event would be enough. In this case, this will always be the very last section of the log file, but the contents and length may vary.

You are right that the multiline filter is deprecated, and it may even have been removed. The multiline codec in Logstash and multiline handling in Filebeat are the supported options.

What tells you that the tail end of the file has started? You may need to do some of the multiline processing in the codec and some in an aggregate filter.
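
For the aggregate half, something like this untested sketch is what I have in mind. It assumes a single worker (pipeline.workers: 1, which the aggregate filter requires), and the regex, the constant task_id field and the summary field are all placeholders for the example:

filter {
    # constant task_id, since the whole file is one logical stream
    mutate { add_field => { "logfile" => "joblog" } }
    aggregate {
        task_id => "%{logfile}"
        code => '
            # replace the regex with whatever marks the start of the tail;
            # once it has been seen, absorb every subsequent event into the
            # map instead of letting it through
            map["in_tail"] = true if event.get("message") =~ /^-{20,}/
            if map["in_tail"]
                map["summary"] = (map["summary"] || "") + event.get("message") + "\n"
                event.cancel
            end
        '
        # when no more events arrive for 5 seconds, emit the collected
        # tail as a single event
        push_map_as_event_on_timeout => true
        timeout => 5
        timeout_task_id_field => "logfile"
    }
}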

This line:

------------------------------------------------------------

shows that the tail end of the log has started, and everything after that should be considered one entry.

On its own I do not think it would be that difficult to set the Logstash input multiline codec to detect this; the problem is that I already have that codec configured to detect tracebacks (indented lines) and bundle those as well, so I am not sure how to tell the multiline codec to detect two different patterns. I looked into Filebeat's multiline handling and could not figure out how to specify multiple multiline patterns there either.
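
The closest I can get with a single codec is an alternation that matches either trigger, something like this (untested):

input {
  stdin {
    codec => multiline {
      # indented continuation lines OR the dashed separator
      pattern => "^\s|^-{20,}"
      what => "previous"
    }
  }
}

but with what => previous that only folds the separator line itself into the preceding event; the unindented lines of the job summary that follow it would still start new events.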

For context, this is because my log is actually the output of two different programs, the Toil CWL workflow runner, running inside of an LSF HPC job.

Trying to get Filebeat involved does not seem to help either. For example, if I change my configs to first ingest the log with Filebeat and then ship it to Logstash:

filebeat.yml

filebeat.inputs:
- type: stdin
  # append messages that start with whitespace to the previous entry
  multiline.type: pattern
  multiline.pattern: '^[[:space:]]'
  multiline.negate: false
  multiline.match: after

output.logstash:
  hosts: ["localhost:5044"]

This works in detecting the first set of multi-line entries (the indented stack traces and other error messages).

However, it does not appear to be possible to apply this to both sets of events. If I change it to this:

filebeat.inputs:
- type: stdin
  # append messages that start with whitespace to the previous entry
  multiline.type: pattern
  multiline.pattern: '^[[:space:]]'
  multiline.negate: false
  multiline.match: after
  # detect LSF log messages at the end of the file
  multiline.type: pattern
  multiline.pattern: '^------------------------------------------------------------'
  multiline.negate: false
  multiline.match: after

output.logstash:
  hosts: ["localhost:5044"]

Nothing happens; in fact, no multiline messages get matched at all. (Presumably because a YAML map cannot contain duplicate keys, this is not a valid way to specify two patterns.)
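
The only single-key workaround I can see is folding both triggers into one alternation (untested), but it would have the same limitation as the alternation idea above: the unindented summary lines after the separator would still break into separate events:

filebeat.inputs:
- type: stdin
  multiline.type: pattern
  # indented continuation lines OR the dashed separator
  multiline.pattern: '^[[:space:]]|^-{20,}'
  multiline.negate: false
  multiline.match: after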

If I try to change it to have separate multiline parsers like this:


filebeat.inputs:
- type: stdin
  # append messages that start with whitespace to the previous entry
  multiline.type: pattern
  multiline.pattern: '^[[:space:]]'
  multiline.negate: false
  multiline.match: after
- type: stdin
  # detect LSF log messages at the end of the file
  multiline.type: pattern
  multiline.pattern: '^------------------------------------------------------------'
  multiline.negate: false
  multiline.match: after

I get this error:

2021-05-25T12:59:39.727-0400    ERROR   instance/beat.go:956    Exiting: stdin requires to be run in exclusive mode, configured inputs: stdin, stdin
Exiting: stdin requires to be run in exclusive mode, configured inputs: stdin, stdin

Additionally, I cannot use multiline in both Filebeat and Logstash simultaneously either. If I try to run multiline in Logstash like this:

input {
    beats {
        port => "5044"
        codec => multiline {
            # lines starting with whitespace get appended to the previous entry
            pattern => "^\s"
            what => "previous"
        }
    }
}

I get Logstash errors like this:

[2021-05-25T13:04:28,988][ERROR][logstash.javapipeline    ][main] Pipeline error {:pipeline_id=>"main", :exception=>#<LogStash::ConfigurationError: Multiline codec with beats input is not supported. Please refer to the beats documentation for how to best manage multiline data. See https://www.elastic.co/guide/en/beats/filebeat/current/multiline-examples.html>

Not sure where to go from here. This seems like a pretty big oversight in the ELK stack, since using multiple multiline filters in Logstash used to work. Thoughts?

I have an incomplete solution in which I tried consuming the entire file as a single event.

file {
    path => "/home/user/foo.txt"
    sincedb_path => "/dev/null"
    start_position => beginning
    codec => multiline {
        # "Spalanzani" never occurs in the log, so with negate => true every
        # line gets appended to the previous one, consuming the whole file
        pattern => "^Spalanzani"
        negate => true
        what => previous
        auto_flush_interval => 1
        multiline_tag => ""
    }
}

I use grok to divide it into the job output and the job details:

    grok { match => { "message" => "%{DATA:message2}------------------------------------------------------------%{GREEDYDATA:message2}" } }

[message2] will be an array with two members. We want these to be separate messages. This is what I ended up with (the clone filter sets [type] on each copy to the clone name; the original event, which has no type, falls through to the drop):

    clone { clones => [ "first", "second", "third" ] }
    if [type] == "first" {
        mutate { rename => { "[message2][0]" => "message" } }
    } else if [type] == "second" {
        mutate { rename => { "[message2][1]" => "message" } }
        # Avoids splitting on newline
        mutate { gsub => [ "message", "\n", "!" ] }
    } else if [type] == "third" {
        mutate { replace => { "message" => "DeleteMe" } }
    } else {
        drop {}
    }
    mutate { remove_field => [ "message2", "type" ] }

The first one ends up with all the lines from the first part of the file, newline-separated. When these go through the tcp output and input they will automatically get split into separate events.

The second one ends up with the job details:

       "message" => "!Sender: LSF System!Subject: Job .... !Job ...!Your job looked like:!------------------------------------------------------------!# LSBATCH: User input!cwltoil ...!------------------------------------------------------------!Exited with exit code 1.!Resource usage summary:!    CPU time :                                   13363.00 sec.!    Max Memory :                                 -!    Average Memory :                             -!    Total Requested Memory :                     2.00 GB!    Delta Memory :                               -!    Max Swap :                                   -!    Max Processes :                              13!    Max Threads :                                28!    Run time :                                   13887 sec.!    Turnaround time :                            13889 sec.!The output (if any) is above this job summary."

The third one exists because if it does not then the tcp input never flushes the second. (This is the part that is somewhat broken.)

I then send them over tcp:

output {
    tcp {
        port => 3456
        host => "localhost"
        codec => line { format => "%{message}" }
    }
}

which is received in another pipeline:

input {
    tcp {
        host => "localhost"
        port => 3456
        codec => multiline { pattern => "^\s" what => "previous" auto_flush_interval => 2 }
    }
}

If you wanted to, you could restore the newlines in the second pipeline with

mutate { gsub => [ "message", "!", "
" ] }

(the replacement is a literal newline, since by default Logstash does not interpret escape sequences like \n in config strings).

As I said, it is not really a solution, but it is as close as I could get. I do not understand why the tcp input is not flushing the last message it received (it only does so when logstash shuts down).

Thanks, this looks interesting. Do you have an example of what the entire Logstash config file would look like in this case? I am a little unclear on where each of these snippets goes.

Also, would this still work with input coming from Filebeat? I realized that in my production usage I will actually still be using Filebeat to ship the log to Logstash.

I tried it again using filebeat instead of a file input. filebeat is configured with:

filebeat.inputs:
- type: log
  enabled: true
  paths:
    - /tmp/data/*.txt
    #- c:\programdata\elasticsearch\logs\*
  multiline.pattern: ^Monsieur Spalanzani
  multiline.negate: true
  multiline.match: after
  multiline.timeout: 5s
setup.template.settings:
  index.number_of_shards: 1
setup.kibana:
output.logstash:
  hosts: ["127.9.2.3:5044"]

The localhost network 127/8 has sixteen million addresses on it, and I like to use some of the others from time to time (you may prefer 127.0.0.1).

I create files under /tmp/data that contain:

foo.txt

java.lang.Exception: Stack trace
    at java.base/java.lang.Thread.dumpStack(Thread.java:1383)
    at com.example.stacktraces.StackTrace.d(StackTrace.java:23)

------------------------------------------------------------
Sender: LSF System
Subject: Job ....

Job ...

Your job looked like:

------------------------------------------------------------
# LSBATCH: User input
cwltoil ...
------------------------------------------------------------

Exited with exit code 1.

Resource usage summary:

    CPU time :                                   13363.00 sec.
    Max Memory :                                 -
    Average Memory :                             -
    Total Requested Memory :                     2.00 GB
    Delta Memory :                               -
    Max Swap :                                   -
    Max Processes :                              13
    Max Threads :                                28
    Run time :                                   13887 sec.
    Turnaround time :                            13889 sec.

The output (if any) is above this job summary.

So filebeat is sending me events like:

   "message" => "a.txt\n\njava.lang.Exception: Stack trace\n    at java.base/java.lang.Thread.dumpStack(Thread.java:1383)\n    at com.example.stacktraces.StackTrace.d(StackTrace.java:23)\n\n------------------------------------------------------------\nSender: LSF System\nSubject: Job .... \n\nJob ...\n\nYour job looked like:\n\n------------------------------------------------------------\n# LSBATCH: User input\ncwltoil ...\n------------------------------------------------------------\n\nExited with exit code 1.\n\nResource usage summary:\n\n    CPU time :                                   13363.00 sec.\n    Max Memory :                                 -\n    Average Memory :                             -\n    Total Requested Memory :                     2.00 GB\n    Delta Memory :                               -\n    Max Swap :                                   -\n    Max Processes :                              13\n    Max Threads :                                28\n    Run time :                                   13887 sec.\n    Turnaround time :                            13889 sec.\n\nThe output (if any) is above this job summary.",

My pipelines.yml looks like:

- pipeline.id: main
  pipeline.workers: 1
  config.string: >
    input { beats { host => "127.9.2.3" port => 5044 codec => plain } }
    filter {
        grok { match => { "message" => "%{DATA:message2}------------------------------------------------------------%{GREEDYDATA:message2}" } }
        clone { clones => [ "first", "second", "third" ] }
        if [type] == "first" {
            mutate { rename => { "[message2][0]" => "message" } }
        } else if [type] == "second" {
            mutate { rename => { "[message2][1]" => "message" } }
            # Avoids splitting on newline
            mutate { gsub => [ "message", "\n", "!" ] }
        } else if [type] == "third" {
            mutate { replace => { "message" => "DeleteMe" } }
        } else {
            drop {}
        }
        mutate { remove_field => [ "message2", "type" ] }
    }
    output { tcp { host => "127.4.3.2" port => 4080 codec => line { format => "%{message}" } } }

- pipeline.id: tcp
  config.string: >
    input {
        tcp {
            host => "127.4.3.2"
            port => 4080
            codec => multiline { pattern => "^\s" what => "previous" auto_flush_interval => 4 multiline_tag => ""}
        }
    }
    filter { if [message] == "DeleteMe" { drop {} } }
    output { stdout { codec => rubydebug { metadata => false } } }

That sometimes results in:

   "message" => "vbn2.txt"
   "message" => ""
   "message" => "java.lang.Exception: Stack trace\n    at java.base/java.lang.Thread.dumpStack(Thread.java:1383)\n    at com.example.stacktraces.StackTrace.d(StackTrace.java:23)"
   "message" => "!Sender: LSF System!Subject: Job .... !!Job ...!!Your job looked like:!!------------------------------------------------------------!# LSBATCH: User input!cwltoil ...!------------------------------------------------------------!!Exited with exit code 1.!!Resource usage summary:!!    CPU time :                                   13363.00 sec.!    Max Memory :                                 -!    Average Memory :                             -!    Total Requested Memory :                     2.00 GB!    Delta Memory :                               -!    Max Swap :                                   -!    Max Processes :                              13!    Max Threads :                                28!    Run time :                                   13887 sec.!    Turnaround time :                            13889 sec.!!The output (if any) is above this job summary."

From time to time the last part gets broken off into a separate message:

   "message" => "bove this job summary."

That appears to be happening in the multiline codec for reasons that baffle me, as does the need for the "DeleteMe" bodge.


Thanks, I will look into this. Is handling of many different multi-line patterns in the ELK stack supposed to be this difficult? Is there some other technique that would make this easier?

If you have multiple multi-line patterns in the same file, then yes, it is not easy.

But this is not just Logstash or Filebeat; it is not easy for other tools on the market either.

Depending on the complexity of your log files, it is sometimes easier to write a tool to pre-process them and write the logs out in a friendlier format for Logstash/Filebeat.
