Logstash Multiline Inquiry

Hello,

I would like to ask a question about the multiline function of Logstash.

How does it work?

When does it "compile" the lines into the event?

I noticed that during execution the events appear as separate lines, and only when it is done executing, i.e. when it reaches the end of the file, does it compile what you specified into a single event?

I am asking this because when I run my configuration locally (which is faster than on the server) it immediately shows the lines compiled into one event.

So what I am asking is: when does multiline compile the lines into one single event?

I suspect it is once there are no more updates to the file. If not, why?

Thanks,

It's hard to understand what you're asking. The multiline filter/codec will publish events containing multiple lines as soon as it can, i.e. as soon as it has read the first line of the next event (the exact behavior depends on the multiline configuration). When Logstash reaches the end of the file things get trickier: there is no next event to wait for, so Logstash could end up waiting forever. If you enable the codec's auto_flush_interval option it will send whatever it has collected if nothing happens for a while.
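
Something like this, just as a sketch (the path and pattern here are placeholders, not your actual config): any line matching the pattern (starting with whitespace) is joined to the previous line, and a pending event is flushed after five seconds of silence.

input {
    file {
        path => "/var/log/example.log"
        codec => multiline {
            # Indented lines belong to the previous line's event.
            pattern => "^\s"
            what => "previous"
            # Flush the pending event after 5 seconds with no new lines.
            auto_flush_interval => 5
        }
    }
}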

2 Likes

Hi Magnus!

Yeah, I think I have an English deficiency.

Anyway, let me try again.

While Logstash is still outputting the lines and you view them in Kibana, the lines are separate events.

multiline {
    pattern => "\[(?<tslice>%{DATE_EU} %{TIME}) GMT]"
    what => "next"
}

As in my pattern above: if a line contains the timestamp pattern, it starts a new event; otherwise it belongs to the same event. So new events are delimited by the datestamps.

But your suggestion of auto_flush_interval sounds right. I think that is what I need, but how do I implement it?
I placed it in the multiline filter but I keep getting an error when compiling; I also tried it in the codec in the output part of the configuration.

multiline {
    pattern => "\[(?<tslice>%{DATE_EU} %{TIME}) GMT\]"
    auto_flush_interval => '1'
    what => "next"
}

codec => rubydebug {
    auto_flush_interval => '1'
    metadata => "true"
}

Thanks,

Are you using a multiline codec or a multiline filter?

Please always copy/paste any error messages. Don't just say "I get an error".

1 Like

Hello Magnus,

I am using a multiline filter and not a multiline codec.

I do not have a copy of the error anymore, since I think I am using it correctly now.

multiline {
    pattern => "\[(?<tslice>%{DATE_EU} %{TIME}) GMT\]"
    periodic_flush => true
    what => "next"
}

I noticed why it displays correctly when I run it locally: my local copy of LS is version 2.3.2 while our server's version is 2.0.0.

How do I use the multiline filter if I am going to use LS 2.0.0?

Thanks,

How do I use the multiline filter if I am going to use LS 2.0.0?

The periodic_flush option seems to be available in LS 2.0, so it's surprising that it doesn't work. The multiline filter is being deprecated, so I suggest you switch to the codec.

Why would you run an old version of Logstash? Note that you can upgrade plugins separately from the rest of Logstash.
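
In LS 2.x that's done with the bin/plugin script, e.g. something along these lines (the script was later renamed bin/logstash-plugin):

$ bin/plugin update logstash-filter-multiline
$ bin/plugin update logstash-codec-multiline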

1 Like

Hello Magnus,

Well, we started off at 2.0.0 when we tested it out. But sure, I think the best way is to upgrade to the latest.

I did not know that about the plugins; I will keep that capability in mind.

Thanks,

Hello Magnus,

I have applied the codec and it works. However, there is a problem; it is not an error.

The trigger is a series of XML tags. These tags are numerous, around 3,000. So an out-of-memory error is inevitable if I do not limit them.

What happens is that once I hit multiline_codec_max_lines_reached, every subsequent event is also tagged with multiline_codec_max_lines_reached.

When the XML tags end, the next line starts with a datestamp. Yet it is still treated as part of the same event, so it keeps reaching max_lines.

Thanks,

Please post example input lines and your current configuration.

1 Like

Hello Magnus,

Here are some sample lines:

[7/12/16 9:32:11:830 GMT] 000001f8 traceLogServi I
TraceLogMessage : 20160531-162719: The process to generate the SSC output file has started...
[7/12/16 9:32:11:998 GMT] 0000c6c5 BCSSCTriggerP I
Started checking the scheduled configuration to trigger the SSC output file generation...
[7/12/16 9:32:11:998 GMT] 0000c6c5 BCSSCTriggerP I
Setting the next schedule to null

The XML example I have is too long to post here, and I see that only image files can be uploaded. Do you have an email address I can send the .txt file with the XML example to? Or is there a way to get it to you here on the forum?

Below is my configuration:
input {

    file {
        path => "C:/elk/monitor/cdt3/logs/SystemOut*.log"
        type => "systemout"
        codec => multiline {
            pattern => "\[(?<tslice>%{DATE_EU} %{TIME}) GMT\]"
            negate => "true"
            what => "previous"
            multiline_tag => "multi_tagged"
        }
        add_field => { "env" => "CDT3"}
    }
    
}

Thanks,

You don't need to post the full XML file but I need to understand the structure of the input. Your multiline configuration looks correct.

1 Like

Hi Magnus,

Sure, here are just the first 5 lines.

I guess I could've just set the max to 5 so that I could trigger the error.

[7/12/16 16:44:19:449 GMT] 00008348 SystemOut     O <?xml version="1.0" encoding="UTF-8"?>
<p:CurrencyExchangeRate xsi:type="p:CurrencyExchangeRate" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xmlns:p="http://com.ibm.sprint/bc/CurrencyExchangeRate">
  <exchangedate>2016-07-12</exchangedate>
  <fileTimeStamp> </fileTimeStamp>
  <sequenceIdentifier>0000002915</sequenceIdentifier>

Thanks,

Okay. I don't understand what the problem is. If I raise max_lines sufficiently Logstash happily joins all lines into a single message. Here's your codec configuration but with max_lines added:

$ cat test.config
input {
  stdin {
    codec => multiline {
      pattern => "\[(?<tslice>%{DATE_EU} %{TIME}) GMT\]"
      negate => "true"
      what => "previous"
      multiline_tag => "multi_tagged"
      max_lines => 4000
    }
  }
}
output { stdout { codec => rubydebug } }

Here's the base input:

$ cat data
[7/12/16 16:44:19:449 GMT] 00008348 SystemOut     O <?xml version="1.0" encoding="UTF-8"?>
<p:CurrencyExchangeRate xsi:type="p:CurrencyExchangeRate" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xmlns:p="http://com.ibm.sprint/bc/CurrencyExchangeRate">
  <exchangedate>2016-07-12</exchangedate>
  <fileTimeStamp> </fileTimeStamp>
  <sequenceIdentifier>0000002915</sequenceIdentifier>

So let's add 3000 more XML elements:

$ for i in $(seq 1 3000) ; do echo '<foo></foo>' >> data ; done
$ wc -l data
3005 data

Now let's pass this data twice to Logstash (to avoid the problem with the last multiline message not getting picked up):

$ cat data data | /opt/logstash/bin/logstash -f test.config
Settings: Default pipeline workers: 2
Pipeline main started
{
    "@timestamp" => "2016-07-20T12:30:43.805Z",
       "message" => "[7/12/16 16:44:19:449 GMT] 00008348 SystemOut     O <?xml version=\"1.0\" encoding=\"UTF-8\"?>\n<p:CurrencyExchangeRate xsi:type=\"p:CurrencyExchangeRate\" xmlns:xsi=\"http://www.w3.org/2001/XMLSchema-instance\" xmlns:p=\"http://com.ibm.sprint/bc/CurrencyExchangeRate\">\n  <exchangedate>2016-07-12</exchangedate>\n  <fileTimeStamp> </fileTimeStamp>\n  <sequenceIdentifier>0000002915</sequenceIdentifier>\n<foo></foo>\n<foo></foo>\n<foo></foo>[ omitting thousands of empty XML tags ]<foo></foo>",
      "@version" => "1",
          "tags" => [
        [0] "multi_tagged"
    ],
          "host" => "hallonet"
}
Pipeline main has been shutdown
stopping pipeline {:id=>"main"}

Looks okay, no?

1 Like

Hello Magnus,

Yes, it does look good. The thing is, my XML tags run to about 136,905 lines.

I used Notepad++ to count the lines.

So, is it safe to set max_lines to 150,000? With that volume I think an out-of-memory error is inevitable. I would like to set it to 1,000, but once an event exceeds 1,000 lines, all subsequent lines get lumped into a single event, even though they should not be.

Thanks,

You have a 150k line XML file in your logs? If you want to parse that with Logstash it should be possible, but you may have to bump its heap size. It depends on the memory characteristics of the XML parser and how long each line is.
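
On LS 2.x the heap can be raised with the LS_HEAP_SIZE environment variable before starting Logstash (2g here is just an example value; pick one that fits your machine; later versions use the jvm.options file instead):

$ LS_HEAP_SIZE=2g bin/logstash -f test.config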

2 Likes

Hello Magnus,

Sure, I think we could increase our heap size.

But on your end, when your data exceeds max_lines, does the next event get tagged with multiline_codec_max_lines_reached even though it shouldn't? Because that is what is happening on my end.

Example Below:

For this example, I set max_lines to 10. It comes right after my XML event. The thing is, it is treated as exceeding max_lines even though it clearly does not; it matches the pattern and should have been split into separate events according to the timestamp pattern.

So once one event exceeds max_lines, every subsequent event WILL exceed max_lines, even though it should not.

Thanks,

That's not what I'm seeing with Logstash 2.3.4. If I run cat data data data | /opt/logstash/bin/logstash -f test.config I'm getting two messages to stdout, both tagged multi_tagged but not multiline_codec_max_lines_reached.

1 Like