Cannot parse logs - problem with multiline parse failures

Hi All,
I am having a lot of problems parsing logs especially with different dates and logs having multiline tags.

For example:
A head (very first 10 lines) of one of my log files, specifically, catalina.out, could be:

NOTE: Picked up JDK_JAVA_OPTIONS:  --add-opens=java.base/java.lang=ALL-UNNAMED --add-opens=java.base/java.io=ALL-UNNAMED --add-opens=java.base/java.util=ALL-UNNAMED --add-opens=java.base/java.util.concurrent=ALL-UNNAMED --add-opens=java.rmi/sun.rmi.transport=ALL-UNNAMED
[0.001s][warning][gc] -Xloggc is deprecated. Will use -Xlog:gc:/path/to/gc.log instead.
12-Sep-2023 20:24:49.876 WARNING [main] org.apache.tomcat.util.digester.SetPropertiesRule.begin Match [Server] failed to set property [debug] to [0]
[root@DRACO4614424 config]#
12-Sep-2023 20:24:49.921 WARNING [main] org.apache.tomcat.util.digester.SetPropertiesRule.begin Match [Server/Service/Connector] failed to set property [debug] to [0]
12-Sep-2023 20:24:50.231 INFO [main] org.apache.catalina.core.StandardService.startInternal Starting service [Tomcat-Standalone]
12-Sep-2023 20:24:50.231 INFO [main] org.apache.catalina.core.StandardEngine.startInternal Starting Servlet engine: [Apache Tomcat/9.0.76]
etc
etc 
...
..
...This next 4 lines I want to go into 1 logline (multiline) until the next line starting with a date.
12-Sep-2023 20:24:51.848 INFO [Catalina-utility-2] org.apache.jasper.servlet.TldScanner.scanJars At least one JAR was scanned for TLDs yet contained no TLDs. Enable debug logging for this logger for a complete list of JARs that were scanned but no TLDs were found in them. Skipping unneeded JARs during scanning can improve startup time and JSP compilation time.
SLF4J: Failed to load class "org.slf4j.impl.StaticLoggerBinder".
SLF4J: Defaulting to no-operation (NOP) logger implementation
SLF4J: See http://www.slf4j.org/codes.html#StaticLoggerBinder for further details.

12-Sep-2023 20:24:51.877 INFO [Catalina-utility-2] org.apache.catalina.core.ApplicationContext.log 1 Spring WebApplicationInitializers detected on classpath
12-Sep-2023 20:24:52.041 INFO [Catalina-utility-2] org.apache.catalina.core.ApplicationContext.log Initializing Spring DispatcherServlet 'dispatcher'

Here you can see some lines start with a date and some don't !

I guess the lines that don't start with a date have to be dropped.

I want each line that starts with the date to slurp all other lines that follow it into the same line until another line starts with a date, so:

In order to do that, I have to use the multiline codec, so I tried this, which does work elsewhere, just not here:

input {
  file {
    path => "/path/to/logs/catalina.*"
    start_position => "beginning"
    codec => multiline {
      pattern => "^%{TIMESTAMP_ISO8601}"
      negate => true
      what => "previous"
    }
  }
}

With the following filter:

filter {
    grok {
        match => { "message" => "%{TIMESTAMP_ISO8601:timestamp}\s+%{GREEDYDATA:message}" 
    }
  }
    if "_grokparsefailure" in [tags] {
          drop { }
    }

 #dissect {
       #  mapping => { "message" => "%{[@metadata][timestamp]} %{+[@metadata][timestamp]} %{message}" }
     #}
     #if "_dissectfailure" in [tags] {
     #     drop { }
     #}

    #if "_dateparsefailure" in [tags] {
      #    drop { }
    #}
}

output {
        file {
        path => "/path/to/log.out"
        }
}

whether I use the grok or dissect
The pipeline starts but no output file created...!
The input file does get data coming in every few seconds, I have checked with a tail.

Really confused.
Please tell me what's the best way to parse these kind of multiple line logs using logstash?

I strongly suspect it's something to do with the date parsing though I might be wrong.

Appreciate all your help,
Thanks again...

TIMESTAMP_ISO8601 requires a MONTHNUM, not a MONTH, so it cannot parse "Sep". You could use pattern => "^\d{2}-%{WORD}-\d{4} %{TIME}" in the multiline codec and

grok { pattern_definitions => { "MyTimestamp" => "\d{2}-%{WORD}-\d{4} %{TIME}" } match => { "message" => "%{MyTimestamp:timestamp}\s+%{GREEDYDATA:message}" } }

Note that [message] will be an array since you are not setting the overwrite option on the grok filter.

2 Likes

Hello Badger, Thanks but though I am now getting output, it's not line by line.

I read the doc link but didn't understand it much.

Now All lines get wrapped into 1 big line and there are duplicates in the same line. It's a bit wierd..

I tried with and also without the overwrite option.

input {
  file {
    path => "/tmp/lines.log"
    start_position => "beginning"
    codec => multiline {
      pattern => "^%{TIMESTAMP_ISO8601}"
      negate => true
      what => "previous"
    }
  }
}

filter {
    grok { pattern_definitions => { "MyTimestamp" => "\d{2}-%{WORD}-\d{4} %{TIME}" } match => { "message" => "%{MyTimestamp:timestamp}\s+%{GREEDYDATA:message}" } overwrite => [ "message" ] }

    date {
       match => [ "[@metadata][timestamp]", "dd-MMM-yyyy HH:mm:ss.SSS" ]
       target => "@timestamp"
     }

    if "_grokparsefailure" in [tags] {
          drop { }
    }

    if "_dateparsefailure" in [tags] {
          drop { }
    }
}

output {
                file {
                path => "/tmp/logstash.out"
                }
}

Basically, if there are like 3 lines in a log file like below,
where let's say the 2nd line is a multiline of 100 lines or so,
then I would like to see 3 lines outputted as well.

The 2nd line should wrap up all the 100 lines below, until it reaches the 3rd line.

Some example input having 3 such lines:

12-Sep-2023 20:24:49.921 WARNING [main] org.apache.tomcat.util.digester.SetPropertiesRule.begin Match [Server/Service/Connector] failed to set property [debug] to [0]
12-Sep-2023 20:25:33.836 WARNING [Catalina-utility] org.apache.catalina.valves.StuckThreadDetectionValve.notifyStuckThreadDetected Thread [http-exec-something-3 url: /rest/filter/1.0/service/etc; user: someuser (id=[nnn12]) has been active for [71,125] milliseconds (since [9/12/23 7:20 AM]) to serve the same request for [https://some.url.com/rest/table-filter/] and may be stuck (configured thresh
old for this StuckThreadDetectionValve is [60] seconds). There is/are [1] thread(s) in total that are monitored by this Valve and may be stuck.
        java.lang.Throwable
                at java.base@11.0.17/java.net.SocketInputStream.socketRead0(Native Method)
                at java.base@11.0.17/java.net.SocketInputStream.socketRead(SocketInputStream.java:115)
                at java.base@11.0.17/java.net.SocketInputStream.read(SocketInputStream.java:168)
                at java.base@11.0.17/java.net.SocketInputStream.read(SocketInputStream.java:140)
               +more lines
12-Sep-2023 20:26:36.565 WARNING [http-exec-6 url: /some/url/filter/1.0/; user: someone] com.sun.jersey.spi.container.servlet.WebComponent.filterFormParameters A servlet request, to the URI https://some.url.com/rest/filter/1.0/...defmeb2ajfgv8usdfuj53b41d&_=1694389812926, contains form parameters in the request body but the request body has been consumed by the servlet or a servlet filter accessing the request parameters. Only resource methods using @ParamForm will work as expected. Resource methods consuming the request body by other means will not work as expected.

Could something like that be easily achieved if I try something else?
Thanks v much again.

You need to change this as said in my previous post. You may also want to add the auto_flush_interval option on the codec.

Thanks Badger, sorry I missed that completely.

It works now - Great!
Really appreciate your help.
-Danmed

This topic was automatically closed 28 days after the last reply. New replies are no longer allowed.