Drop log messages that don't contain the absolute path to application data

Hi Forum,

The application I am working with outputs several log messages like the ones below. It logs the application's absolute path first, followed by every parent directory up to the root, i.e. the path hierarchy in reverse order. I would like to keep only the first message, since it contains the absolute path to the application data, and drop the messages below it, as they are not relevant. How can I achieve this with Logstash? I am new to ELK and any help would be really appreciated.

[02:26:59 9.5.2016] submitted:/testsuite/ACCESS_prod/sge/rtds4/complete/alter
[02:26:59 9.5.2016] submitted:/testsuite/ACCESS_prod/sge/rtds4/complete
[02:26:59 9.5.2016] submitted:/testsuite/ACCESS_prod/sge/rtds4
[02:26:59 9.5.2016] submitted:/testsuite/ACCESS_prod/sge
[02:26:59 9.5.2016] submitted:/testsuite/ACCESS_prod
[02:26:59 9.5.2016] submitted:/testsuite
[02:26:59 9.5.2016] submitted:/

Thank you.

Is there anything unique about /testsuite/ACCESS_prod/sge/rtds4/complete/alter compared to the other paths? Is it a file rather than a directory? Does it contain any particular files that aren't found elsewhere?

Hi Magnus,

This is the application's absolute path. It's a directory and contains a variety of ASCII files that are generated by the application. The other messages, containing the directories above the app path, can be discarded, and I would like to know how I can do that using a regexp in Logstash.

Thank you.

You could write a ruby filter to do something like this:

  • If the message begins with "submitted:/",
  • extract the path,
  • check whether the directory contains a particular set of files,
  • if not, delete the event.

Most of this could be done with standard filters, but checking for the existence of files needs to be done with a ruby filter; see the sketch below.
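
A minimal sketch of that idea, assuming the newer event API (event.get / event.cancel; older Logstash versions access fields differently) and a hypothetical marker file named expected_file.dat that only exists in the real application data directory:

    filter {
      if [message] =~ /submitted:\// {
        ruby {
          code => '
            # Pull out the path after "submitted:" on this line.
            path = event.get("message")[/submitted:(\S+)/, 1]
            # Drop the event unless the directory contains the marker file.
            # expected_file.dat is just a placeholder for whatever file is
            # unique to the application data directory.
            unless path && File.exist?(File.join(path, "expected_file.dat"))
              event.cancel
            end
          '
        }
      }
    }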

This is not quite how I want to achieve this. Using a ruby filter to analyse every single line of a ~20 MB log file would hamper server performance significantly.
Is there any way to use the multiline filter plugin to put all the lines into a single array and then drop every element except the first one?

How many 20 MB log files do you need to process per second?

Doing it with a multiline codec might work. If the current line matches submitted:/[a-z] (ignoring the timestamp prefix for the purpose of this example), join it with the next line?
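
A rough sketch of that, with a hypothetical file input path and an unanchored grok pattern; both are assumptions, not tested against your exact log layout:

    input {
      file {
        path => "/var/log/app/app.log"   # hypothetical path to the log file
        codec => multiline {
          # A "submitted:" line with at least one character after the slash
          # belongs with the line that follows it, so the whole descending
          # block collapses into a single event that ends with the bare
          # "submitted:/" line.
          pattern => "submitted:/[a-z]"
          what => "next"
        }
      }
    }

    filter {
      # The first line of the joined event holds the deepest path; grok
      # takes the first match in the message, so app_path ends up with
      # just that path and the shorter ones are ignored.
      grok {
        match => { "message" => "submitted:%{UNIXPATH:app_path}" }
      }
    }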
