Parsing plain text with Logstash filter

The log file is a single line of plain text. Part of it looks like this:

...there is some text here, Results: 0, Errors: 0, there is some text here...

The goal is to create two new fields, results and errors, and assign the corresponding values to them.

I think it can be achieved by doing something like this in the Logstash filter section:

filter {
  grok {
    match => {"message" => "(?<results>Results:) (?<errors>Errors:)"}
  }
  mutate {
    add_field => { "results" => "%{results}" }
  }
  mutate {
    add_field => { "errors" => "%{Errors}" }
  }
}

Could anybody suggest the right way to solve that problem?

Hi

You could try:

filter {
    if [message] =~ /.*Results.*Errors.*/ {
        grok {
            match => { "message" => ".*Results\:\s(?<results>\d*).*Errors\:\s(?<Errors>\d*)" }
        }
    }
}

That should give you fields and values for results and errors as strings. If you need those values as numbers, use the mutate filter's convert option.
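
As a minimal sketch of that conversion, assuming the field names are exactly the ones captured by the pattern above (results and Errors, including the capital E), a mutate block placed after the grok filter could look like this:

```
filter {
  mutate {
    # Convert the captured strings to integers.
    # Field names must match the grok capture names exactly (case-sensitive).
    convert => {
      "results" => "integer"
      "Errors"  => "integer"
    }
  }
}
```

If the captures are written with predefined grok patterns instead of raw regex groups, grok can also cast at capture time, for example %{INT:results:int}.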

@Kryten That didn't work.

The part of the message is:

...:, , Tests run: 1, Failures: 0, Errors: 0, Skipped: 0, , [INFO]...

This is the config:

filter {
  if [message] =~ /.*Failures.*Errors.*Skipped.*/ {
    grok {
      match => {"message" => ".*Failures\:\s(?<failures>\d*).*Errors\:\s(?<errors>\d*).*Skipped\:\s(?<skipped>\d*)"}
    }
  }
}

No new fields have been created.

@John_06

If you take the string you supplied as sample to begin with:

...there is some text here, Results: 0, Errors: 0, there is some text here...

and put it into the grok debugger here:
https://grokdebug.herokuapp.com/

Then paste in the pattern I supplied:
.*Results\:\s(?<results>\d*).*Errors\:\s(?<Errors>\d*)

You get fields. Logstash would do the same.

If you then remove the event string from your first post and replace it with the event string from your last, it breaks. Naturally.

To parse the latest event string you provided:
...:, , Tests run: 1, Failures: 0, Errors: 0, Skipped: 0, , [INFO]...

You could use:
.*run\:\s(?<run>\d*).*Failures\:\s(?<failures>\d*).*Errors\:\s(?<errors>\d*).*Skipped\:\s(?<skipped>\d*).*\[(?<severity>\w*)

and that should yield:
{
  "run": [
    [
      "1"
    ]
  ],
  "failures": [
    [
      "0"
    ]
  ],
  "errors": [
    [
      "0"
    ]
  ],
  "skipped": [
    [
      "0"
    ]
  ],
  "severity": [
    [
      "INFO"
    ]
  ]
}
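
For reference, here is a sketch of how that pattern might sit inside a complete filter section. It assumes the event text is in the message field and reuses the conditional from the config posted earlier in the thread:

```
filter {
  # Only run grok on events that contain the test summary.
  if [message] =~ /Failures.*Errors.*Skipped/ {
    grok {
      match => {
        "message" => ".*run\:\s(?<run>\d*).*Failures\:\s(?<failures>\d*).*Errors\:\s(?<errors>\d*).*Skipped\:\s(?<skipped>\d*).*\[(?<severity>\w*)"
      }
    }
  }
}
```

Events that reach grok but do not match the pattern are tagged with _grokparsefailure by default, which can help when checking whether the filter ran at all.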

Thanks. It is still a little unclear how to handle the case where those values are repeated in the log message.

E.g.:

...created, Tests run: 1, Failures: 0, Errors: 0, Skipped: 0, there is some text here Results:, , Tests run: 1, Failures: 0, Errors: 0, Skipped: 0, , [INFO]...

The grok filter expression:

.*run\:\s(?<run>\d*).*Failures\:\s(?<failures>\d*).*Errors\:\s(?<errors>\d*).*Skipped\:\s(?<skipped>\d*)

will create fields with two values each: run 1, 1; failures 0, 0; errors 0, 0; skipped 0, 0.
Is it possible to get only one value for every field?

@magnusbaeck
Hi Magnus, do you have any thoughts on why the grok filter assigns two values to the same field?
Everything works fine in the Grok debugger, though: it shows only one value for every field.
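
One possible explanation, offered as an assumption rather than a diagnosis of this setup: grok appends to a field that already exists on the event, turning it into an array. Two values per field can therefore appear when the same event passes through more than one grok filter capturing the same names, for example when an older copy of the filter is still loaded from another file in the config directory. The grok debugger applies the pattern only once, which would be consistent with it showing a single value. A minimal sketch using grok's overwrite option, which replaces an existing value instead of appending to it:

```
filter {
  grok {
    match => {
      "message" => ".*run\:\s(?<run>\d*).*Failures\:\s(?<failures>\d*).*Errors\:\s(?<errors>\d*).*Skipped\:\s(?<skipped>\d*)"
    }
    # Replace any existing value in these fields rather than appending to it.
    overwrite => [ "run", "failures", "errors", "skipped" ]
  }
}
```

If the duplicates do come from a second filter, removing the extra filter is the cleaner fix; overwrite only hides the symptom.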
