Picking data out of logs with Logstash

Hi,

I'm pretty new to Logstash, and I'm trying to evaluate it to see if it will do what I'd like to do.

I'm starting with logs that are partially log4j, but also include lines that aren't log4j, so a straight log4j filter won't work. I don't care about most of the information in log4j format anyway.

I want to do a number of things that would be pretty easy to accomplish with grep, but I can't tell whether I'm thinking about them the right way for Logstash:

  • Pluck the numeric values out of a line like "Success on 567 records and failure on 34" and associate them with keys.
  • I have a bunch of lines of the form "statistic" = "value" - I'd like to associate those in my eventual output as "statistic" => "value". I can't tell if this is something for grok or for the kv filter.
  • Count the number of log4j ERROR lines in the file (is this an aggregate filter use case, or can you do this with metric somehow?)
  • Count the number of different types of log4j errors

Right now I've created a filter (included below) where I'm trying to grok out two different types of lines. That part works OK. I'm also trying to take the statistic and value fields I grok out separately and combine them into a single field named after the statistic. That part doesn't work.

filter {
  grok {
    match => { "message" => [
      "\s*%{WORD:statistic} = %{WORD:value}",
      "Success on %{NUMBER:successRecords} records and failure on %{NUMBER:failRecords}."
    ]}
  }
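  # This next part - combining statistic/value into one field - is the bit that doesn't work: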
  if [statistic] {
    grok {
      add_field => { "%{statistic}" => "%{value}" }
      remove_field => [ "statistic", "value" ]
    }
  }
}

I could probably get this to work eventually, but what I really want to know is whether I'm going about this all wrong to start with - whether I'm misunderstanding how, and even whether, I should be using Logstash for this kind of problem.

Would greatly appreciate thoughts from more experienced people!

  • Pluck the numeric values out of a line like "Success on 567 records and failure on 34" and associate them with keys.

That's a perfect job for grok.
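
Something like this (untested, and the field names are just suggestions) pulls both numbers out; the :int suffix tells grok to store them as integers rather than strings:

filter {
  grok {
    match => {
      "message" => "Success on %{NUMBER:successRecords:int} records and failure on %{NUMBER:failRecords:int}"
    }
  }
}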

  • I have a bunch of lines of the form "statistic" = "value" - I'd like to associate those in my eventual output as "statistic" => "value". I can't tell if this is something for grok or for the kv filter.

Um, could you give an example of what you mean?

  • Count the number of log4j ERROR lines in the file (is this an aggregate filter use case, or can you do this with metric somehow?)
  • Count the number of different types of log4j errors

Logstash's file input treats a file as a continuous stream of data, not as an immutable file that's read all at once. Counting the occurrences of a particular string in a particular file therefore doesn't really make sense. What are you really after? Errors per file doesn't seem like a very useful metric. Is it perchance errors per unit of time that you're looking for?
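
If it is, the metrics filter can track that inside Logstash. A minimal sketch, assuming you've already grokked the log4j level into a loglevel field (that field name is just an example):

filter {
  if [loglevel] == "ERROR" {
    metrics {
      # Every flush interval (5 seconds by default) this emits an event
      # with a running count and 1/5/15-minute rates for log4j_errors
      meter => "log4j_errors"
      add_tag => "metric"
    }
  }
}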

In general, aggregations are best done outside of Logstash. If you pass the parsed logs to Elasticsearch it'll happily count errors and what not.
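
For example, with a plain elasticsearch output (the hosts and index name here are placeholders):

output {
  elasticsearch {
    hosts => ["localhost:9200"]
    index => "joblogs-%{+YYYY.MM.dd}"
  }
}

A terms aggregation, or a Kibana visualization, on your parsed level field then gives you error counts per type and per time bucket.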

Thanks for the thoughts!

Examples for the second point:

"failedOnProcedureA" = "34"
"failedOnProdecureB" = "150"

etc. I've wound up pulling those lines out with grok:

"%{WORD:statistic} = %{WORD:value}"

and later using mutate:

if [statistic] {
  mutate {
    add_field => { "%{statistic}" => "%{value}" }
    remove_field => [ "statistic", "value" ]
  }
}

Which seems to do what I want in a way that I like.
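
Put together, the relevant part of the filter looks like this (note that the quotes in my examples above are just from me quoting the lines here - %{WORD} wouldn't match through literal quotes):

filter {
  grok {
    match => { "message" => "%{WORD:statistic} = %{WORD:value}" }
  }
  if [statistic] {
    mutate {
      # Turns the pair into a field named after the statistic,
      # e.g. "failedOnProcedureA" => "34"
      add_field => { "%{statistic}" => "%{value}" }
      remove_field => [ "statistic", "value" ]
    }
  }
}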

The log files I'm looking at are for particular jobs - there is a new file per job. It's entirely possible I could get by on errors per unit of time though. Thanks for pointing it out.