GROK Multiple Match - Logstash


(Dario Pezzi) #1

Dear All,
I'm trying to match multiple patterns against a single field (message), but I can't understand how it works.

This is my multiple match defined inside filter:

 grok {
    patterns_dir => "/etc/logstash/patterns/"
    break_on_match => false
    keep_empty_captures => true
    
    match => {"message" => ["(%{EXIM_DATE:exim_date} )(%{EXIM_MSGID:exim_msg_id} )(?<msg_c>Completed)",
                            "(%{EXIM_DATE:exim_date} )(%{EXIM_MSGID:exim_msg_id} )(?<msg_f>frozen)"
                           ]
             }
 }

Log lines to analyze:

2015-08-21 06:39:43 1ZSe7D-0001VB-Ir Completed
2015-08-21 17:16:01 1ZSntD-0002yr-CA Message is frozen
2015-08-19 06:16:01 Start queue run: pid=21180

Assumption: EXIM_DATE and EXIM_MSGID work correctly (they are defined in an external patterns file).

Question number 1
For the first log line I get all the fields plus a _grokparsefailure tag, which sounds OK to me. I assume the failure is caused by the option break_on_match => false, which makes grok go on to try the second pattern (in fact, if I remove the break_on_match option I get the results without _grokparsefailure).

2015-08-21 06:39:43 1ZSe7D-0001VB-Ir Completed
{
        "message" => "2015-08-21 06:39:43 1ZSe7D-0001VB-Ir Completed ",
       "@version" => "1",
     "@timestamp" => "2015-08-22T05:48:40.788Z",
           "host" => "ubuntu",
      "exim_date" => "2015-08-21 06:39:43",
      "exim_year" => "2015",
     "exim_month" => "08",
       "exim_day" => "21",
      "exim_time" => "06:39:43",
    "exim_msg_id" => "1ZSe7D-0001VB-Ir",
          "msg_c" => "Completed",
           "tags" => [
        [0] "_grokparsefailure"
    ]
}

For the second log line I get only _grokparsefailure, without any fields in the output (exim_date, exim_msg_id), and this seems strange. I would expect the same result as for the first log line: the first pattern fails, but the second pattern should match.

2015-08-21 17:16:01 1ZSntD-0002yr-CA Message is frozen
{
       "message" => "2015-08-21 17:16:01 1ZSntD-0002yr-CA Message is frozen",
      "@version" => "1",
    "@timestamp" => "2015-08-22T05:50:59.789Z",
          "host" => "ubuntu",
          "tags" => [
        [0] "_grokparsefailure"
    ]
}

The question is: how does it work? Why do I get the second output without any fields?

Question number 2
Suppose my match is only

  match => {"message" => ["(%{EXIM_DATE:exim_date} )(%{EXIM_MSGID:exim_msg_id} )(?<msg_c>Completed)"]}

In this case, if the pattern fails for exim_msg_id but works for exim_date, Logstash will exit the filter without any fields in the output. So it is all or nothing. Correct?

Thanks in advance for your help.

Dario


(Magnus Bäck) #2

Question 1:

The question is: how does it work? Why do I get the second output without any fields?

That's because

(%{EXIM_DATE:exim_date} )(%{EXIM_MSGID:exim_msg_id} )(?<msg_f>frozen)

doesn't match this string:

2015-08-21 17:16:01 1ZSntD-0002yr-CA Message is frozen

Try this instead:

%{EXIM_DATE:exim_date} %{EXIM_MSGID:exim_msg_id} Message is (?<msg_f>frozen)

I also removed the unnecessary parentheses. Not sure why you use different fields (msg_c and msg_f) depending on the message's status. Why not store the strings in a single field (e.g. msg_status)?
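
To spell out why the original pattern fails: it requires "frozen" to follow the message ID and its trailing space directly, but the log line has "Message is " in between. Putting the suggestions above together, the filter might look roughly like the sketch below. It assumes EXIM_DATE and EXIM_MSGID are defined under patterns_dir as in the first post, uses the msg_status field name proposed above, and leaves break_on_match at its default (true) so grok stops at the first pattern that matches. It has not been tested against the poster's pattern files.

```
filter {
  grok {
    # Directory holding the custom EXIM_DATE and EXIM_MSGID definitions (from the first post).
    patterns_dir => "/etc/logstash/patterns/"
    # break_on_match is left at its default (true): the first matching pattern wins.
    match => {
      "message" => [
        "%{EXIM_DATE:exim_date} %{EXIM_MSGID:exim_msg_id} (?<msg_status>Completed)",
        "%{EXIM_DATE:exim_date} %{EXIM_MSGID:exim_msg_id} Message is (?<msg_status>frozen)"
      ]
    }
  }
}
```

With this, the "Completed" and "Message is frozen" lines should both yield exim_date, exim_msg_id and msg_status, while the "Start queue run" line would still be tagged _grokparsefailure because neither pattern matches it.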

Question 2:

In this case, if the pattern fails for exim_msg_id but works for exim_date, Logstash will exit the filter without any fields in the output. So it is all or nothing. Correct?

Yes. The whole grok expression must match.
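
If you do want some fields even when part of the line is missing, one option is to make that part optional inside the expression itself, so that the expression as a whole can still match. A minimal sketch, assuming the same custom EXIM_DATE and EXIM_MSGID patterns; the field name exim_rest is only an illustrative choice, and GREEDYDATA is one of the patterns shipped with grok:

```
filter {
  grok {
    patterns_dir => "/etc/logstash/patterns/"
    # The (?: ... )? group makes the message ID and its trailing space optional,
    # so a line without a message ID can still match the expression as a whole.
    # GREEDYDATA captures whatever follows into the hypothetical field exim_rest.
    match => {
      "message" => "%{EXIM_DATE:exim_date} (?:%{EXIM_MSGID:exim_msg_id} )?%{GREEDYDATA:exim_rest}"
    }
  }
}
```

For a line such as "2015-08-19 06:16:01 Start queue run: pid=21180" this should give exim_date and exim_rest but no exim_msg_id, since empty captures are dropped unless keep_empty_captures is set to true.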


(Dario Pezzi) #3

Thanks very much.


(lucifer) #4

Can I apply multiple match statements inside grok? Suppose I want to check the beginning and the end of a string and am not concerned with what's in the middle. How do I do that?

