Logstash _grokparsefailure on data in the same format

Hi,

Something strange is happening in Logstash. Lines in the same log format usually get parsed without problems, but sometimes the event is tagged with _grokparsefailure.
Patterns:

  grok {
    match => { "message" => '%{WORD:logtitle} %{SYSLOGBASE2} %{USERNAME:username} START %{WORD:bctid} %{GREEDYDATA:bctname} %{UUID:eventid} %{GREEDYDATA:jsondata}'}
    add_tag => [ "taskStarted" ]
  }
  grok {
    match => { "message" => '%{WORD:logtitle} %{SYSLOGBASE2} %{USERNAME:username} END %{WORD:bctid} %{GREEDYDATA:bctname} %{UUID:eventid} %{GREEDYDATA:jsondata}'}
    add_tag => [ "taskTerminated" ]
  }

Data that gets parsed:

OFF 2019-06-11 21:09:11.693 NAV-C20AS01P a227410 START 61064_1_1 Login 65fb0deb-801e-4c8f-ac43-2c84844ed1ed {"entity_id":"","simulated_property":"","selected_be":"242","total_items":"","visit_id":"","alertbox_name":"","selected_family":"","selected_report":""}

Data that has the issue:

OFF 2019-06-12 10:15:22.236 NAV-C20AS01P z028465 START 61064_1_1 Login 453b0cf5-67fe-4acf-8b2a-6d2f0af527bb {"entity_id":"","simulated_property":"","selected_be":"3","total_items":"","visit_id":"","alertbox_name":"","selected_family":"","selected_report":""}

The resulting event is as follows:

{
                      "eventid" => "453b0cf5-67fe-4acf-8b2a-6d2f0af527bb",
                  "selected_be" => "3",
                "timestamp8601" => "2019-06-12 10:15:22.236",
                       "source" => "D:\\NavettiPricePoint.VolvoGroup\\Applications\\Host\\App_Data\\Logs\\TRACE-BusinessNavigatorLog.txt",
             "Availability_thr" => 10.0,
                  "total_items" => "",
              "Performance_thr" => 5.0,
                     "@version" => "1",
                         "beat" => {
            "name" => "NAV-C20AS01P",
        "hostname" => "NAV-C20AS01P",
         "version" => "6.1.0"
    },
                         "host" => "NAV-C20AS01P",
                "alertbox_name" => "",
           "simulated_property" => "",
                       "offset" => 2337262,
                   "prospector" => {
        "type" => "log"
    },
                      "message" => "OFF 2019-06-12 10:15:22.236 NAV-C20AS01P z028465 START 61064_1_1 Login 453b0cf5-67fe-4acf-8b2a-6d2f0af527bb {\"entity_id\":\"\",\"simulated_property\":\"\",\"selected_be\":\"3\",\"total_items\":\"\",\"visit_id\":\"\",\"alertbox_name\":\"\",\"selected_family\":\"\",\"selected_report\":\"\"}",
                    "logsource" => "NAV-C20AS01P",
                         "tags" => [
        [0] "beats_input_codec_plain_applied",
        [1] "taskStarted",
        [2] "_grokparsefailure"
    ],
                   "@timestamp" => 2019-06-12T10:15:22.236Z,
    "avg_elapsed_time_per_item" => [
        [0] NaN
    ],
                      "bctname" => "Login",
          "elapsed_time_in_Sec" => 0.0,
                  "Environment" => "Prod",
                     "visit_id" => "",
                     "logtitle" => "OFF",
                        "bctid" => "61064_1_1",
                     "username" => "z028465",
     "calculation_elapsed_time" => 0.0
}

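Both grok filters run on every event. The input matches the first grok filter (note the taskStarted tag and the extracted fields), but then fails to match the second grok filter, which is what adds the _grokparsefailure tag.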

I would advise generalising them into a single pattern. As always, we recommend anchoring your pattern (prefixing it with the beginning-of-line anchor ^). We also advise caution when using GREEDYDATA surrounded by spaces, as it can cause backtracking in the generated parser, which leads to performance problems. If you can be more specific, please do so.

  grok {
    pattern_definitions => {
      "TASKACTION" => "START|END"
    }
    match => {
      "message" => '^%{WORD:logtitle} %{SYSLOGBASE2} %{USERNAME:username} %{TASKACTION:task_action} %{WORD:bctid} %{GREEDYDATA:bctname} %{UUID:eventid} %{GREEDYDATA:jsondata}'
    }
  }

The above will save either START or END to the task_action field, which likely obviates your need for the separate tag.

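If the separate tags are still needed downstream, one possible sketch is to derive them from task_action with a conditional placed after the grok filter in the filter section. This assumes the grok filter above has already populated task_action; the tag names are the ones from the original configuration.

  # Only runs the mutate that corresponds to the captured action.
  if [task_action] == "START" {
    mutate { add_tag => [ "taskStarted" ] }
  } else if [task_action] == "END" {
    mutate { add_tag => [ "taskTerminated" ] }
  }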

You can also specify exactly which tag(s) to add when the input fails to match by using the tag_on_failure directive (which takes an array of strings), and you can explicitly give the filter an id instead of letting one be auto-generated. Both are helpful when you have multiple grok filters, because they let you determine which one is failing or taking up too much CPU time.
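
A minimal sketch of both options applied to the combined filter follows. The id value and the failure tag name are illustrative choices, not names that Logstash requires.

  grok {
    # Hypothetical id, used to identify this filter in monitoring output.
    id => "grok_task_event"
    pattern_definitions => {
      "TASKACTION" => "START|END"
    }
    match => {
      "message" => '^%{WORD:logtitle} %{SYSLOGBASE2} %{USERNAME:username} %{TASKACTION:task_action} %{WORD:bctid} %{GREEDYDATA:bctname} %{UUID:eventid} %{GREEDYDATA:jsondata}'
    }
    # Hypothetical tag, replaces the default _grokparsefailure for this filter only.
    tag_on_failure => [ "_grokparsefailure_task_event" ]
  }

With a distinct failure tag per grok filter, the tags array of a failed event shows which filter rejected it.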