Logstash _grokparsefailure in same pattern of data

Hi,

Something strange is happening in Logstash. Logs in the same format usually get parsed without trouble, but sometimes the event ends up tagged with _grokparsefailure.
Pattern:

  grok {
    match => { "message" => '%{WORD:logtitle} %{SYSLOGBASE2} %{USERNAME:username} START %{WORD:bctid} %{GREEDYDATA:bctname} %{UUID:eventid} %{GREEDYDATA:jsondata}'}
    add_tag => [ "taskStarted" ]
  }
  grok {
    match => { "message" => '%{WORD:logtitle} %{SYSLOGBASE2} %{USERNAME:username} END %{WORD:bctid} %{GREEDYDATA:bctname} %{UUID:eventid} %{GREEDYDATA:jsondata}'}
    add_tag => [ "taskTerminated" ]
  }

Data that gets parsed:

OFF 2019-06-11 21:09:11.693 NAV-C20AS01P a227410 START 61064_1_1 Login 65fb0deb-801e-4c8f-ac43-2c84844ed1ed {"entity_id":"","simulated_property":"","selected_be":"242","total_items":"","visit_id":"","alertbox_name":"","selected_family":"","selected_report":""}

Data that has the issue:

OFF 2019-06-12 10:15:22.236 NAV-C20AS01P z028465 START 61064_1_1 Login 453b0cf5-67fe-4acf-8b2a-6d2f0af527bb {"entity_id":"","simulated_property":"","selected_be":"3","total_items":"","visit_id":"","alertbox_name":"","selected_family":"","selected_report":""}

Error is as follows:

{
                      "eventid" => "453b0cf5-67fe-4acf-8b2a-6d2f0af527bb",
                  "selected_be" => "3",
                "timestamp8601" => "2019-06-12 10:15:22.236",
                       "source" => "D:\\NavettiPricePoint.VolvoGroup\\Applications\\Host\\App_Data\\Logs\\TRACE-BusinessNavigatorLog.txt",
             "Availability_thr" => 10.0,
                  "total_items" => "",
              "Performance_thr" => 5.0,
                     "@version" => "1",
                         "beat" => {
            "name" => "NAV-C20AS01P",
        "hostname" => "NAV-C20AS01P",
         "version" => "6.1.0"
    },
                         "host" => "NAV-C20AS01P",
                "alertbox_name" => "",
           "simulated_property" => "",
                       "offset" => 2337262,
                   "prospector" => {
        "type" => "log"
    },
                      "message" => "OFF 2019-06-12 10:15:22.236 NAV-C20AS01P z028465 START 61064_1_1 Login 453b0cf5-67fe-4acf-8b2a-6d2f0af527bb {\"entity_id\":\"\",\"simulated_property\":\"\",\"selected_be\":\"3\",\"total_items\":\"\",\"visit_id\":\"\",\"alertbox_name\":\"\",\"selected_family\":\"\",\"selected_report\":\"\"}",
                    "logsource" => "NAV-C20AS01P",
                         "tags" => [
        [0] "beats_input_codec_plain_applied",
        [1] "taskStarted",
        [2] "_grokparsefailure"
    ],
                   "@timestamp" => 2019-06-12T10:15:22.236Z,
    "avg_elapsed_time_per_item" => [
        [0] NaN
    ],
                      "bctname" => "Login",
          "elapsed_time_in_Sec" => 0.0,
                  "Environment" => "Prod",
                     "visit_id" => "",
                     "logtitle" => "OFF",
                        "bctid" => "61064_1_1",
                     "username" => "z028465",
     "calculation_elapsed_time" => 0.0
}

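The tags on that event tell the story: it carries both taskStarted and _grokparsefailure. The input matches the 1st grok filter (START), but then fails to match the 2nd grok filter (END). Both filters run on every event, so any line that matches one of them fails the other.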

I would advise generalising them into a single pattern. As always, we recommend anchoring your pattern (prefixing it with the beginning-of-line anchor ^), and advise caution when using GREEDYDATA surrounded by spaces, as it can cause backtracking in the generated parser, which leads to performance problems. If you can be more specific, please do so.

  grok {
    pattern_definitions => {
      "TASKACTION" => "START|END"
    }
    match => {
      "message" => '^%{WORD:logtitle} %{SYSLOGBASE2} %{USERNAME:username} %{TASKACTION:task_action} %{WORD:bctid} %{GREEDYDATA:bctname} %{UUID:eventid} %{GREEDYDATA:jsondata}'
    }
  }

The above will save either START or END to the task_action field, which likely obviates your need for the separate tag.
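
If something downstream still depends on the original tags, one possible sketch is to derive them from task_action with a conditional placed after the single grok filter. This assumes the tag names from the question (taskStarted, taskTerminated) are the ones you want to keep:

  # Runs after the combined grok filter; task_action is set only when grok matched.
  if [task_action] == "START" {
    mutate { add_tag => [ "taskStarted" ] }
  } else if [task_action] == "END" {
    mutate { add_tag => [ "taskTerminated" ] }
  }

Events that failed the grok have no task_action field, so neither branch applies and they keep only the failure tag.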


You can also specify exactly what tag(s) to add when the input fails to match with the tag_on_failure directive (which takes an array of strings), and explicitly give the filter an id instead of letting one be auto-generated. These are helpful when you have multiple grok filters, so you can determine which one is failing or taking up too much CPU time.
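
As a rough illustration, the combined filter with both options set might look like the following. The id value and the failure tag name are made up for this example; pick whatever naming scheme suits your pipeline:

  grok {
    # Hypothetical id, chosen for illustration; it labels this filter in monitoring output.
    id => "grok_task_event"
    pattern_definitions => {
      "TASKACTION" => "START|END"
    }
    match => {
      "message" => '^%{WORD:logtitle} %{SYSLOGBASE2} %{USERNAME:username} %{TASKACTION:task_action} %{WORD:bctid} %{GREEDYDATA:bctname} %{UUID:eventid} %{GREEDYDATA:jsondata}'
    }
    # Hypothetical tag name; replaces the default _grokparsefailure for this filter only.
    tag_on_failure => [ "_grokparsefailure_task_event" ]
  }

With a distinct failure tag per grok filter, a failed event shows directly which filter rejected it.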
