Grok parse failure in Logstash, but works in Kibana debugger

I have a pattern that the Grok Debugger in Kibana says works: testing it gives the output I expect. However, when the same pattern runs in Logstash, the match fails.

Here is the log message I am trying to parse:

DEBUG: 12/17/18 00:01:42.696 PID=4310 (cbaldslTL1d 1000 25100)\nMonitorNbr:40070768 WorkNbr:5867 Op:RFRSH_PHONE DirNum:7702423620 DevId: SubId:500001-123456\nTL1Client::TL1Client(), HOST=127.0.0.1/127.0.0.1\n"

Here is my filter configuration; it should match on Triad Log Format 2:

    filter {

        mutate {
            gsub => [
                # Replace the line of dashes
                "message", "-+$", ""
            ]
        }

        # Handle Vision360 Logs
        if [fields][log_type] == "vision360"{
            grok {
                match => {
                    "message" => "%{DATE_US} %{TIME}  %{WORD:log_level}"
                }
            }
        } else if [fields][log_type] == "triad-current-msg" {
            #
            # Triad Log Format 1
            #
            grok {
                match => {
                    "message" => "%{WORD:log_level}: %{DATE_US}.* %{TIME}.* %{WORD}.*=%{NUMBER:pid} \(%{WORD:process_name} %{NUMBER:controller_number} %{NUMBER:instance_number}"
                }
            }
            #
            # Triad Log Format 2
            #
            grok {
                match => {
                    "message" => "%{WORD:log_level}: %{DATESTAMP} PID=%{NUMBER:pid} \(%{WORD:process_name} %{NUMBER:controller_number} %{NUMBER:instance_number}\)\\nMonitorNbr:%{NUMBER:monitor_number} WorkNbr:%{NUMBER:work_number} Op:%{WORD:operation} DirNum:%{NUMBER:directory_number} DevId:(%{WORD:device_id})? SubId:%{WORD:subscriber_id}"
                }
            }
        }
    }
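The `mutate`/`gsub` step applies a Ruby-flavored regular expression, so the dash-stripping rule can be sanity-checked with plain `String#gsub`. A minimal sketch, with the message text trimmed down from the log below:

```ruby
# Assumed message tail, shortened from the multiline event in the log:
# the separator line of dashes sits at the very end of the string.
msg = "HOST=127.0.0.1/127.0.0.1\n" \
      "------------------------------------------------------------------------------"

# Same regex as the gsub rule: strip a run of dashes at the end.
cleaned = msg.gsub(/-+$/, "")
puts cleaned.inspect   # => "HOST=127.0.0.1/127.0.0.1\n"
```

Note that the trailing `\n` before the dashes survives, which matches the `message` field seen in the Logstash output further down.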

Here is the input message from the Logstash log:

[2018-12-20T13:07:46,927][DEBUG][logstash.pipeline        ] filter received {"event"=>{"tags"=>["beats_input_codec_plain_applied"], "host"=>{"architecture"=>"x86_64", "os"=>{"family"=>"redhat", "platform"=>"centos", "codename"=>"Final", "version"=>"6.10 (Final)"}, "containerized"=>true, "name"=>"yellow"}, "offset"=>1829, "log"=>{"flags"=>["multiline"]}, "prospector"=>{"type"=>"log"}, "source"=>"/home/eti/logtest/triad-current-msg-format.log", "@timestamp"=>2018-12-20T18:08:02.913Z, "input"=>{"type"=>"log"}, "fields"=>{"log_type"=>"triad-current-msg"}, "beat"=>{"version"=>"6.5.3", "name"=>"yellow470.dev.etisoftware.local", "hostname"=>"yellow"}, "message"=>"DEBUG: 12/17/18 00:01:42.696 PID=4310 (cbaldslTL1d 1000 25100)\nMonitorNbr:40070768 WorkNbr:5867 Op:RFRSH_PHONE DirNum:7702423620 DevId: SubId:500001-123456\nTL1Client::TL1Client(), HOST=127.0.0.1/127.0.0.1\n------------------------------------------------------------------------------", "@version"=>"1"}}

Here is the output message:

[2018-12-20T13:07:46,941][DEBUG][logstash.pipeline        ] output received {"event"=>{"tags"=>["beats_input_codec_plain_applied", "_grokparsefailure"], "host"=>{"architecture"=>"x86_64", "os"=>{"family"=>"redhat", "platform"=>"centos", "codename"=>"Final", "version"=>"6.10 (Final)"}, "containerized"=>true, "name"=>"yellow"}, "offset"=>1829, "log"=>{"flags"=>["multiline"]}, "instance_number"=>"25100", "prospector"=>{"type"=>"log"}, "pid"=>"4310", "log_level"=>"DEBUG", "source"=>"/home/eti/logtest/triad-current-msg-format.log", "@timestamp"=>2018-12-20T18:08:02.913Z, "process_name"=>"cbaldslTL1d", "controller_number"=>"1000", "input"=>{"type"=>"log"}, "fields"=>{"log_type"=>"triad-current-msg"}, "beat"=>{"version"=>"6.5.3", "name"=>"yellow470.dev.etisoftware.local", "hostname"=>"yellow"}, "message"=>"DEBUG: 12/17/18 00:01:42.696 PID=4310 (cbaldslTL1d 1000 25100)\nMonitorNbr:40070768 WorkNbr:5867 Op:RFRSH_PHONE DirNum:7702423620 DevId: SubId:500001-123456\nTL1Client::TL1Client(), HOST=127.0.0.1/127.0.0.1\n", "@version"=>"1"}}

If I use the Kibana Grok Debugger, I get this as my output:

{
  "instance_number": "25100",
  "log_level": "DEBUG",
  "pid": "4310",
  "monitor_number": "40070768",
  "controller_number": "1000",
  "work_number": "5867",
  "directory_number": "7702423620",
  "subscriber_id": "500001",
  "process_name": "cbaldslTL1d",
  "operation": "RFRSH_PHONE"
}
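One thing worth noting in this output: `subscriber_id` came back as `500001` even though the message contains `SubId:500001-123456`. That is because `%{WORD}` compiles down to `\w+`, which stops at the hyphen; a pattern such as `%{NOTSPACE}` (`\S+`) would capture the whole token. A plain-Ruby sketch of the difference:

```ruby
# %{WORD} is essentially \w+ and %{NOTSPACE} is \S+; only the latter
# matches through the hyphen in the subscriber id.
token = "500001-123456"

puts token[/\w+/]   # => "500001"         (what %{WORD} captures)
puts token[/\S+/]   # => "500001-123456"  (what %{NOTSPACE} would capture)
```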

I have tried reversing the order of the Grok definitions but I get the same result.

You're getting the fields you want (pid, controller_number, and so on) so the grok filter is working. Perhaps you have another grok filter in your configuration that's failing?

Thanks for the speedy response. I might be misunderstanding how to configure grok. I would expect, given my configuration, that the first grok would be skipped because it does not match the message, and that the second grok would process the message because it does match.

But given the actual output, it seems the first is matching and the second is not. The second should produce all the fields I show in the output from the Kibana debugger, because the message contains all of that data.
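For what it's worth, that behavior is consistent with how the config is written: two separate `grok {}` blocks run one after the other, and each one tags the event with `_grokparsefailure` independently when its own pattern fails. A rough Ruby sketch of the two arrangements (the patterns here are stand-ins, not the real ones):

```ruby
# Stand-in patterns: "format1" matches our message, "format2" does not.
message  = "format1 data"
patterns = [/format2/, /format1/]

# Two separate grok blocks: every pattern runs, and each failing block
# adds its own failure tag, even though the other block extracted fields.
tags = patterns.reject { |p| p =~ message }.map { "_grokparsefailure" }
puts tags.inspect   # => ["_grokparsefailure"]

# One grok block with an array of patterns (break_on_match defaults to
# true): patterns are tried in order and success of any one means no tag.
matched = patterns.any? { |p| p =~ message }
puts matched        # => true
```

So putting both formats in a single `grok` with an array under `"message" =>` avoids the spurious tag when at least one format matches.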

To narrow this down further, I have updated my test: I am now sending only one message from Filebeat to Logstash, and I have only one pattern configured for grok.

This is the message received by Logstash:

[2018-12-20T14:27:29,775][DEBUG][logstash.pipeline        ] filter received {"event"=>{"host"=>{"name"=>"yellow", "os"=>{"codename"=>"Final", "version"=>"6.10 (Final)", "family"=>"redhat", "platform"=>"centos"}, "architecture"=>"x86_64", "containerized"=>true}, "@version"=>"1", "source"=>"/home/eti/logtest/triad-current-msg-format.log", "prospector"=>{"type"=>"log"}, "fields"=>{"log_type"=>"triad-current-msg"}, "message"=>"DEBUG: 12/17/18 00:01:42.696 PID=4310 (cbaldslTL1d 1000 25100)\nMonitorNbr:40070768 WorkNbr:5867 Op:RFRSH_PHONE DirNum:7702423620 DevId: SubId:500001-123456\nTL1Client::TL1Client(), HOST=127.0.0.1/127.0.0.1\n------------------------------------------------------------------------------", "log"=>{"flags"=>["multiline"]}, "offset"=>0, "tags"=>["beats_input_codec_plain_applied"], "@timestamp"=>2018-12-20T19:27:45.305Z, "input"=>{"type"=>"log"}, "beat"=>{"hostname"=>"yellow", "version"=>"6.5.3", "name"=>"yellow470.dev.etisoftware.local"}}}

This is the Grok filter configuration:

filter {

    mutate {
        gsub => [
            # Replace the line of dashes
            "message", "-+$", ""
        ]
    }

    # Handle Vision360 Logs
    if [fields][log_type] == "vision360"{
        grok {
            match => {
                "message" => "%{DATE_US} %{TIME}  %{WORD:log_level}"
            }
        }
    } else if [fields][log_type] == "triad-current-msg" {
        grok {
            match => {
                "message" => [
                    "%{WORD:log_level}: %{DATESTAMP} PID=%{NUMBER:pid} \(%{WORD:process_name} %{NUMBER:controller_number} %{NUMBER:instance_number}\)\\nMonitorNbr:%{NUMBER:monitor_number} WorkNbr:%{NUMBER:work_number} Op:%{WORD:operation} DirNum:%{NUMBER:directory_number} DevId:(%{WORD:device_id})? SubId:%{WORD:subscriber_id}"
                ]
            }
        }
    }
}

This is the output in the Logstash log:

[2018-12-20T14:27:30,009][DEBUG][logstash.pipeline        ] output received {"event"=>{"host"=>{"name"=>"yellow", "os"=>{"codename"=>"Final", "version"=>"6.10 (Final)", "family"=>"redhat", "platform"=>"centos"}, "architecture"=>"x86_64", "containerized"=>true}, "@version"=>"1", "source"=>"/home/eti/logtest/triad-current-msg-format.log", "prospector"=>{"type"=>"log"}, "fields"=>{"log_type"=>"triad-current-msg"}, "message"=>"DEBUG: 12/17/18 00:01:42.696 PID=4310 (cbaldslTL1d 1000 25100)\nMonitorNbr:40070768 WorkNbr:5867 Op:RFRSH_PHONE DirNum:7702423620 DevId: SubId:500001-123456\nTL1Client::TL1Client(), HOST=127.0.0.1/127.0.0.1\n", "log"=>{"flags"=>["multiline"]}, "offset"=>0, "tags"=>["beats_input_codec_plain_applied", "_grokparsefailure"], "@timestamp"=>2018-12-20T19:27:45.305Z, "input"=>{"type"=>"log"}, "beat"=>{"hostname"=>"yellow", "version"=>"6.5.3", "name"=>"yellow470.dev.etisoftware.local"}}}

If I copy the input message and the pattern definition into the Kibana Grok debugger, I get the result I expect. Given that no fields are being tokenized, and that the output message is tagged with "_grokparsefailure", I understand that this pattern is not working.

I don't understand why it's not working, though, especially considering that it works in the debugger.
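One possible explanation for the debugger/Logstash mismatch (an assumption; it is not confirmed anywhere in this thread): the multiline event inside Logstash contains real newline characters, while text pasted into the debugger contains the two literal characters `\` and `n`. The `\\n` in the grok pattern matches the latter, not the former. Since grok compiles to ordinary regexes, plain Ruby illustrates this:

```ruby
# Assumption: the Logstash event holds a real newline; the text pasted
# into the Kibana debugger holds the two characters backslash + "n".
logstash_event = "25100)\nMonitorNbr:40070768"  # real newline
debugger_paste = '25100)\nMonitorNbr:40070768'  # literal backslash + n

literal_pattern = /\)\\nMonitorNbr/  # roughly what "\\n" in the config compiles to
newline_pattern = /\)\nMonitorNbr/   # matches an actual newline character

puts !!(literal_pattern =~ debugger_paste)  # => true  (debugger succeeds)
puts !!(literal_pattern =~ logstash_event)  # => false (Logstash fails)
puts !!(newline_pattern  =~ logstash_event) # => true
```

If this is what is happening, writing the pattern to match a real newline (or `\s` between the header and `MonitorNbr:`) should make the single-pattern grok succeed in Logstash as well.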

OK, maybe grok doesn't work the way I thought it did. After some poking around Stack Overflow, I tried putting each field as a separate pattern in an array of matches, and told grok not to stop on the first match.

It appears to work reasonably well, but I have no idea if this is actually the right approach or how efficient it is.

filter {

    mutate {
        gsub => [
            # Replace the line of dashes
            "message", "-+$", ""
        ]
    }

    # Handle Vision360 Logs
    if [fields][log_type] == "vision360"{

        grok {
            match => {
                "message" => "%{DATE_US} %{TIME}  %{WORD:log_level}"
            }
        }

    } else if [fields][log_type] == "triad-current-msg" {
   
        grok {
            break_on_match => false
            match => {
                "message" => [
                    "^%{WORD:log_level}",
                    "Op:%{WORD:operation}",
                    "SubId:%{WORD:subscriber_id}",
                    "DevId:%{WORD:device_id}",
                    "DirNum:%{WORD:directory_number}",
                    "WorkNbr:%{NUMBER:work_number}",
                    "MonitorNbr:%{NUMBER:monitor_number}",
                    "PID=%{NUMBER:pid}",
                    "\(%{WORD:process_name} %{NUMBER:controller_number} %{NUMBER:instance_number}\)"
                ]
            }
        }
    }
}
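The approach above can be mimicked in plain Ruby with named captures: each small pattern is applied independently, and whatever it captures is merged into the set of fields, which is roughly what `break_on_match => false` does. A sketch with a shortened version of the message:

```ruby
# Shortened message from the log above; each pattern contributes its own
# named captures, independent of whether the others match.
msg = "DEBUG: ... Op:RFRSH_PHONE DirNum:7702423620 ... WorkNbr:5867"

patterns = [
  /^(?<log_level>\w+)/,
  /Op:(?<operation>\w+)/,
  /WorkNbr:(?<work_number>\d+)/,
  /MonitorNbr:(?<monitor_number>\d+)/  # absent here; contributes nothing
]

fields = {}
patterns.each do |p|
  next unless (m = p.match(msg))
  m.names.each { |name| fields[name] = m[name] }
end

puts fields.inspect
# => {"log_level"=>"DEBUG", "operation"=>"RFRSH_PHONE", "work_number"=>"5867"}
```

The trade-off is that each pattern is a separate scan over the message, so this does more work than one anchored pattern that matches the whole line, but it is far more tolerant of fields appearing, disappearing, or reordering.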

Here is what I think my final configuration should look like.

input {
  beats {
    port => 5044
  }
}

filter {

    # Handle Vision360 Logs
    if [fields][log_type] == "vision360"{

        grok {
            match => {
                "message" => "%{DATE_US} %{TIME}  %{WORD:log_level}"
            }
        }

    } else if [fields][log_type] == "triad-current-msg" {

        mutate {
            gsub => [
                # Replace the line of dashes
                "message", "-+$", ""
            ]
        }

        grok {
            break_on_match => false
            match => {
                "message" => [
                    #
                    # Header 1
                    "^%{WORD:log_level}: %{DATESTAMP} PID=%{NUMBER:pid} \(%{WORD:process_name}\)",
                    #
                    # Header 2
                    "^%{WORD:log_level}: %{DATESTAMP} PID=%{NUMBER:pid} \(%{WORD:process_name} %{NUMBER:controller_number}\)",
                    #
                    # Header 3
                    "^%{WORD:log_level}: %{DATESTAMP} PID=%{NUMBER:pid} \(%{WORD:process_name} %{NUMBER:controller_number} %{NUMBER:instance_number}\)",
                    #
                    # Keywords
                    "Op:(%{WORD:operation})?",
                    "SubId:(%{WORD:subscriber_id})?",
                    "DevId:(%{WORD:device_id})?",
                    "DirNum:(%{WORD:directory_number})?",
                    "WorkNbr:(%{NUMBER:work_number})?",
                    "MonitorNbr:(%{NUMBER:monitor_number})?",
                    "Function=(%{WORD:function})?",
                    "Stmt=(%{GREEDYDATA:statement})?"
                ]
            }
        }
    }
}

output {
    elasticsearch {
      hosts => "localhost:9200"
      manage_template => false
      index => "%{[@metadata][beat]}-%{[@metadata][version]}-%{+YYYY.MM.dd}"
  }
}
