Grok filter succeeds in unit test, in grok debugger, but fails in production

aadamovich · June 2, 2016, 12:50pm

Hi!

I have the following filter in Logstash:

filter { 
  if [type] == "weblogic" {
    multiline {
      pattern      => "^####"
      negate       => true
      what         => previous
    }
    grok {
      match => {
        "message"  => [ 
          '####<(?<timestamp>\w{3}\s+\d{1,2}, \d{4} \d{1,2}:\d{2}:\d{2} \w{2}) \w+> <%{DATA:severity}> <%{DATA:subsystem}> <%{DATA:machine}> <%{DATA:server}> <%{DATA:thread}> <%{DATA:user}> <%{DATA:diagnostic}> <%{DATA:transaction}> <%{DATA:raw_time}> <%{DATA:message_id}> <%{GREEDYDATA:message_text}' 
        ] 
      } 
      keep_empty_captures => true
    }
    mutate {
      gsub         => [ "message_text", ">\s*$", ""]
    }
    mutate {
      gsub         => [ "user", "[<>]", ""]
    }
    mutate {
      gsub         => [ "thread", "[\[\]]", ""]
    }
    date {
      match        => [ 
        "timestamp", 
        "MMM dd, yyyy hh:mm:ss a", 
        "MMM dd, yyyy h:mm:ss a",
        "MMM  d, yyyy hh:mm:ss a", 
        "MMM  d, yyyy h:mm:ss a" 
      ]
    }
  }

I tested it quite thoroughly with Grok Debugger app first and then even wrote a unit test (with more than 200 different log samples) for that using rspec from logstash and ran that unit test directly on the system where logstash is running. All passed.

But when logstash runs in production grok filter fails for all incoming. I take some of the failing lines and put them into unit test and unit test is passing with them. I'm stuck.

Here is an example log line:

####<Apr 19, 2016 10:57:42 AM CEST> <Info> <Security> <ncby-modd-wbl02> <> <[ACTIVE] ExecuteThread: '0' for queue: 'weblogic.kernel.Default (self-tuning)'> <> <> <> <1461056262679> <BEA-000000> <Disabling the CryptoJ JCE Provider self-integrity check for better startup performance. To enable this check, specify -Dweblogic.security.allowCryptoJDefaultJCEVerification=true.>

It's logstash 2.3.1.

Logstash is receiving data from filebeat 1.2.3. Could that be the reason?

Best regards,
Anndrey

magnusbaeck · June 14, 2016, 6:05pm

The grok filter works for me:

$ cat test.config
input { stdin { } }
output { stdout { codec => rubydebug } }
filter {
  grok {
    match => [
      "message",
      '####<(?<timestamp>\w{3}\s+\d{1,2}, \d{4} \d{1,2}:\d{2}:\d{2} \w{2}) \w+> <%{DATA:severity}> <%{DATA:subsystem}> <%{DATA:machine}> <%{DATA:server}> <%{DATA:thread}> <%{DATA:user}> <%{DATA:diagnostic}> <%{DATA:transaction}> <%{DATA:raw_time}> <%{DATA:message_id}> <%{GREEDYDATA:message_text}'
    ]
  }
}
$ cat data
####<Apr 19, 2016 10:57:42 AM CEST> <Info> <Security> <ncby-modd-wbl02> <> <[ACTIVE] ExecuteThread: '0' for queue: 'weblogic.kernel.Default (self-tuning)'> <> <> <> <1461056262679> <BEA-000000> <Disabling the CryptoJ JCE Provider self-integrity check for better startup performance. To enable this check, specify -Dweblogic.security.allowCryptoJDefaultJCEVerification=true.>
$ /opt/logstash/bin/logstash -f test.config < data
Settings: Default pipeline workers: 2
Pipeline main started
{
         "message" => "####<Apr 19, 2016 10:57:42 AM CEST> <Info> <Security> <ncby-modd-wbl02> <> <[ACTIVE] ExecuteThread: '0' for queue: 'weblogic.kernel.Default (self-tuning)'> <> <> <> <1461056262679> <BEA-000000> <Disabling the CryptoJ JCE Provider self-integrity check for better startup performance. To enable this check, specify -Dweblogic.security.allowCryptoJDefaultJCEVerification=true.>",
        "@version" => "1",
      "@timestamp" => "2016-06-14T18:04:23.251Z",
            "host" => "hallonet",
       "timestamp" => "Apr 19, 2016 10:57:42 AM",
        "severity" => "Info",
       "subsystem" => "Security",
         "machine" => "ncby-modd-wbl02",
          "thread" => "[ACTIVE] ExecuteThread: '0' for queue: 'weblogic.kernel.Default (self-tuning)'",
        "raw_time" => "1461056262679",
      "message_id" => "BEA-000000",
    "message_text" => "Disabling the CryptoJ JCE Provider self-integrity check for better startup performance. To enable this check, specify -Dweblogic.security.allowCryptoJDefaultJCEVerification=true.>"
}
Pipeline main has been shutdown
stopping pipeline {:id=>"main"}

You may find https://github.com/magnusbaeck/logstash-filter-verifier useful when testing filters (caveat: I'm the author).

aadamovich · June 14, 2016, 6:45pm

Hi Magnus,

Thanks a lot for your reply!

Filter is actually OK and it works fine when I use file input or stdin input, but the problem seems to be related to some weird combination of filebeat input and grok reg ex., because all messages (including the one above) fail in production with tags added to the message: beats_input_codec_plain_applied, _grokparsefailure. I have other log file formats that go the same path (filebeat -> logstash -> grok -> elasticsearch) and they work, but this particular one does not.

Best regards,
Andrey

Topic		Replies	Views
Filter is not working Logstash	6	1241	July 6, 2017
Grok filter is not working Logstash	4	12341	July 6, 2017
grokParseFailure but works on grokdebug tester Logstash	3	761	December 15, 2017
_grokparsefailure_syslog but grok filter perfectly from stdin Logstash	3	1074	July 6, 2017
_grokparse failure, but it's working Logstash	2	798	July 6, 2017

Grok filter succeeds in unit test, in grok debugger, but fails in production

Related topics