_grokparsefailure only after date filter has been included

I have a filter that looks something like this:

filter {
    grok {
        match => {"message" => '%{IPORHOST:remote_addr}%{SPACE}-%{SPACE}%{HOSTPORT:upstream_addr}%{SPACE}-%{SPACE}%{USERNAME:remote_user}%{SPACE}\[%{HTTPDATE:local_time}\]%{SPACE}%{HOSTNAME:host}%{SPACE}\"%{WORD:verb}%{SPACE}%{URIPATHPARAM:path}%{SPACE}%{EMAILLOCALPART:protocol}\"%{SPACE}%{INT:response_code}%{SPACE}%{INT:bytes_sent}%{SPACE}"%{DATA:referer}"%{SPACE}%{QUOTEDSTRING:user_agent}'}
    }
    date {
        match => [ "local_time", "dd/MMM/yyyy:HH:mm:ss Z"  ]
        target => "@timestamp"
    }
}

The log format comes from an nginx access log, so local_time holds nginx's $time_local value (e.g. [27/Mar/2017:12:59:59 +0000]).
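
For clarity, that sample value lines up with the date pattern like this (Z covers the +0000 offset):

27/Mar/2017:12:59:59 +0000
dd/MMM/yyyy:HH:mm:ss Z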

The filter works wonderfully when I either remove the date filter completely or point it at a non-existent field instead of local_time. With date defined as above, every record fails with _grokparsefailure. Running Logstash with verbose/debug logging doesn't seem to point me in a useful direction. Any tips on where to go next?

I don't believe the date filter is causing the grok failure. Can you share an example line of input?
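
For what it's worth, if the date filter couldn't parse local_time it would normally add its own tag rather than _grokparsefailure. A minimal sketch of your date block with the plugin's default failure tag written out, just to make that distinction visible:

date {
    match  => [ "local_time", "dd/MMM/yyyy:HH:mm:ss Z" ]
    target => "@timestamp"
    # "_dateparsefailure" is the date filter's default tag_on_failure,
    # written out here only to show that date failures are tagged separately
    tag_on_failure => [ "_dateparsefailure" ]
}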

Yeah, I'm almost positive I must be missing something simple.

Two example log entries:

66.249.64.188 - 10.0.1.100:80 - - [27/Mar/2017:06:49:40 +0000]  example.com "GET /term/slab/?q=query HTTP/1.1" 200 6120 "-" "Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)"
66.249.64.12 - 10.0.1.105:80 - - [27/Mar/2017:06:49:40 +0000]  example.com "GET /static/ HTTP/1.1" 200 11217 "-" "Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)"

Entry in Elasticsearch when I use the date filter as shown above:

{
  "_index": "logstash-lb-2017.03.28",
  "_type": "logs",
  "_id": "AVsUvMvrN7tpZX5Pb5Nn",
  "_score": 1,
  "_source": {
    "message": "66.249.64.12 - 10.0.1.105:80 - - [27/Mar/2017:06:49:40 +0000]  example.com \"GET /static/ HTTP/1.1\" 200 11217 \"-\" \"Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)",
    "@version": "1",
    "@timestamp": "2017-03-28T11:46:08.231Z",
    "path": "/tmp/sample.log",
    "host": "ip-10-0-1-138",
    "tags": [
      "_grokparsefailure"
    ]
  },
  "fields": {
    "@timestamp": [
      1490701568231
    ]
  }
}

Entry in Elasticsearch when I either remove the date filter or make it point to a non-existent field:

{
  "_index": "logstash-lb-2017.03.28",
  "_type": "logs",
  "_id": "AVsUvkImN7tpZX5Pb5PF",
  "_score": 1,
  "_source": {
    "message": "66.249.64.12 - 10.0.1.105:80 - - [27/Mar/2017:06:49:40 +0000]  example.com \"GET /static/ HTTP/1.1\" 200 11217 \"-\" \"Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)\"\"",
    "@version": "1",
    "@timestamp": "2017-03-28T11:47:44.116Z",
    "path": [
      "/tmp/sample.log",
      "/static/"
    ],
    "host": [
      "ip-10-0-1-138",
      "example.com"
    ],
    "remote_addr": "66.249.64.12",
    "upstream_addr": "10.0.1.105:80",
    "remote_user": "-",
    "local_time": "27/Mar/2017:06:49:40 +0000",
    "verb": "GET",
    "protocol": "HTTP/1.1",
    "response_code": "200",
    "bytes_sent": "11217",
    "referer": "-",
    "user_agent": "\"Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)\"",
    "geoip": {
      "ip": "66.249.64.12",
      "country_code2": "US",
      "country_code3": "USA",
      "country_name": "United States",
      "continent_code": "NA",
      "region_name": "CA",
      "city_name": "Mountain View",
      "latitude": 37.385999999999996,
      "longitude": -122.0838,
      "dma_code": 807,
      "area_code": 650,
      "timezone": "America/Los_Angeles",
      "real_region_name": "California",
      "location": [
        -122.0838,
        37.385999999999996
      ]
    }
  },
  "fields": {
    "@timestamp": [
      1490701664116
    ]
  }
}

Works fine here (Logstash 2.4):

$ cat test.config 
input { stdin {} }
output { stdout { codec => rubydebug } }
filter {
    grok {
        match => {"message" =>
'%{IPORHOST:remote_addr}%{SPACE}-%{SPACE}%{HOSTPORT:upstream_addr}%{SPACE}-%{SPACE}%{USERNAME:remote_user}%{SPACE}\[%{HTTPDATE:local_time}\]%{SPACE}%{HOSTNAME:host}%{SPACE}\"%{WORD:verb}%{SPACE}%{URIPATHPARAM:path}%{SPACE}%{EMAILLOCALPART:protocol}\"%{SPACE}%{INT:response_code}%{SPACE}%{INT:bytes_sent}%{SPACE}"%{DATA:referer}"%{SPACE}%{QUOTEDSTRING:user_agent}'}
    }
    date {
        match => [ "local_time", "dd/MMM/yyyy:HH:mm:ss Z"  ]
        target => "@timestamp"
    }
}
$ echo '66.249.64.188 - 10.0.1.100:80 - - [27/Mar/2017:06:49:40 +0000]  example.com "GET /term/slab/?q=query HTTP/1.1" 200 6120 "-" "Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)"' | /opt/logstash/bin/logstash -f test.config
Settings: Default pipeline workers: 8
Pipeline main started
{
          "message" => "66.249.64.188 - 10.0.1.100:80 - - [27/Mar/2017:06:49:40 +0000]  example.com \"GET /term/slab/?q=query HTTP/1.1\" 200 6120 \"-\" \"Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)\"",
         "@version" => "1",
       "@timestamp" => "2017-03-27T06:49:40.000Z",
             "host" => [
        [0] "lnxolofon",
        [1] "example.com"
    ],
      "remote_addr" => "66.249.64.188",
    "upstream_addr" => "10.0.1.100:80",
      "remote_user" => "-",
       "local_time" => "27/Mar/2017:06:49:40 +0000",
             "verb" => "GET",
             "path" => "/term/slab/?q=query",
         "protocol" => "HTTP/1.1",
    "response_code" => "200",
       "bytes_sent" => "6120",
          "referer" => "-",
       "user_agent" => "\"Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)\""
}
Pipeline main has been shutdown
stopping pipeline {:id=>"main"}
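
If the same config still fails on your side, one way to narrow it down (a sketch, with a hypothetical tag name) is to give this particular grok a distinctive failure tag, so you can tell whether it is this filter or some other grok elsewhere in the pipeline that is tagging the events:

grok {
    match => { "message" => '... same pattern as above ...' }
    # "_grokparsefailure_nginx" is a hypothetical tag name; the default is "_grokparsefailure"
    tag_on_failure => [ "_grokparsefailure_nginx" ]
}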
