Found a bug in date parsing!


(Peter Gervais) #1

Update:
As stated below, I have a file with 3,951,165 CSV lines that are processed with the config shown below. Of these, 153 lines fail with _dateparsefailure. Looking at the rubydebug output, every failing line has the date and hour:
2017-03-12,2 where 2 is the hour.
If I take one of those lines and change the date to anything other than 2017-03-12, or change the hour to anything other than 2, it works!

The only conclusion I can draw is that the date parsing code has an error in it! Test line:

2017-03-12,2,APPL,PAY,AXK1,AX,2.0035000000000001E-3,0,3.7102000000000001E-4,0,2,4,1.2999999999999999E-3,SYSQ,DCSLPLEX,0.0

I have a really weird problem. I get 153 _dateparsefailure errors out of 3,951,165 CSV lines. The date fails on lines such as:

"message" => "2017-03-12,2,CICS,QM1,ZZZZ,ZZ,2.8401899999999998E-3,0,5.2596000000000001E-4,0,9,9,1.6000000000000001E-3,DZBP,CBSADPLX,",

That is, 2017-03-12,2, where the two fields are the date and the hour (,2).
All other lines parse properly.
Here is the Logstash filter section that handles the date:


mutate {
  add_field => { "timestamp" => "%{ObsDate}:%{ObsHour}" }
}
mutate {
  # Replace each '-' in the date string with a space, because we could not
  # convert a string in the original format.
  gsub => [ "timestamp", "\-", " " ]
}

# "2017-03-12,12" => "2017 03 12:12"
date {
  match  => [ "timestamp", "yyyy MM dd:HH", "dd MMM yy:HH" ]
  target => "@timestamp"
}
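The same parsing can be sketched without the gsub step, since a Joda-style date pattern can contain the literal '-' and ',' characters directly. This assumes the csv filter has already produced the ObsDate and ObsHour fields as in the config above; the intermediate field name obs_ts is hypothetical. The single-H pattern accepts both one- and two-digit hours.

```conf
filter {
  mutate {
    # Rebuild "2017-03-12,2" from the two CSV fields (obs_ts is a hypothetical name)
    add_field => { "obs_ts" => "%{ObsDate},%{ObsHour}" }
  }
  date {
    # "H" parses one- or two-digit hours, so "2017-03-12,2" and "2017-03-12,22" both match
    match  => [ "obs_ts", "yyyy-MM-dd,H" ]
    target => "@timestamp"
  }
}
```

This only removes an intermediate string rewrite; it does not change which instants the date filter produces.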

From the rubydebug output:

           "timestamp" => "2017 03 12:2"
           "ObsHour" => 2,

Can anyone help me with this? I have spent the better part of two days on this.


(Guy Boertje) #2

I think you need a date filter pattern with one H.

input {
  generator {
    lines => ["2017-03-12,2|stop|go", "2017-03-12,22|stop|go"]
    count => 1
  }
}

filter {
  csv {
    separator => "|"
    columns => ["timestamp1","red","green"]
  }
  date {
    # put HH first because it matches more often
    match => [ "timestamp1", "yyyy-MM-dd,HH" , "yyyy-MM-dd,H"]
    target => "@timestamp"
  }
}

output {
  stdout {
    codec => rubydebug
  }
}

Results:

{
         "green" => "go",
       "message" => "2017-03-12,2|stop|go",
      "@version" => "1",
           "red" => "stop",
    "@timestamp" => 2017-03-12T02:00:00.000Z,
          "host" => "Elastics-MacBook-Pro.local",
      "sequence" => 0,
    "timestamp1" => "2017-03-12,2"
}
{
         "green" => "go",
       "message" => "2017-03-12,22|stop|go",
      "@version" => "1",
           "red" => "stop",
    "@timestamp" => 2017-03-12T22:00:00.000Z,
          "host" => "Elastics-MacBook-Pro.local",
      "sequence" => 0,
    "timestamp1" => "2017-03-12,22"
}

(Peter Gervais) #3

Then why does it work with 2017-03-12,4, or 3, or 20, etc.?
Only the pattern 2017-03-12,2 fails. All the others pass.
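One explanation consistent with these symptoms, though never confirmed in this thread: 2017-03-12 is the day daylight saving time began in the US and Canada, when local clocks jumped from 02:00 straight to 03:00. If Logstash runs with a North American default timezone (an assumption; the host's zone is not stated), the local time 2017-03-12 02:00 never occurred, so the date filter cannot turn it into an instant and tags the event with _dateparsefailure, while every other date and hour exists. Guy's results above print 2017-03-12T02:00:00.000Z for hour 2, meaning his run used a zero UTC offset, which has no such gap. A minimal sketch, assuming the CSV hours are actually recorded in UTC, sets the date filter's timezone option explicitly:

```conf
filter {
  date {
    match    => [ "timestamp", "yyyy MM dd:HH" ]
    # Assumption: the CSV hours are UTC. An explicit timezone stops the date filter
    # from using the host's default zone, where 2017-03-12 02:00 may not exist.
    timezone => "UTC"
    target   => "@timestamp"
  }
}
```

If the hours are local wall-clock time instead, no zone with that DST rule can represent 02:00 on that day, so it would be worth checking how the source system labels that hour.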


(Guy Boertje) #4

Well, you did not say what version of LS (and of the date filter) you are using, and I didn't ask either.

From my test on LS 6.1.1, HH works with 2017-03-12,2:

input {
  generator {
    lines => [
      "2017-03-12,2|stop|go",
      "2017-03-18,2|stop|go",
      "2017-03-20,0|stop|go",
      "2017-03-25,9|stop|go"
    ]
    count => 1
  }
}

filter {
  csv {
    separator => "|"
    columns => ["timestamp1","red","green"]
  }
  date {
    match => [ "timestamp1", "yyyy-MM-dd,HH"]
    target => "@timestamp"
  }
}

output {
  stdout {
    codec => rubydebug
  }
}

Results:

{
      "sequence" => 0,
      "@version" => "1",
         "green" => "go",
    "timestamp1" => "2017-03-12,2",
          "host" => "Elastics-MacBook-Pro.local",
       "message" => "2017-03-12,2|stop|go",
    "@timestamp" => 2017-03-12T02:00:00.000Z,
           "red" => "stop"
}
{
      "sequence" => 0,
      "@version" => "1",
         "green" => "go",
    "timestamp1" => "2017-03-18,2",
          "host" => "Elastics-MacBook-Pro.local",
       "message" => "2017-03-18,2|stop|go",
    "@timestamp" => 2017-03-18T02:00:00.000Z,
           "red" => "stop"
}
{
      "sequence" => 0,
      "@version" => "1",
         "green" => "go",
    "timestamp1" => "2017-03-20,0",
          "host" => "Elastics-MacBook-Pro.local",
       "message" => "2017-03-20,0|stop|go",
    "@timestamp" => 2017-03-20T00:00:00.000Z,
           "red" => "stop"
}
{
      "sequence" => 0,
      "@version" => "1",
         "green" => "go",
    "timestamp1" => "2017-03-25,9",
          "host" => "Elastics-MacBook-Pro.local",
       "message" => "2017-03-25,9|stop|go",
    "@timestamp" => 2017-03-25T09:00:00.000Z,
           "red" => "stop"
}

(Guy Boertje) #5

I get the same results as above with LS 5.2.2.


(Guy Boertje) #6

Could it be that those 153 lines contain a different Unicode character in the date sequence?

I would use a hex editor to look at the original data in the CSV file.


(Peter Gervais) #7

I have looked at the data with od -c (octal dump). Nothing hidden.
Can you try the test with the data line I submitted, i.e. the actual data I need to parse, and with the config file I provided?
PS: I downloaded Logstash 6.1.1 and get the same error.
Are there cases where we get a _dateparsefailure because of other items on the line?
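A way to test whether the failures depend on the host's timezone (2017-03-12 is the day daylight saving time began in the US and Canada, at 02:00 local time) is to rerun Guy's generator test with the actual data line and filter from the first post, forcing a DST-observing zone. This is a sketch: America/Toronto is an assumed zone, since the thread never states the host's timezone, and only the first two CSV column names are known, so the csv filter auto-names the rest (column3, column4, ...). With this zone the event would be expected to carry _dateparsefailure; switching timezone to "UTC" should let it parse.

```conf
input {
  generator {
    # Test line from the first post; count => 1 emits it once
    lines => ["2017-03-12,2,APPL,PAY,AXK1,AX,2.0035000000000001E-3,0,3.7102000000000001E-4,0,2,4,1.2999999999999999E-3,SYSQ,DCSLPLEX,0.0"]
    count => 1
  }
}

filter {
  csv {
    # Only ObsDate and ObsHour come from the thread; remaining columns get default names
    columns => ["ObsDate", "ObsHour"]
  }
  mutate {
    add_field => { "timestamp" => "%{ObsDate}:%{ObsHour}" }
  }
  mutate {
    # Separate mutate so the gsub runs after add_field has created the field
    gsub => [ "timestamp", "-", " " ]
  }
  date {
    match    => [ "timestamp", "yyyy MM dd:HH" ]
    # Assumed DST-observing zone; change to "UTC" to compare
    timezone => "America/Toronto"
    target   => "@timestamp"
  }
}

output {
  stdout { codec => rubydebug }
}
```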

