Failed parsing date in format "dd/MM/yyyy" for specific date


(Roberto Lapolli) #1

Hello everyone.
I'm having trouble parsing dates in the "dd/MM/yyyy" format.
I'm using the date filter. Everything works perfectly for every date except "16/10/2016". Yes, it sounds quite weird, but I have tested several approaches (including "%{DATE_EU:DT_2}") without success.
I've been dealing with this all day and was unable to find information about it on any site or forum, including here.

This is my conf:

input {
	file {
		type => "datas"
		path => "D:/Data/datas/datas*.*"
		sincedb_path => "D:/Data/since/datas.since"
		file_completed_action => "log_and_delete"
		file_completed_log_path => "D:/Data/log/datas.txt"
		mode => "read"
	}
}
filter {
    grok {
        match => { 
            "message" => "^(?<DT_1>.{10})"
        }
    }
    date {
        match => ["DT_1","dd/MM/yyyy"]
        target => "DT_2"
    }
    mutate {
        remove_field => [ "message","path","host","type","@version","@timestamp" ]
    }
}
output {
	stdout { 
		codec => rubydebug 
	}
	elasticsearch { 
		hosts => ["localhost:9200"]
		index => "datas_teste"
	}
}

And this is my datas.txt file:

16/10/2013
16/10/2014
16/10/2015
16/10/2016
16/10/2017
16/10/2018
16/10/2019
16/10/2020

output:

{
        "_index": "datas_teste",
        "_type": "doc",
        "_id": "tdelamYBCFcA8M2beokk",
        "_score": 1,
        "_source": {
          "DT_2": "2013-10-16T03:00:00.000Z",
          "DT_1": "16/10/2013"
        }
      },
      {
        "_index": "datas_teste",
        "_type": "doc",
        "_id": "tNelamYBCFcA8M2beokk",
        "_score": 1,
        "_source": {
          "DT_1": "16/10/2016",
          "tags": [
            "_dateparsefailure"
          ]
        }
      },...

How can I deal with this without having to use Ruby to split the field?


(Christian Dahlqvist) #2

The following example works for me without any _dateparsefailure:

input {
  generator {
    lines => ['{"DT_1":"16/10/2013"}','{"DT_1":"16/10/2014"}','{"DT_1":"16/10/2015"}','{"DT_1":"16/10/2016"}','{"DT_1":"16/10/2017"}','{"DT_1":"16/10/2018"}','{"DT_1":"16/10/2019"}']
    count => 1
    codec => json
  }
}

filter {
  date {
    match => [ "DT_1", "dd/MM/yyyy" ]
    target => "DT_2"
  }
}

output {
  stdout { codec => rubydebug }
}

(Roberto Lapolli) #3

Hi Christian
It's not working. I copied and pasted your code without any modification, and the error persists:

conf:

input {
  generator {
    lines => ['{"DT_1":"16/10/2013"}','{"DT_1":"16/10/2014"}','{"DT_1":"16/10/2015"}','{"DT_1":"16/10/2016"}','{"DT_1":"16/10/2017"}','{"DT_1":"16/10/2018"}','{"DT_1":"16/10/2019"}']
    count => 1
    codec => json
  }
}

filter {
  date {
    match => [ "DT_1", "dd/MM/yyyy" ]
    target => "DT_2"
  }
}

output {
  stdout { codec => rubydebug }
}

the result:
...
{
      "sequence" => 0,
    "@timestamp" => 2018-10-13T21:47:54.762Z,
          "DT_1" => "16/10/2015",
      "@version" => "1",
          "DT_2" => 2015-10-16T03:00:00.000Z,
          "host" => "DESKTOP-BOJTLCA"
}
{
      "sequence" => 0,
    "@timestamp" => 2018-10-13T21:47:54.762Z,
          "DT_1" => "16/10/2016",
      "@version" => "1",
          "host" => "DESKTOP-BOJTLCA",
          "tags" => [
        [0] "_dateparsefailure"
    ]
}
{
      "sequence" => 0,
    "@timestamp" => 2018-10-13T21:47:54.762Z,
          "DT_1" => "16/10/2017",
      "@version" => "1",
          "DT_2" => 2017-10-16T02:00:00.000Z,
          "host" => "DESKTOP-BOJTLCA"
}
...

How can I debug this?


(Christian Dahlqvist) #4

What version of Logstash are you using? What operating system? What Java version?


(Roberto Lapolli) #5

Server:

  • OS: Windows Server 2012 R2 (version 6.3 - build 9600 - en);
  • Java: Java Version "1.8.0_181"; Java SE Runtime Environment (build 1.8.0_181-b13)
  • Logstash: 6.4.1

My machine:

  • OS: Windows 10 (version 1803, build 17134.345 - pt)
  • Java: java version "1.8.0_171"; Java SE Runtime Environment (build 1.8.0_171-b11)
  • Logstash: 6.4.2

When I change the date format from dd/MM/yyyy (e.g. 16/10/2016) to ISO 8601 (2016-10-16T03:00:00.000Z), everything works fine:

input {
  generator {
    lines => ['{"DT_1":"2013-10-16T03:00:00.000Z"}','{"DT_1":"2014-10-16T03:00:00.000Z"}','{"DT_1":"2015-10-16T03:00:00.000Z"}','{"DT_1":"2016-10-16T03:00:00.000Z"}','{"DT_1":"2017-10-16T03:00:00.000Z"}','{"DT_1":"2018-10-16T03:00:00.000Z"}','{"DT_1":"2019-10-16T03:00:00.000Z"}']
    count => 1
    codec => json
  }
}

filter {
  date {
    match => [ "DT_1", "ISO8601" ]
    target => "DT_2"
  }
}

output {
  stdout { codec => rubydebug }
  elasticsearch {
    hosts => ["localhost:9200"]
    index => "datas_teste"
  }
}

result:

{
"_index": "datas_teste",
"_type": "doc",
"_id": "rx4zcGYBgpRxIy0XY9wj",
"_score": 1,
"_source": {
"@version": "1",
"@timestamp": "2018-10-14T01:31:46.657Z",
"DT_2": "2016-10-16T03:00:00.000Z",
"DT_1": "2016-10-16T03:00:00.000Z",
"host": "DESKTOP-BOJTLCA",
"sequence": 0
}
},...

We are thinking of changing the format to ISO 8601 when we build our CSV file, but this will require a lot of effort. We need to index 12 years of data, and each year has 1.4 billion documents. Converting every date (a single document has 6 date fields) will cost us a lot of time.


(Christian Dahlqvist) #6

That is very strange. I was wondering if you might have some strange character in your file that caused this, but since you got the same failure while copying the example I provided, that cannot be the case. I have no issues at all running that example on my laptop.


(Roberto Lapolli) #7

Yep... very strange. I've tried your sample on both my computer and the server, with the same strange result. Why this specific date? I have not tried every possible date, but in our example file, which has 6.2 million records, only that date caused a failure.

Where can I get help? How can I debug this date filter?


(Flávio Knob) #9

I'm having a similar issue here, only instead of "16/10/yyyy" (16 October of any given year) it happens with 15 October:

Pipeline:

input {
	stdin {}
}
filter {
	date {
		match => ["message", "dd/MM/yyyy"]
		target => "data"
	}
}
output {
	stdout { codec => rubydebug }
}

Results:

16/10/2017
{
    "@timestamp" => 2018-10-19T12:29:08.060Z,
          "host" => "0.0.0.0",
      "@version" => "1",
          "data" => 2017-10-16T02:00:00.000Z,
       "message" => "16/10/2017"
}
15/10/2017
{
    "@timestamp" => 2018-10-19T12:29:18.999Z,
          "host" => "0.0.0.0",
          "tags" => [
        [0] "_dateparsefailure"
    ],
      "@version" => "1",
       "message" => "15/10/2017"
}

(Jenni) #10

I think I know what's going on: in your country, daylight saving time starts and ends at midnight.

16/10/2016 is 16/10/2016 00:00:00, and that local time doesn't exist:
https://www.timeanddate.com/time/change/brazil/sao-paulo?year=2016
On Sunday, 16 October 2016, 00:00:00, clocks were turned forward 1 hour to
Sunday, 16 October 2016, 01:00:00 local daylight time instead.

I tested the first case with timezone => "America/Sao_Paulo"

16/10/2016
{
      "@version" => "1",
          "tags" => [
        [0] "_dateparsefailure"
    ],
          "host" => "elasticsearch-vm",
       "message" => "16/10/2016",
    "@timestamp" => 2018-10-19T12:57:30.877Z
}
17/10/2016
{
      "@version" => "1",
          "data" => 2016-10-17T02:00:00.000Z,
          "host" => "elasticsearch-vm",
       "message" => "17/10/2016",
    "@timestamp" => 2018-10-19T12:57:42.885Z
}

I wasn't able to reproduce the second example, but it would make sense:
https://www.timeanddate.com/time/change/brazil?year=2017
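This gap can be reproduced outside Logstash. Here is a minimal Python sketch (assuming Python 3.9+ with the standard-library `zoneinfo` module and tz data available; the helper name `local_time_exists` is mine): a wall-clock time that falls in a spring-forward gap does not survive a round trip through UTC, while a valid time comes back unchanged.

```python
from datetime import datetime, timezone
from zoneinfo import ZoneInfo

def local_time_exists(naive: datetime, tz: ZoneInfo) -> bool:
    """Return False if the wall-clock time falls in a DST gap.

    A time inside a spring-forward gap shifts when round-tripped
    through UTC; a valid local time round-trips unchanged.
    """
    local = naive.replace(tzinfo=tz)
    round_trip = local.astimezone(timezone.utc).astimezone(tz)
    return round_trip.replace(tzinfo=None) == naive

sao_paulo = ZoneInfo("America/Sao_Paulo")

# DST started at midnight on 16 October 2016 in São Paulo,
# so 00:00:00 on that day never happened on local clocks.
print(local_time_exists(datetime(2016, 10, 16), sao_paulo))  # False
print(local_time_exists(datetime(2016, 10, 17), sao_paulo))  # True
```

This is the same check the date filter effectively performs: "16/10/2016" with no time of day means local midnight, and in America/Sao_Paulo that instant simply does not exist.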


(Flávio Knob) #11

Nice catch! I changed my example as seen below and it worked just fine... I wonder whether that is the "cleanest" way to solve this issue (shouldn't the "timezone" option help here? I had no luck with it...).
Anyway... many thanks, and "obrigado" from Brazil!

input {
	stdin {}
}
filter {
	mutate {
		update => { "message" => "%{message} 01:00" }
	}

	date {
		match => ["message", "dd/MM/yyyy HH:mm"]
		target => "data"
	}
}
output {
	stdout { codec => rubydebug }
}

(Roberto Lapolli) #12

I hadn't thought about the time zone, indeed.
Thank you, Jenni.


(Roberto Lapolli) #13

Hi Flávio.
The best way I could find to solve this daylight saving time issue is setting the timezone to "Etc/GMT", like this:

date {
    match => [ "DT_1", "dd/MM/yyyy" ]
    target => "DT_2"
    timezone => "Etc/GMT"
}

Thus, the date "16/10/2016" becomes "2016-10-16T00:00:00.000Z". This solved my problem, since I don't need the time of day, just the date value, to perform some calculations.

I hope this can be useful for someone else, besides me.
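The reason this works: a fixed-offset zone like Etc/GMT has no DST transitions, so every midnight exists and parsing can never land in a gap. A small Python sketch of the same idea (assuming Python 3; the function name `parse_date_as_utc` is mine, mirroring the dd/MM/yyyy fields above):

```python
from datetime import datetime, timezone

def parse_date_as_utc(value: str) -> datetime:
    """Parse a dd/MM/yyyy string as midnight UTC.

    UTC (like Etc/GMT) has no DST transitions, so every
    midnight exists and parsing never hits a nonexistent time.
    """
    return datetime.strptime(value, "%d/%m/%Y").replace(tzinfo=timezone.utc)

# 16/10/2016 has no local midnight in America/Sao_Paulo,
# but interpreted as UTC it parses with no special handling.
dt = parse_date_as_utc("16/10/2016")
print(dt.isoformat())  # 2016-10-16T00:00:00+00:00
```

The trade-off, as noted above, is that the resulting instant is not the true local midnight; that only matters if you need the time of day rather than just the calendar date.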


(Flávio Knob) #14

It'll certainly be useful to me! I'll try it out later... It's certainly a more elegant solution.
Thank you!


(system) #15

This topic was automatically closed 28 days after the last reply. New replies are no longer allowed.