Conflicting logstash timestaps system vs UTC

Hi,

I am posting this from Chile and have patched java with the latest timezone data from iana.org:

I see conflicting timestamp data in logstash-2.3.2-1(receiving from rsyslog) which is causing kibana to display the logs with one hour behind logstash index in kibana is using the @timestamp. ie:

{
"message" => "<14>Jun 7 10:43:47 kamisama gnome-session: in resource: 0xE30E21",
"@version" => "1",
"@timestamp" => "2016-06-07T13:43:47.000Z",
"type" => "syslog",
"host" => "172.16.50.113",
"syslog_pri" => "14",
"syslog_timestamp" => "Jun 7 10:43:47",
"syslog_hostname" => "kamisama",
"syslog_program" => "gnome-session",
"syslog_message" => "in resource: 0xE30E21",
"received_at" => "2016-06-07T14:43:47.650Z",
"received_from" => "xxx.xxx.xxx.xxx",
"syslog_severity_code" => 6,
"syslog_facility_code" => 1,
"syslog_facility" => "user-level",
"syslog_severity" => "informational"
}

As you can see @timestamp is different than received_at and both suppose to be UTC. Local time is 10:43:47 (GMT-4) and UTC Time is 14:43:47 like received_at. Why "@timestamp" => "2016-06-07T13:43:47.000Z" is one hour behind?

My logstash configuration is just like the example from https://www.elastic.co/guide/en/logstash/current/config-examples.html

filter {
if [type] == "syslog" {
grok {
match => [ "message", "<%{NONNEGINT:syslog_pri}>%{SYSLOGTIMESTAMP:syslog_timestamp} %{SYSLOGHOST:syslog_hostname} %{DATA:syslog_program}(?:[%{POSINT:syslog_pid}])?: %{GREEDYDATA:syslog_message}" ]
add_field => [ "received_at", "%{@timestamp}" ]
add_field => [ "received_from", "%{host}" ]
}
syslog_pri { }
date {
match => [ "syslog_timestamp", "MMM d HH:mm:ss", "MMM dd HH:mm:ss" ]
}
}
}

What TZ is set for LS?

TZ is set to Chile/Continental (CLT) = GMT-4

http://www.timeanddate.com/time/zone/chile/santiago

message:<78>Jun 13 10:40:01 hostXX crond[7284]: (root) CMD (/usr/lib/sa/sa1 1 1) @version:1 @timestamp:June 13th 2016, 09:40:01.000 type:syslog host:172.16.xxx.xxx syslog_pri:78 syslog_timestamp:Jun 13 10:40:01 syslog_hostname:hostXX syslog_program:crond syslog_pid:7284 syslog_message:(root) CMD (/usr/lib/sa/sa1 1 1) received_at:June 13th 2016, 10:40:01.735 received_from:172.16.xxx.xxx syslog_severity_code:6 syslog_facility_code:9 syslog_facility:clock syslog_severity:informational _id:AVVKNH1jP0nKXsl-abr0 _type:syslog _index:logstash-2016.06.13

This is how kibana is displaying... @timestamp is one hour behind.. and I don't know from where is getting that incorrect information.

Thanks!

Is that the time that the LS host is using? Could there be a daylight savings thing happening?

The logstash host has the time properly configured.

# zdump -v -c 2015,2017 Chile/Continental
Chile/Continental -9223372036854775808 = NULL
Chile/Continental -9223372036854689408 = NULL
Chile/Continental Sun May 15 02:59:59 2016 UTC = Sat May 14 23:59:59 2016 CLST isdst=1 gmtoff=-10800
Chile/Continental Sun May 15 03:00:00 2016 UTC = Sat May 14 23:00:00 2016 CLT isdst=0 gmtoff=-14400
Chile/Continental Sun Aug 14 03:59:59 2016 UTC = Sat Aug 13 23:59:59 2016 CLT isdst=0 gmtoff=-14400
Chile/Continental Sun Aug 14 04:00:00 2016 UTC = Sun Aug 14 01:00:00 2016 CLST isdst=1 gmtoff=-10800
Chile/Continental 9223372036854689407 = NULL
Chile/Continental 9223372036854775807 = NULL
# date
Tue Jun 14 10:52:33 CLT 2016

The java timezone data:

# /usr/java/jdk1.8.0_92/jre/bin/java -jar /opt/elastic/tzupdater.jar -V
tzupdater version 2.0.3-b01
JRE tzdata version: tzdata2016d
tzupdater tool would update with tzdata version: tzdata2015b

I have changed the index to use the received_at field as a workaround but I would like to understand why the @timestamp is off by 1 hour.
Is there something inside the logstash code (a binary perhaps) that is not aware of the current timezones?
Should I fill a bug report?

Got some additional information.. I can say the problem happen when adding the "date" filter to the configuration:

filter {
if [type] == "syslog" {
grok {
match => [ "message", "<%{NONNEGINT:syslog_pri}>%{SYSLOGTIMESTAMP:syslog_timestamp} %{SYSLOGHOST:syslog_hostname} %{DATA:syslog_program}(?:[%{POSINT:syslog_pid}])?: %{GREEDYDATA:syslog_message}" ]
add_field => [ "received_at", "%{@timestamp}" ]
add_field => [ "received_from", "%{host}" ]
}
syslog_pri { }
date {
match => [ "syslog_timestamp", "MMM d HH:mm:ss", "MMM dd HH:mm:ss" ]
}

If I comment out the last three lines the problems goes away and @timestamp gets the correct time.

{
"message" => "<86>Jun 20 15:25:38 sbfisgdev-lv1 sshd[8484]: pam_unix(sshd:session): session closed for user root",
"@version" => "1",
"@timestamp" => "2016-06-20T19:25:38.817Z",
"type" => "syslog",
"host" => "xxx.xxx.xxx.xxx",
"syslog_pri" => "86",
"syslog_timestamp" => "Jun 20 15:25:38",
"syslog_hostname" => "XXXX-XX1",
"syslog_program" => "sshd",
"syslog_pid" => "8484",
"syslog_message" => "pam_unix(sshd:session): session closed for user root",
"received_at" => "2016-06-20T19:25:38.817Z",
"received_from" => "xxx.xxx.xxx.xxx",
"syslog_severity_code" => 6,
"syslog_facility_code" => 10,
"syslog_facility" => "security/authorization",
"syslog_severity" => "informational"
}

The three time fields are consistent.

So, what is the benefit of using the "date" filter? what is going on and why is that behaviour?