I need to report back to Logstash community about a bug that i found.
As you can see below, the @timestamp field is populated with date and timestamp in the UTC where as first_switched and last_switched are in local time zone (This netflow is generated and in the US Eastern timezone). You will notice that there is a 4 hour difference. 18:39 vs. 14:39.
All the date fields are expected to follow the ISO 8601, which should always be represented in UTC.
Are you using the netflow codec? As I read its code (at the bottom), Logstash doesn't attempt any kind of timezone adjustments. It assumes all timestamps arrive as UTC over the wire and emits UTC accordingly. I'd make sure that what Logstash gets actually is UTC and not local time.
All the date fields are expected to follow the ISO 8601, which should always be represented in UTC.
what should I do then with this issue? can you help me fix this and make logstash read the time stamp as EDT not UTC?
Before going into solution mode we need to have all facts. I'd use Wireshark or similar to capture some of the netflow traffic and inspect what's actually being sent over the wire. If the netflow data indeed uses EDT I'd try to reconfigure whatever device is emitting the data so that it's compliant with the specification. Failing that, I'd see what can be done on the Logstash side. You should be able to use a date filter.
Yes, we are using the netflow codec and we checked the data on the wire using wireshark to ensure that the timestamps and other fields are correct..
As you will notice in the following packet structure, there are 2 fields in the Flow Header of the netflow and there 2 others in each data record.
We have confirmed that unix_secs are in the UTC (time epoch), the sys_uptime value is correct and the last_switched and first_switched fields are correct relative to sys_uptime field.
Another question, we are wondering if @timestamp value is derived from the unix_secs in the header or from somewhere?
Export packet
Packet Header
Template FlowSet
Data FlowSet
Data FlowSet
...
Template FlowSet
Data FlowSet
...
Flow header format
Bytes Contents Description
0-1 version NetFlow export format version number
2-3 count Number of flow sets exported in this packet, both template and data (1-30).
4-7 sys_uptime Current time in milliseconds since the export device booted.
8-11 unix_secs Current count of seconds since 0000 UTC 1970.
12-15 package_sequence Sequence counter of all export packets sent by the export device. Note: This is a change from the Version 5 and Version 8 headers, where this number represented “total flows.”
16-19 source_id A 32-bit value that is used to guarantee uniqueness for all flows exported from a particular device. (The Source ID field is the equivalent of the engine type and engine ID fields found in the NetFlow Version 5 and Version 8 headers). The format of this field is vendor specific. In Cisco's implementation, the first two bytes are reserved for future expansion, and will always be zero. Byte 3 provides uniqueness with respect to the routing engine on the exporting device. Byte 4 provides uniqueness with respect to the particular line card or Versatile Interface Processor on the exporting device. Collector devices should use the combination of the source IP address plus the Source ID field to associate an incoming NetFlow export packet with a unique instance of NetFlow on a particular device.
Template FlowSet Format
LAST_SWITCHED 21 4 System uptime at which the last packet of this flow was switched
FIRST_SWITCHED 22 4 System uptime at which the first packet of this flow was switched
Yes, we are using the netflow codec and we checked the data on the wire using wireshark to ensure that the timestamps and other fields are correct..
As you will notice in the following packet structure, there are 2 fields in the Flow Header of the netflow and there 2 others in each data record.
We have confirmed that unix_secs are in the UTC (time epoch), the sys_uptime value is correct and the last_switched and first_switched fields are correct relative to sys_uptime field.
Another question, we are wondering if @timestamp value is derived from the unix_secs in the header or from somewhere?
Export packet
Packet Header
Template FlowSet
Data FlowSet
Data FlowSet
...
Template FlowSet
Data FlowSet
...
Flow header format
Bytes
Contents
Description
0-1
version
NetFlow export format version number
2-3
count
Number of flow sets exported in this packet, both template and data (1-30).
4-7
sys_uptime
Current time in milliseconds since the export device booted.
8-11
unix_secs
Current count of seconds since 0000 UTC 1970.
12-15
package_sequence
Sequence counter of all export packets sent by the export device. Note: This is a change from the Version 5 and Version 8 headers, where this number represented “total flows.”
16-19
source_id
A 32-bit value that is used to guarantee uniqueness for all flows exported from a particular device. (The Source ID field is the equivalent of the engine type and engine ID fields found in the NetFlow Version 5 and Version 8 headers). The format of this field is vendor specific. In Cisco's implementation, the first two bytes are reserved for future expansion, and will always be zero. Byte 3 provides uniqueness with respect to the routing engine on the exporting device. Byte 4 provides uniqueness with respect to the particular line card or Versatile Interface Processor on the exporting device. Collector devices should use the combination of the source IP address plus the Source ID field to associate an incoming NetFlow export packet with a unique instance of NetFlow on a particular device.
Template FlowSet Format
LAST_SWITCHED
21
4
System uptime at which the last packet of this flow was switched
FIRST_SWITCHED
22
4
System uptime at which the first packet of this flow was switched
We have confirmed that unix_secs are in the UTC (time epoch), the sys_uptime value is correct and the last_switched and first_switched fields are correct relative to sys_uptime field.
Well, then I'm out of ideas. AFAICT Logstash assumes UTC and does not make any timezone adjustments.
Another question, we are wondering if @timestamp value is derived from the unix_secs in the header or from somewhere?
The netflow codec itself doesn't touch the @timestamp field so unless you use the date filter or by other means manipulate that field it'll contain the time when Logstash first saw the event.
It looks like calculation of first_switched and last_switched is not correct.
I am not seeing if overflow of sysup time , first and last is taken care.
It is not always true that sysup_time in header will greater than first and last filed in netflow record.
That is a bug in netflow codec.
Apache, Apache Lucene, Apache Hadoop, Hadoop, HDFS and the yellow elephant
logo are trademarks of the
Apache Software Foundation
in the United States and/or other countries.