Should I remove my original time stamp field in favor of @timestamp?

About this question

This question:

  • Spans two forum categories—Elasticsearch and Logstash—but I could select only one.
  • Is a copy, with some rewording, of two comments I recently added to the closed elastic/logstash GitHub issue #5676. It belatedly occurred to me that I might get more feedback on this forum. Apologies for the crossposting.
  • Is similar to questions already asked in this forum, but not, I think, identical (again, with apologies if I’m wrong about this; I don’t want to waste anyone’s time).

What I’m doing now

I am a member of the development team for a product that extracts data from proprietary binary-format logs, and then forwards that data to Logstash; for example, as JSON Lines over TCP.

Each log event that this product extracts—each line of JSON Lines that it forwards to Logstash—contains a field named time that is the event time stamp. If the original binary-format log contains multiple candidate fields for an event time stamp, the product chooses one to use as the value of the time field.
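
For illustration, here is the kind of JSON Lines event the product forwards (field values are hypothetical; only the time field name is real):

{"time": "2016-07-19T14:05:32.123456+02:00", "source": "app01", "message": "user login"}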

Currently, I use a Logstash config with a date filter that parses the value of the time field and stores the result in the Logstash-generated @timestamp field.
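
Here is a sketch of that config (the TCP port, the ISO8601 date pattern, and the Elasticsearch host are my assumptions for illustration):

input {
  tcp {
    port  => 5044
    codec => json_lines
  }
}

filter {
  # Parse the event's own time field into @timestamp
  date {
    match => [ "time", "ISO8601" ]
  }
}

output {
  elasticsearch {
    hosts => [ "localhost:9200" ]
  }
}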

But I end up with events (documents) in Elasticsearch that have both time and @timestamp fields, with effectively* identical values. I don’t like that duplication.
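
For example, a stored document ends up looking something like this (field values are hypothetical):

{
  "@timestamp": "2016-07-19T12:05:32.123Z",
  "time": "2016-07-19T14:05:32.123456+02:00",
  "message": "user login"
}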

Discussion of possible options

To avoid the duplication, I could use remove_field to remove the time field, but this starts to grate. My input events already contain a time stamp field named time. I’m happy with that field name. I don’t want to have to specify a date filter to “map” that field to the Logstash-specific @timestamp field. I don’t want to have to remove “my” time field to avoid duplication.

I could omit the date filter and let Logstash set @timestamp to the default value: the time that Logstash first sees the event. I can imagine that this might be useful to assist with debugging, in the case of problems with forwarding. Given the choice, though, I think I’d prefer to save the bytes and simply omit @timestamp, and have a “lean” Logstash config with only input and output sections; no filter section.
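
That lean config would amount to just the input and output shown earlier (although, as far as I know, Logstash would still add a default @timestamp unless I explicitly removed it):

input {
  tcp {
    port  => 5044
    codec => json_lines
  }
}

output {
  elasticsearch {
    hosts => [ "localhost:9200" ]
  }
}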

* The value of the @timestamp field generated by the date filter does not exactly match the original time field value. The time field value:

  • Typically ends with a zone designator in the format +hh:mm or -hh:mm
  • Contains fractions of a second to 6 decimal places (microsecond precision)

whereas @timestamp is in UTC (it always has a Z zone designator) and contains fractions of a second to only 3 decimal places. (I understand that Elasticsearch currently represents date fields as epoch time values with millisecond precision.)
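
For example (hypothetical values):

time:       2016-07-19T14:05:32.123456+02:00
@timestamp: 2016-07-19T12:05:32.123Z

The conversion to UTC discards the original +02:00 offset, and the millisecond precision discards the trailing 456.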

For various reasons (that I’m happy to discuss), we (the product development team) would prefer to preserve, in the ingested Elasticsearch document, the original ±hh:mm zone designator and the microsecond precision of the time field, even if these are preserved only in the original string value of the ingested source.

A GitHub user commented:

the precedent has been set for Logstash ... to use @timestamp as the canonical field

That’s true, and that’s one reason why I’m grappling with this question. Because, in the context of Logstash, it leads me to set @timestamp to the value of “my” time field, and then remove time:

filter {
  date {
    match => [ 'time', '...' ]
    remove_field => 'time'
  }
}

Ideally, though, I’d prefer the time field from my product to pass through to the analytics platform (Elasticsearch is just one such platform) with its original name and value, without being “forced” into a different field name. In practice, that might not be possible, because there is no “cross-platform canon” in this regard.

Other platforms aside, even within the Elastic Stack, if I bypass Logstash and use the Elasticsearch bulk API, I don’t need to introduce @timestamp. That is, unless I want documents ingested via the bulk API to match the structure of documents ingested via Logstash.
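
For example, a minimal bulk request (index and type names are illustrative) can index the event exactly as my product produces it, with time as its only time stamp field:

POST /myindex/mytype/_bulk
{ "index": {} }
{ "time": "2016-07-19T14:05:32.123456+02:00", "message": "user login" }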

Summary of options

  1. Omit the date filter and let the value of @timestamp default to the time that Logstash first sees an event.
  • Pros:
    • More concise Logstash config.
    • Perhaps less Logstash processing (inserting a default value rather than parsing a supplied input date value); I’ve not done any benchmark testing to check this.
    • Perhaps a useful @timestamp value for debugging forwarding problems.
  • Cons:
    • I’m not convinced of the usefulness of this @timestamp value. Is it really worth storing in Elasticsearch?
    • This @timestamp value has nothing to do with the event. Users will have to understand the event data: they will have to know that time is the “true” event time stamp.
  2. Specify the date filter without remove_field.
  • Pros:
    • @timestamp matches the event time stamp, thereby matching the expectations of users who are familiar with the Logstash “canon”.
  • Cons:
    • More verbose Logstash config.
    • Perhaps more Logstash processing.
    • Data duplication: @timestamp matches time.
    • Forced to use a Logstash-specific field name, when this field name is not required by other analytics platforms, and the field name is not even required using other ingestion methods available within the Elastic Stack (such as the Elasticsearch bulk API).
  3. Specify the date filter with remove_field.
  • Pros and cons: same as the previous option, minus the data-duplication con.

Thoughts and suggestions welcome.
