Help with scripted field to compute difference between timestamps

Hello, I'm trying to create a scripted field to compute the delay between timestamps: the time when Logstash sees the event minus the actual timestamp from the log/event. So far I'm having a terrible time getting to understand and work with Painless, and I'm having a lot of trouble doing this simple thing...

Luckily I found some posts here from which I was able to scratch together this script:

Date ts = new Date(doc['@timestamp'].value.millis);
Date ti = new Date(doc['event.ingested'].value.millis);

if (doc['event.timezone'].equals("+02:00")) {
    return (ti.getTime()+2*60*60*1000 - ts.getTime());
} else {
    return (ti.getTime() - ts.getTime());
}

Although all my events are supposed to be normalized to UTC by Logstash, they are still processed as UTC+2, so the idea is to just add the +2h (event.timezone) when computing the difference between timestamps. However, here are my 2 issues:

  1. I'm unable to check whether event.timezone == "+02:00", so it always falls through to the else branch of the conditional.
  2. Is there a way to read event.timezone dynamically? That way point 1 wouldn't be necessary: just check whether event.timezone exists and add its value, whatever it is.

Thanks in advance, I would appreciate as much help as I can get to make Painless less Painful :sweat_smile:

doc['@timestamp'].value.millis should be milliseconds since epoch (00:00:00 UTC on 1 January 1970), so this is the value you should use for the calculation:

doc['event.ingested'].value.millis - doc['@timestamp'].value.millis

Then you have the difference in milliseconds, and it should be easy to continue from here.
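As a scripted field, a minimal sketch of that calculation might look like this (assuming both fields are mapped as date; the guard is just defensive):

if (doc['event.ingested'].size() == 0 || doc['@timestamp'].size() == 0) {
    return 0; // one of the fields is missing on this document
}
// parsing delay in milliseconds: ingestion time minus event time
return doc['event.ingested'].value.millis - doc['@timestamp'].value.millis;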


Thanks, that works and simplifies things a little bit. But then there is my other issue: how can I check the value of event.timezone? This code doesn't seem to do the trick.

if (doc['event.timezone'].equals("+02:00")) {
}
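For what it's worth, doc['event.timezone'] returns a doc-values wrapper rather than a plain string, so the comparison likely needs .value. A hedged sketch, assuming event.timezone is mapped as keyword:

long delta = doc['event.ingested'].value.millis - doc['@timestamp'].value.millis;
if (doc['event.timezone'].size() > 0 && doc['event.timezone'].value == "+02:00") {
    delta += 2 * 60 * 60 * 1000L; // compensate for the +02:00 offset
}
return delta;

To handle the offset dynamically instead of hard-coding it, something like ZoneOffset.of(doc['event.timezone'].value).getTotalSeconds() * 1000L should work, since Painless exposes the java.time classes.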

Why do you need the timezone when you have 2 UTC values and just need the difference?


Because simply computing the difference returns negative values, and when I check the timestamps, I can see that '@timestamp' (the actual timestamp obtained from the logs themselves and processed with the date filter plugin) is 2h ahead of 'event.ingested'.
Luckily some of the logs I parse have this 'event.timezone' field showing "+02:00", which is why I believe some of the events are stored as UTC+2 rather than UTC.

The ultimate goal here is to compute the parsing delay of events: how much time passes between when a log is created and when it is processed by Logstash.

Internally, all timestamps should be stored in UTC, so if the calculation is wrong in your case, this clearly indicates that one component in your pipeline is submitting or calculating a wrong timestamp. Could you share a sample document that contains wrong data? Thanks!

{
  "_index": "apps-2021.08.20",
  "_type": "_doc",
  "_id": "9c2fb0795e315639f6fc035fd0d4d5fdd5a5e51b",
  "_version": 1,
  "_score": null,
  "fields": {
    "event.category": [
      "process"
    ],
    "related.hash": [
      "9c2fb0795e315639f6fc035fd0d4d5fdd5a5e51b"
    ],
    "type": [
      "json"
    ],
    "log.logger": [
      "Rebus.MessageRunningTimeExtensions"
    ],
    "event.hash": [
      "9c2fb0795e315639f6fc035fd0d4d5fdd5a5e51b"
    ],
    "fingerprint": [
      "9c2fb0795e315639f6fc035fd0d4d5fdd5a5e51b"
    ],
    "Rebus.MessageType": [
      "Herma.SmtpSender.Contracts.SendEmailRequest"
    ],
    "@version": [
      "1"
    ],
    "log.level": [
      "INFO"
    ],
    "event.kind": [
      "event"
    ],
    "event.severity": [
      2
    ],
    "Rebus.RunningTime": [
      0
    ],
    "event.original": [
      "{ \"@timestamp\": \"2021-08-20T09:39:13.146Z\", \"threadid\": \"50\", \"threadname\": \"Rebus 1 worker 35\", \"level\": \"INFO\", \"logger\": \"Rebus.MessageRunningTimeExtensions\", \"message\": \"Message processing took (00:00:00.4071827), for message of type 'SendEmailRequest'\", \"Program.Name\": \"Herma.SmtpSender\", \"Program.Version\": \"21214.11839.20.23\", \"Rebus.MessageType\": \"Herma.SmtpSender.Contracts.SendEmailRequest\", \"Rebus.RunningTime\": \"0.4\" }"
    ],
    "service.name": [
      "Herma.SmtpSender"
    ],
    "sink": [
      "dotnet"
    ],
    "message": [
      "Message processing took (00:00:00.4071827), for message of type 'SendEmailRequest'"
    ],
    "data_stream.type": [
      "logs"
    ],
    "tags": [
      "host.hostname:from:host.name",
      "ecs:fixup:source.ip",
      "live"
    ],
    "process.name": [
      "Herma.SmtpSender"
    ],
    "event.ingested": [
      "2021-08-20T07:39:13.166Z"
    ],
    "@timestamp": [
      "2021-08-20T09:39:13.146Z"
    ],
    "service.version": [
      "21214.11839.20.23"
    ],
    "ecs.version": [
      "1.6.0"
    ],
    "process.thread.id": [
      50
    ],
    "threadname": [
      "Rebus 1 worker 35"
    ]
  },
  "sort": [
    1629452353146
  ]
}

Here's a recent one! In this pipeline we parse a JSON message directly, so I don't apply the date filter plugin to the @timestamp that comes with the JSON. The dev I'm working with assures me that timestamp is in UTC, but I seriously believe it is in UTC+2. However, I'm pretty confident event.ingested is in UTC.

Here is another document, from another pipeline, where the timestamp is UTC+2 but at least the document says so. The log comes from /var/log/messages and is sent to Logstash using Filebeat. In this pipeline the timestamp obtained from the log goes through the date filter plugin.

{
  "_index": "systems-2021.08.20",
  "_type": "_doc",
  "_id": "c69e7e95de5a6940aff05c418fa3f5b67432eb96",
  "_version": 1,
  "_score": null,
  "fields": {
    "related.hash": [
      "c69e7e95de5a6940aff05c418fa3f5b67432eb96"
    ],
    "type": [
      "filebeat"
    ],
    "service.type": [
      "system"
    ],
    "agent.type": [
      "filebeat"
    ],
    "event.module": [
      "system"
    ],
    "fingerprint": [
      "c69e7e95de5a6940aff05c418fa3f5b67432eb96"
    ],
    "@version": [
      "1"
    ],
    "log.level": [
      "INFO"
    ],
    "agent.name": [
      "filebeat"
    ],
    "event.kind": [
      "event"
    ],
    "event.timezone": [
      "+02:00"
    ],
    "event.original": [
      "Aug 20 09:52:29 host.hostname 269d291203b0: 2021-08-20 07:52:29.5532 INFO [116] [        ] <00:00:> Herma.Renderer.Features.CryptoDevice: Getting certificate for calculated hash with value: some.hash.value for CryptoProfiles: \"native\", trying cache first"
    ],
    "fileset.name": [
      "syslog"
    ],
    "input.type": [
      "log"
    ],
    "log.offset": [
      137140339
    ],
    "message": [
      "Aug 20 09:52:29 host.hostname 269d291203b0: 2021-08-20 07:52:29.5532 INFO [116] [        ] <00:00:> Herma.Renderer.Features.CryptoDevice: Getting certificate for calculated hash with value: some.hash.value for CryptoProfiles: \"native\", trying cache first"
    ],
    "data_stream.type": [
      "logs"
    ],
    "tags": [
      "beats_input_codec_plain_applied",
      "dns:resolve:host.name",
      "ecs:fixup:source.ip",
      "ecs:fixup:host.hostname"
    ],
    "process.name": [
      "269d291203b0"
    ],
    "event.ingested": [
      "2021-08-20T07:52:29.677Z"
    ],
    "@timestamp": [
      "2021-08-20T09:52:29.000Z"
    ],
    "agent.id": [
      "d37a4f77-86bb-484c-914c-dd4d6877e2d1"
    ],
    "ecs.version": [
      "1.8.0",
      "1.6.0"
    ],
    "log.file.path": [
      "/var/log/messages"
    ],
    "agent.ephemeral_id": [
      "533e8942-9ae2-4b65-af0f-5a79172a63a6"
    ],
    "agent.version": [
      "7.12.0"
    ],
    "event.dataset": [
      "system.syslog"
    ]
  },
  "sort": [
    1629453149000
  ]
}

And finally, this is how I process the timestamp using the date filter:

    date {
        timezone => "UTC"
        match => [
            "timestamp",
            # HTTP logs
            "dd/MMM/yyyy:HH:mm:ss Z",
            "EEE MMM d HH:mm:ss.SSSSSS yyyy",
            "EEE MMM dd HH:mm:ss.SSSSSS yyyy",
            "MMM d HH:mm:ss",
            "MMM  d HH:mm:ss",
            "MMM dd HH:mm:ss",
            # 2021-08-05 07:50:35.7015
            "yyyy-MM-dd HH:mm:ss.SSSS",
            "ISO8601"
        ]
        remove_field => [ "timestamp" ]
    }

Sorry for the long comment, hopefully it will help find the problem.

Hi,
All timestamps are stored in UTC, but one of them is not correct: the local timestamp was stored pretending to be UTC. So the first task is to find out which of those UTC stamps is correct. Then you need to debug the system that generates the wrong timestamp. Is the timezone set correctly on that machine?
Best,
Matthias

I followed your advice, and you were absolutely right: many of our machines use timezones other than UTC, yet their timestamps are stored pretending to be UTC. Now I'm not sure whether I can do anything from Logstash to fix that. I've been trying different timezones in the date filter plugin, but I don't see any change in the timestamps. Any advice? How am I supposed to use the date filter plugin to normalize the timezones? I'm already in conversations to change all machines to use UTC.

Thanks in advance.

I think the best solution is to change all machines to use UTC. It might be doable to fix this with Logstash, but I doubt it's easy, and all these fixes would need to be removed once all timestamps are right.
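If you do attempt the Logstash route: the date filter's timezone option declares the zone the source string is written in (it is ignored when the string itself carries an offset, which may be why you saw no change), and the parsed result is always stored as UTC. A sketch, assuming the misbehaving hosts log local +02:00 times and "Europe/Madrid" as a placeholder zone:

    date {
        # tell the filter the incoming string is local time in this zone;
        # the parsed @timestamp is still stored as UTC
        timezone => "Europe/Madrid"
        match => [ "timestamp", "yyyy-MM-dd HH:mm:ss.SSSS" ]
        remove_field => [ "timestamp" ]
    }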

Best,
Matthias