Wrong timestamp (NOT timezone issue) with System module

Hi team and thank you for your work.

I am very new to ES and I am trying to build a simple stack with 1 ES, 1 KB and 2 filebeats (launched with binaries) all v7.5. I am not using LogStash but the prepacked modules of Filebeat.
Things were going well before new year but as I come back to the office, I notice some new documents are indexed for the end of the year of 2020.

After investigation, I can see that only System events are concerned. In fact, any event in the rotated log files (for example I have /var/log/messages-20191229) that has not been indexed, is now indexed as in 2020.

As an example I have in this file:

Dec 22 04:34:59 my-system kernel: Firewall: *UDP_IN Blocked* IN=eth0 OUT= MAC=xx:...:xx SRC=xxx.xxx.xxx.xxx DST=xxx.xxx.xxx.xxx LEN=1100 TOS=0x00 PREC=0x00 TTL=42 ID=54955 DF PROTO=UDP SPT=60569 DPT=443 LEN=1080

and in Kibana:

{
  "_index": "filebeat-7.5.1-2020.01.06-000001",
  "_type": "_doc",
  "_id": "ceeSgG8BDy804je_mJfq",
  "_version": 1,
  "_score": null,
  "_source": {
    "agent": {
      "hostname": "my-system.tld",
      "name": "my-system",
      "id": "d77c284b-dfd8-4d26-ab82-1fad8bc6ce6f",
      "type": "filebeat",
      "ephemeral_id": "0f8e7686-b861-4b60-8855-5b4ddcd6c5f6",
      "version": "7.5.1"
    },
    "process": {
      "name": "kernel"
    },
    "log": {
      "file": {
        "path": "/var/log/messages-20191229"
      },
      "offset": 1499167
    },
    "fileset": {
      "name": "syslog"
    },
    "message": "Firewall: *UDP_IN Blocked* IN=eth0 OUT= MAC=xx:...:xx SRC=xxx.xxx.xxx.xxx DST=xxx.xxx.xxx.xxx LEN=1100 TOS=0x00 PREC=0x00 TTL=42 ID=54955 DF PROTO=UDP SPT=60569 DPT=443 LEN=1080 ",
    "input": {
      "type": "log"
    },
    "@timestamp": "2020-12-22T04:34:59.000-05:00",
    "system": {
      "syslog": {}
    },
    "ecs": {
      "version": "1.1.0"
    },
    "service": {
      "type": "system"
    },
    "host": {
      "hostname": "my-system",
      "os": {
        "kernel": "3.10.0-962.3.2.lve1.5.26.7.el7.x86_64",
        "name": "CloudLinux",
        "family": "",
        "version": "7.7 (Valery Bykovsky)",
        "platform": "cloudlinux"
      },
      "containerized": false,
      "name": "my-system",
      "id": "a195f221b91edcc1ff7319c05c12b888",
      "architecture": "x86_64"
    },
    "event": {
      "timezone": "-05:00",
      "module": "system",
      "dataset": "system.syslog"
    }
  },
  "fields": {
    "suricata.eve.timestamp": [
      "2020-12-22T09:34:59.000Z"
    ],
    "@timestamp": [
      "2020-12-22T09:34:59.000Z"
    ]
  },
  "sort": [
    1608629699000
  ]
}

I made some tests and reload the content of any rotated log file before 2020 by doing:

  1. delete documents with timestamp greater than today in ES
  2. delete any pipeline
  3. set file offset to 0 in data/registry/filebeat/data.json for appropriate files

...but still, all of these files' content (messages-201912* and secure-201912*) is indexed as end of year 2020.

In short, as long as I indexed their content in 2019 everything was going well, but now we are in 2020 I can't index them without having wrong timestamp.
Please forgive my poor english :slight_smile:

Any help appreciated.

Thomas

Thanks for reporting this! It looks like the problem is because the log files themselves don't include the year. There's no way for Filebeat to tell from Dec 22 04:34:59 which year the timestamp is meant to be in, so it defaults to the current year. This is clearly a poor choice in early January, so I filed a bug on the issue. In the meantime, I'm sorry to say the only fix I know for the existing logs is a custom processor or text filter to insert the correct value...

Hi @faec

thanks for answering. I wasn't sure this was a bug because I didn't find anyone complaining about such issue. Am I the only one using Filebeat System Module? Looks weird...
Anyways, should I wait for the bug to be confirmed or you sure ES can't deal alone with this file format?

Thank you again.

FYI

I have been working on resolving this issue because I don't want this problem to occur again and again every year rollover. Finally I realized that once the cluster is running in production, ES shall not index the rotated log files. When running for some time, Filebeat only indexes what is passing through /var/log/secure and /var/log/messages as they contain the current events and the old events (archived/rotated as secure-YYYYMMDD or messages-YYYYMMDD) have already been indexed.
And this explains why I haven't found anyone complaining about this issue.

So if you have found my post and you are experiencing the same, you belong to the unlucky people who start ES at the exact wrong moment (a few weeks in the beginning of the year).
This issue remains a bug, but not an evil one. :slight_smile: