Logstash sends logs with wrong @timestamp

Hi,

I have two Logstash servers that write to Elasticsearch. Filebeat reads logs from file and sends them to these Logstash servers. I am testing failover/outage behaviour, including what happens when both Logstash servers are down. This is how the test was conducted:

  1. Stopped both Logstash servers. October 7 11:00
  2. Started Filebeat on multiple clients. October 7 11:15
  3. Started the application that generates the logs. October 7 11:20
  4. Kept the Logstash servers off for 12 hours to mimic a 12-hour outage and then brought them back on. October 8 00:20
  5. Stopped all Filebeat instances. October 8 11:00

Now when I check Kibana to see what time the logs were sent, I see spikes right after October 8 00:20, which makes sense: all the buffered logs arrive in Elasticsearch once the Logstash servers come back. What's odd is that Kibana shows some logs were inserted between October 7 12:00 and October 7 15:00, when the Logstash servers were not even on. I am using the @timestamp field. Am I missing something? How can logs be sent at a time when the Logstash servers are not even running?

Regards,
Kausar

Filebeat keeps the "file state" of each harvested file, combined with "at-least-once delivery" (the next paragraph at the link). When Logstash was restarted, the logs generated during the outage (limited by ignore_older) were sent.
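For reference, ignore_older is set per input. Assuming the log input shown later in the thread, it would look something like this (a sketch for filebeat.yml; the 24h value is illustrative):

```yaml
filebeat.inputs:
  - type: log
    paths:
      - 'c:\logs\*.log'
    # Files whose modification time is older than this are skipped; logs
    # written during an outage shorter than this window are still shipped.
    ignore_older: 24h
```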

Hi Len,

Thanks for the reply. However, that still doesn't explain what I see in Kibana. If none of the Logstash servers were running, how were the logs inserted in that time period? I have attached a screenshot from Kibana that might explain the problem more clearly.

That would be expected if you are parsing [@timestamp] from data in the log itself.

We are not. If we were, we wouldn't have that big gap in between, because the application is constantly writing more logs to the file. Do you know how the @timestamp field is actually calculated? Is it the time on the Logstash server, or is it taken from some metadata field set by Filebeat?

It is set to the current time when the event is created. It can then be reset using a date filter. I'm not sure whether Filebeat can influence it.
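A date-filter reset looks like this (a sketch; "logtime" is a placeholder for whatever field holds the parsed time from the log line):

```
filter {
  date {
    # Parse the log's own time field and write the result to @timestamp.
    match => [ "logtime", "ISO8601" ]
  }
}
```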

Thanks. There is also another constructor for Event that checks for a parsed timestamp; if a timestamp is passed in, it uses that. In my case Timestamp.now would have returned a value after October 8 00:20, because that's when I started the servers, so all events received after that should have a timestamp after 00:20. But in ES some of the events have times from October 7, so I think it's using the second constructor. I will have to check the beats input plugin.
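The two paths described above can be sketched like this (a plain-Python sketch of the described behaviour, not the actual Logstash source; event_timestamp is a hypothetical helper):

```python
from datetime import datetime, timezone

# Sketch of the two Event construction paths: if the incoming data
# already carries an "@timestamp" (e.g. supplied by the shipper via the
# beats input), keep it; otherwise stamp the current time, as with
# Timestamp.now.
def event_timestamp(data):
    if "@timestamp" in data:
        # Second constructor path: a parsed timestamp was passed in.
        return data["@timestamp"]
    # Default path: the event is stamped at creation time.
    return datetime.now(timezone.utc).isoformat()

print(event_timestamp({"@timestamp": "2019-10-07T11:10:27.099Z"}))
# → 2019-10-07T11:10:27.099Z
```

If the beats input hands Logstash an event that already contains an @timestamp, the second path would explain October 7 values appearing even though Logstash only started on October 8.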

OK, it is a puzzle UNLESS your config is using a timestamp from the log as the @timestamp in Kibana. This is standard for a lot of configurations.

Is this a filebeat module? Do you have logstash filters configured?

Len Rugen

rugenl at yahoo.com

Logstash Config

input {
  beats {
    port => 5044
  }
}

filter {
  grok {
    match => { "message" => "%{LOGLEVEL:level} %{GREEDYDATA:rawMessage}" }
  }
  mutate {
    strip => ["rawMessage"]
  }
  json {
    source => "rawMessage"
    remove_field => ["log", "host", "prospector", "offset", "input", "tags", "beat", "meta"]
  }
}

output {
  elasticsearch {
    hosts => ["<es-ip-address>:9200"]
    index => "default-%{+YYYY.MM.dd}"
  }
}

Filebeat Config

filebeat.inputs:
  - type: log
    enabled: true
    paths:
      - 'c:\logs\*.log'
output.logstash:
  hosts: ["<logstash1-ip-address>:5044","<logstash2-ip-address>:5044"]

Log file sample

12:10:21.3026 Info {"message":"Execution started.","level":"Information","timeStamp":"2019-10-07T12:10:21.302638+01:00","jobId":"b94e6fed-cd1c-4e17-af1d"}
12:10:42.0211 Info {"message":"Getting info","level":"Information","timeStamp":"2019-10-07T12:10:42.021166+01:00","jobId":"b94e6fed-cd1c-4e17-af1d"}

Example JSON shown in Kibana

{
  "_index": "default-2019.10.07",
  "_type": "doc",
  "_id": "2kGLqG0BeOghT3kEk8x5",
  "_version": 1,
  "_score": null,
  "_source": {
    "@timestamp": "2019-10-07T11:10:27.099Z",
    "timeStamp": "2019-10-07T12:10:21.302638+01:00",
    "rawMessage": "{\"message\":\"Execution started.\",\"level\":\"Information\",\"timeStamp\":\"2019-10-07T12:10:21.302638+01:00\",\"jobId\":\"b94e6fed-cd1c-4e17-af1d\"}",
    "ecs": {
      "version": "1.0.1"
    },
    "message": "Execution started.",
    "jobId": "b94e6fed-cd1c-4e17-af1d",
    "agent": {
      "hostname": "<redacted>",
      "ephemeral_id": "166ea80d-70f1-447a-896b",
      "type": "filebeat",
      "version": "7.3.0",
      "id": "9be020ff-54ff-43c5-654"
    },
    "level": "Information",
    "@version": "1"
  },
  "fields": {
    "timeStamp": [
      "2019-10-07T11:10:21.302Z"
    ],
    "@timestamp": [
      "2019-10-07T11:10:27.099Z"
    ],
    "sc-count": [
      1
    ],
    "sc-timeOfDay": [
      11
    ]
  },
  "sort": [
    1570446627099
  ]
}

We checked Filebeat logs on 3 different machines. The interesting thing is that Logstash keeps the actual @timestamp from Filebeat for the first 4117 records in the log file. Records 4118 onward are in ES with the expected @timestamp, i.e. the time the Logstash servers were started. Records 1-4117 have @timestamp values from when they were actually written to the log file, not from when Logstash was started. What is this magical number 4117? Does Filebeat have some sort of default for the number of records to keep in memory, stopping once Logstash is not available?

The queues in Filebeat are described here. It queues 4096 events by default; there may also be an I/O buffer at the input stage that queues a few more (I am speculating on that).
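For reference, the in-memory queue size is configurable in filebeat.yml (a sketch; 4096 is the documented default for queue.mem.events, and 8192 is just an illustrative value):

```yaml
# Sketch for filebeat.yml: enlarge the in-memory event queue so more
# events can be buffered while Logstash is unreachable (default: 4096).
queue.mem:
  events: 8192
```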