Logstash output date uses the Filebeat timestamp instead of server time

I have set up a cluster: Filebeat -> Logstash -> Elasticsearch.
I have a lifecycle policy that makes the indices read-only after 7 days.
My Logstash output is:
index => "log-%{[fields][log_name]}-%{+YYYY.MM.dd}"
The issue I am facing is that sometimes my Filebeat malfunctions and sends logs that are more than 7 days old.
As far as I know, Logstash should take the input from Filebeat and send the logs to Elasticsearch with Logstash's own time, not with the timestamp sent by Filebeat in the document.
I am getting errors that Logstash cannot write to the indices because they are read-only.
When I check the log data that Logstash is unable to write, it is a 7-day-old log and it also has a 7-day-old timestamp.

I have two questions. First, is it possible that Logstash sends the logs using the timestamp within the log file?
Second, how do I debug the Filebeat issue, given that my logs are rotated every hour?

Not exactly. Logstash uses the value of the @timestamp field. This field is automatically created by Logstash when an event enters the pipeline and holds the current date and time, but if your document also has a @timestamp field and you are parsing it in Logstash, the @timestamp from your document will replace the one generated by Logstash.
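
For example (parsed_time is just a hypothetical field name here), a date filter like the one below would overwrite the @timestamp generated by Logstash with the value taken from the document:

    date {
        # parsed_time is a hypothetical field holding the original log time
        match => ["parsed_time", "ISO8601"]
        # no target is set, so the parsed value overwrites @timestamp
    }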

You can change the @timestamp field to have the current date and time, but this will also change it on your document. So, assuming that you got a log from 2024-05-27 but are processing it today, if you change the @timestamp field, this log will now have today's date, which is misleading.

You should not change the @timestamp of your documents.

The alternative is to use a ruby filter to create a field with the current date, and then use this field in the output.

The following ruby filter does that.

    ruby {
        code => "
            output_date = DateTime.now().strftime('%Y-%m-%d')
            event.set('output_date',output_date)
        "
    }

Then, in your output, you could use this newly created field:

index => "log-%{[fields][log_name]}-%{output_date}"

Thank you for the quick response.
From your reply I understand that the Logstash timestamp will be the final timestamp.
Now, this is the second time I am facing this issue: somehow Logstash is picking up the 7-day-old logs and sending them to Elasticsearch.
If you check the log I have shared, the log is 7 days old and has the Logstash index name appended with the date from Logstash, but if you check the retry action, it is today's date.
I am unable to understand how a log can be stored by Logstash and sent after 7 days.
The log stream ingests 20 GB of data daily. I have a persistent queue set up for 200 GB, so if Elasticsearch is not working, the logs will be stored in Logstash, and data is being ingested into Elasticsearch daily without any delay.

[2024-06-05T00:59:46,076][INFO ][logstash.outputs.elasticsearch][main][f652e9d1e3112e62e9047b819b879044d7dac35e1a762234b4a312d442c17110] Retrying failed action {:status=>403, :action=>["index", {:_id=>nil, :_index=>"28-my_log_index-2024.05.30", :routing=>nil}, {"tags"=>["*******"], "ecs"=>{"version"=>"8.0.0"}, "log"=>{"file"=>{"path"=>"******.log"}, "offset"=>620200}, "message"=>"{\"message\":\"Request and Response.\",\"context\":{\"request\":{\"****\":\"******\",\"*****\":44.37,\"*******de_balance_verified\":true,\"updated_at\":\"30-05-2024 11:12:23.624300\"},\"response\":{\"MessageId\":\"*****\",\"@metadata\":{\"statusCode\":200,\"effectiveUri\":\"https:\\/\\/*****.com\",\"headers\":{\"***d\":\"*****\",\"date\":\"Thu, 30 May 2024 11:12:23 GMT\",\"content-type\":\"text\\/xml\",\"content-length\":\"294\",\"connection\":\"keep-alive\"},\"transferStats\":{\"http\":[[]]}}}},\"level\":200,\"level_name\":\"INFO\",\"channel\":\"*****-log\",\"datetime\":{\"date\":\"2024-05-30 11:12:23.685689\",\"timezone_type\":3,\"timezone\":\"UTC\"},\"extra\":[]}", "@timestamp"=>2024-05-30T11:12:26.617Z, "fields"=>{"*****"=>"*****"}, "input"=>{"type"=>"log"}, "@version"=>"1", "agent"=>{"type"=>"filebeat", "id"=>"78a4c183-3c97-4ad4-a082-cbdccee12014", "name"=>"filebeat-jk4j7", "ephemeral_id"=>"5ddcca5e-ead6-487e-a5d2-358e2a12147e", "version"=>"8.9.0"}, "host"=>{"name"=>"filebeat-jk4j7"}}], :error=>{"type"=>"cluster_block_exception", "reason"=>"index [28-*****-2024.05.30] blocked by: [FORBIDDEN/8/index write (api)];"}}

I am sorry, I misunderstood.
So it means that if Filebeat sends logs with a timestamp of 30th May, Logstash will send them to the index of 30th May; it will not care what the current date is. Right?

Basically yes. If your document has a field named @timestamp, or if you are parsing some field from your document using the date filter in Logstash, this will be the time used by Logstash when resolving %{+YYYY.MM.dd}.

The string %{+YYYY.MM.dd} always uses the value of the @timestamp field.

For example, if today, 2024-06-05, you are indexing some logs where the document has a @timestamp field with the value 2024-02-01, the string %{+YYYY.MM.dd} will be 2024-02-01.

It is not clear if this @timestamp field is being sent by Filebeat or if you are using a date filter to parse some date, as you didn't share your full pipeline.
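
If you want to check what Logstash is actually receiving, one simple option is to temporarily add a stdout output with the rubydebug codec and look at the @timestamp of a few events (remember to remove it again, since it prints every event to standard output):

    output {
        stdout {
            codec => rubydebug
        }
    }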

But as mentioned in the previous answer, if you want to index the logs using the current date, you need to use the ruby filter mentioned to create a field with the current date.

Thank you for the information.
I have provided the context of the issue I am facing; let me share some more details. I have a Kubernetes cluster running a Filebeat DaemonSet. All my applications write logs to a hostPath, and Filebeat has that path mounted read-only in the DaemonSet. We have around 22 servers running on AWS with autoscaling enabled. We are also running a logrotate DaemonSet to rotate the logs, and logrotate has the host path mounted.
I have investigated based on the information you provided. All the logs were sent from a single machine, and that machine was 7 days old. The machine has been deleted, so I cannot check what happened on it.
As far as I have read in the documentation, Filebeat adds its own timestamp when shipping logs to Logstash, Elasticsearch, or any other output.
Based on the information you have provided, it clearly seems the issue is with Filebeat, not Logstash.
Is it possible that Filebeat also reads a date from the logs and saves it as the timestamp?

Hi, I have investigated a little further. I found that Logstash is sending to a different index name than the date mentioned in the timestamp, which you mentioned above. Can you please guide me on how to debug this issue?

I'm sorry, but it is really hard to understand what your issue is here.

First you said that you have this output in your logstash:

index => "log-%{[fields][log_name]}-%{+YYYY.MM.dd}"

But the Kibana screenshot you shared has a different index name, which indicates that you are probably using data streams, so the index configuration you shared is not the one actually being used.

Unless you share your entire Logstash configuration and your Filebeat configuration, it is not possible to know what the issue is.

My Logstash configuration:

input {
  beats {
    port => 5044
    enrich => none
  }
}

filter {
  if "**index_name_1****" in [fields][*****_log_name] {
    json {
      source => "message"
      target => "api"
      remove_field => [ "message", "event" ]
    }
    # I have multiple fields defined here
  } else if "**index_name_2****" in [fields][*****_log_name] {
    json {
      source => "message"
      target => "api"
    }
  }
}


output {
  if [fields][*****_log_name] == "**index_name_1****" {
    elasticsearch {
      hosts => ["elasticsearch host clusters, I have 3 hosts."]
      index => "**index_name_1****-%{+YYYY.MM.dd}"
      ilm_policy => "**index_name_1****"
      ilm_rollover_alias => "**index_name_1****-"
      ilm_enabled => true
      user => "my_user"
      password => "${LS_ELASTICSEARCH_PASSWORD}"
      ssl_enabled => true
      ssl_certificate_authorities => "/etc/logstash/ca.crt"
      ssl_verification_mode => "full"
    }
  } else if [fields][*****_log_name] {
    elasticsearch {
      hosts => ["elasticsearch host clusters, I have 3 hosts."]
      index => "28-%{[fields][*****_log_name]}-%{+YYYY.MM.dd}"
      ilm_policy => "28-day-retention"
      ilm_enabled => true
      user => "my_user"
      password => "${LS_ELASTICSEARCH_PASSWORD}"
      ssl_enabled => true
      ssl_certificate_authorities => "/etc/logstash/ca.crt"
      ssl_verification_mode => "full"
    }
  } else {
    elasticsearch {
      index => "other-%{+YYYY.MM.dd}"
      hosts => ["elasticsearch host clusters, I have 3 hosts."]
      ilm_policy => "28-day-retention"
      ilm_enabled => true
      user => "my_user"
      password => "${LS_ELASTICSEARCH_PASSWORD}"
      ssl_enabled => true
      ssl_certificate_authorities => "/etc/logstash/ca.crt"
      ssl_verification_mode => "full"
    }
  }
}

My Filebeat configuration (I have more than 100 files configured):

apiVersion: v1
kind: ConfigMap
metadata:
  name: filebeat-config
  namespace: monitoring

data:
  filebeat.yml: |-
    filebeat.inputs:
    - type: log
      id: *****
      enabled: true
      paths:
        - ******/***/laravel-*.log
      multiline.pattern: '^\[\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}\]'
      multiline.negate: true
      multiline.match: after
      #scan_frequency: 1s
      tags: ["*****"]
      fields: { *****_log_name: "*****" }
      exclude_files: ['\.gz$']


    - type: log
      id: *********
      enabled: true
      paths:
        - ******/********/access.log
      tags: ["*********"] 
      fields: { *****_log_name: "*********" }       

      processors:
      - add_docker_metadata: ~
      - add_cloud_metadata: ~
      - add_host_metadata: ~
      multiline.pattern: '^[[:space:]]'
      multiline.negate: false
      multiline.match: after

    queue:
      disk:
        path: "/usr/share/filebeat/data"
        max_size: 30gb
    output.logstash:
      hosts: ["*******:5044"]

You have 3 elasticsearch outputs and none of them matches the index pattern you shared.

Also, you redacted a lot of information that would help in understanding where your logs are being sent from, so it is hard to troubleshoot without seeing the exact index names being used.

I would assume that your index is using some rollover and the index configurations in the Logstash output are not being used.

Yes, we are using rollover in the lifecycle policy. I am sharing a dashboard screenshot: the logs are all pushed with the same day's timestamp, but the index names range from 30th May to 4th June.

If you are using rollover, the index option in the elasticsearch output will not be used.

In your lifecycle policy you have a rollover alias; this will be used for naming the indices, and the backing indices will be created according to your rollover configuration.

The date on the documents and the date of the index will not be related.
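
As a rough illustration (the alias and host below are just placeholders), with ILM enabled Logstash writes through the rollover alias, and the backing index names come from the ilm_pattern (by default {now/d}-000001, which gives something like my-alias-2024.06.05-000001), regardless of the @timestamp of the documents:

    elasticsearch {
        hosts => ["https://your-es-host:9200"]
        ilm_enabled => true
        ilm_rollover_alias => "my-alias"   # writes go through this alias
        ilm_pattern => "{now/d}-000001"    # default pattern used for the backing index names
        ilm_policy => "28-day-retention"
    }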

Let me share some more details. The server stayed running from 29th May to 5th June.

We faced this issue on the morning of 5th June.

The thing is, we are unable to determine what broke, as there is no other server that sent the forbidden error,
which means something went wrong with one server.
The server was running fine for 5 days, and then it suddenly started sending logs that were 7 days old.
Can you make something of this?

Is it possible that for a couple of days this machine was blocked by Logstash, or something similar, so that it was unable to send the logs? If yes, how can I debug it?
We have 22 servers with a Filebeat DaemonSet, one Logstash, and 3 Elasticsearch servers.