Logstash is processing old events

Nikhitha_Karennagari · February 23, 2023, 3:03pm

I am continuously getting errors like the one below in Logstash:

{ "timestamp": "2023-02-10T14:04:33.661-08:00", "severity": "warning", "message": "Could not index event to Elasticsearch. {:status=>404, :action=>['index', {:_id=>nil, :_index=>'index-2022.12.12', :routing=>nil}, #LogStash::Event:0x3450fd23], :response=>{'index'=>{'_index'=>'index-2022.12.12', '_type'=>'_doc', '_id'=>nil, 'status'=>404, 'error'=>{'type'=>'index_not_found_exception', 'reason'=>'no such index [index-2022.12.12]', 'index'=>'index-2022.12.12', 'resource.id'=>'index-2022.12.12', 'resource.type'=>'index_expression', 'index_uuid'=>'na'}}}}"}

At 2023-02-10T14:04:33.661-08:00, Logstash is trying to write to the old index index-2022.12.12.
The index name is derived from the event's @timestamp in Logstash. So why is the current Logstash instance writing to an old index?
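
For context, a date-based index name like this is usually produced by a sprintf date pattern in the elasticsearch output. The output of searchengine_pipeline is not shown in this thread, so the following is only a sketch; the host and the exact index pattern are assumptions:

```
output {
  elasticsearch {
    # Hypothetical host; the real output section is not shown in this thread.
    hosts => ["http://localhost:9200"]
    # %{+YYYY.MM.dd} is formatted from the event's @timestamp (in UTC),
    # not from the wall-clock time at which Logstash sends the event.
    index => "index-%{+YYYY.MM.dd}"
  }
}
```

With a pattern like this, an event whose @timestamp falls on 2022-12-12 is sent to index-2022.12.12 no matter when Logstash processes it.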

leandrojmp · February 23, 2023, 3:14pm

It is impossible to know unless you provide more information like the logstash pipeline you are using.

Nikhitha_Karennagari · February 24, 2023, 11:26am

This is the Logstash pipeline:

input {
  beats {
    id => "filebeat_tls"
    port => 5044
    type => filebeat
  }
}
filter {
  if [type] == "readiness" {
    drop {}
  }
  else if [type] == "filebeat" {

    if [input][type] == "container" {
      if ![kubernetes] {
        grok {
          match => {
            "[log][file][path]" => "/var/log/containers/%{DATA:pod_name}_%{DATA:namespace}_%{GREEDYDATA:container_name}-%{DATA:container_id}.log"
          }
        }
      }

      if [message] and [message] != "" {
        json {
          skip_on_invalid_json => true
          source => "[message]"
          remove_field => ["stream"]
        }
      }
      if [ts] {
        mutate {
          copy => { "@timestamp" => "ts" }
        }
      }
      if [stream] {
        mutate {
          rename => {"[stream]" => "[severity]"}
          add_field => {"[version]" => "0.3.0"}
          copy => { "@timestamp" => "timestamp" }
          add_tag => [ "forced_conversion" ]
        }
        if [severity] == "stderr" {
          mutate {
            replace => {"[severity]" => "error"}
          }
        } else if [severity] == "stdout" {
          mutate {
            replace => {"[severity]" => "info"}
          }
        }
      }
    }
  }
}
output {
  pipeline { send_to => "searchengine_pipeline" }
}

logstash.yml:
----
http.host: "0.0.0.0"
http.port: 0000
log.level: "info"
pipeline.workers: 2
pipeline.batch.size: 2048
pipeline.batch.delay: 50
path.logs: /opt/logstash/resource
pipeline.ecs_compatibility: disabled

ignore_older is set to 24h in Filebeat, and the Logstash persistent queue contains no data (0 event count):

"queue" : {
        "events" : 0,
        "type" : "persisted",
        "data" : {
          "free_space_in_bytes" : 331534618624,
          "storage_type" : "xfs"
        },
        "capacity" : {
          "max_queue_size_in_bytes" : 1073741824,
          "queue_size_in_bytes" : 45056621,
          "max_unread_events" : 0,
          "page_capacity_in_bytes" : 67108864
        },
        "events_count" : 0,
        "queue_size_in_bytes" : 45056621,
        "max_queue_size_in_bytes" : 1073741824
}

leandrojmp · February 24, 2023, 1:40pm

Since the index name takes its date from the @timestamp field, and you are setting @timestamp from a field in your document, you need to check whether the source file that Filebeat is reading contains old values for this field.

There is nothing in your Logstash pipeline that would do that; the issue is probably in your source file, not even in Filebeat.

From what you shared, the value of the @timestamp field comes from the ts field in your documents, so you may have events where this value is older than the current date.

Nikhitha_Karennagari · February 24, 2023, 2:00pm

@leandrojmp, thanks for your response. But copy in the mutate filter goes from source to destination, right?
That means the value of the "@timestamp" field is copied to the "ts" field, not "ts" to "@timestamp".
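
To illustrate the direction, here is a small sketch. The mutate block is taken from the pipeline above; the commented-out date filter is not part of it and only shows what the opposite direction would typically look like, assuming (hypothetically) that ts holds an ISO8601 string:

```
filter {
  # mutate copy: the key is the source and the value is the destination,
  # so this overwrites "ts" with the current value of @timestamp.
  mutate {
    copy => { "@timestamp" => "ts" }
  }

  # Setting @timestamp from "ts" would normally be done with a date filter.
  # NOT in the posted pipeline; the ISO8601 format of "ts" is an assumption.
  # date {
  #   match  => ["ts", "ISO8601"]
  #   target => "@timestamp"
  # }
}
```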

leandrojmp · February 24, 2023, 2:04pm

Oh yeah, my mistake, you are right.

So, where is your @timestamp coming from? Do you have anything else in the searchengine_pipeline pipeline?

If not, Logstash will use the current time as the @timestamp, unless you have it coming from your source document from Filebeat, which you didn't share.
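
One way to see where the value comes from is to print complete events temporarily and inspect @timestamp alongside the fields sent by Filebeat. This is only a debugging sketch, added next to the existing output; it prints every event, so it is best used briefly or on a test instance:

```
output {
  # Temporary debugging aid: dump each event, including @metadata,
  # to the Logstash stdout so @timestamp can be inspected.
  stdout {
    codec => rubydebug { metadata => true }
  }
}
```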

Nikhitha_Karennagari · March 2, 2023, 6:18am

@leandrojmp, I have checked the source file, and it does not have any @timestamp field in it. Due to data privacy, I cannot share the source file here.
But I still suspect the persistent queue. Do you know how Logstash behaves when an output destination such as lumberjack is unavailable for some time and Logstash is restarted for some reason?

leandrojmp · March 2, 2023, 1:11pm

The persistent queue sits between the input and the filter block. If you have it enabled, every message Logstash receives is written to this queue before it is processed.

If an output has issues, messages accumulate in the persistent queue until it is full. Once it is full, Logstash stops accepting new messages until the output is back and the queue can start to drain.

When you restart Logstash, it resumes processing the events that are still in the persistent queue.
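
To find out whether old events are actually passing through the pipeline, a filter along these lines could tag any event whose @timestamp is more than a day behind the time of processing. It is a sketch, not a tested fix: the field name lag_seconds and the tag stale_event are made up for illustration, and the 24-hour threshold simply mirrors the ignore_older value mentioned earlier. It shows which events are old, not why they are old.

```
filter {
  ruby {
    # Seconds between the event's @timestamp and the moment this filter runs.
    # [@metadata] fields are not sent to Elasticsearch.
    code => "event.set('[@metadata][lag_seconds]', Time.now.to_f - event.get('@timestamp').to_f)"
  }
  # 86400 s = 24 h (hypothetical threshold).
  if [@metadata][lag_seconds] > 86400 {
    mutate { add_tag => ["stale_event"] }
  }
}
```

Tagged events can then be examined to see which file, host, or pod they came from.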

system · March 30, 2023, 1:12pm

This topic was automatically closed 28 days after the last reply. New replies are no longer allowed.
