Log files accumulating in the temporary_directory path while reading logs from S3 buckets

Hi All,

We have a Logstash configuration that reads logs from an S3 bucket.
Here is the configuration:

input {
  s3 {
    access_key_id => "**************"
    secret_access_key => "hjiufaaaa"
    bucket => "testbucket"
    region => "central"
    prefix => "abp"
    endpoint => "https://s3.com"
    delete => true
    include_object_properties => true
    codec => "json"
    temporary_directory => "/var/opt/logstashqueue/"
  }
}

filter {

}

output {
  tcp {
    port => 6367
    codec => "json_lines"
    host => "prod-int"
  }
}

But with the above pipeline:
Issue 1: Log files are accumulating in the path specified in temporary_directory. There is no loss of logs; the logs are being read and stored in Elasticsearch.

Our setup: we have multiple Logstash shipper servers that read the logs and send them to a Logstash parser, where parsing is applied and the events are sent to Elasticsearch.

root@svlipca4$ ls 1746939.ABP.* | wc -l
569

I can use a cron job to delete those files, but I want to understand the real purpose of these files.

Issue 2: We constantly see

[2023-04-12T12:06:50,555][WARN ][org.logstash.Event ][imperva_shipper] Unrecognized @timestamp value type=class org.jruby.RubyFixnum
On the Logstash shipper server, this message appears for every log that is read, and it masks other errors in logstash-plain.log.

incoming data: {"client":{"ip":"141.53.127.00","domain":"anusha.dyn.orange.be.","geo":{"name":"Orange India","country_iso_code":"IN"}},"imperva":{"abp":{"bot_triggered_condition_names":[]}},"@version":"1","user_agent":{"version":"110.0.0.0","original":"Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/110.0.0.0 Safari/537.36","device_name":"web_browser","name":"Chrome"},"event":{"category":"web","code":"","id":"******a","provider":"abp","dataset":"ABP"},"tags":["_timestampparsefailure"],"_@timestamp":1677229422000,"@timestamp":"2023-02-24T09:04:16.273Z","http":{"request":{"body":{"bytes":6044},"method":"GET"}},"url":{"path":"/gateway/ecomfoodb2c.eshop.wcsbasketcontextsvcv2/v2/store/90004/basketcontent/getbasketcontextdata"},"server":{"domain":"paper.city.be","geo":{"name":"central-1"}}}

How can we resolve the above two issues?

TIA

The input transfers each file from S3 to the temporary directory before processing it, and it always tries to delete the file after processing. See this thread for more info.

The json codec does not support timestamps in UNIX_MS (epoch milliseconds) format. You could remove the codec, use mutate+gsub to change the field name from "@timestamp" to something else in the raw JSON text, parse the message with a json filter, then use a date filter to parse that field and overwrite [@timestamp].
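A minimal sketch of that approach, assuming the raw event ends up as JSON text in [message] once the codec is removed (the event_ts field name is only an illustration, not a field in your data):

input {
  s3 {
    # same options as in your pipeline, but without codec => "json"
  }
}

filter {
  # Rename the "@timestamp" key inside the raw JSON text so it is not
  # rejected as an event timestamp when the JSON is parsed.
  mutate {
    gsub => [ "message", '"@timestamp"', '"event_ts"' ]
  }

  # Parse the JSON out of [message].
  json {
    source => "message"
  }

  # [event_ts] holds epoch milliseconds; parse it and overwrite [@timestamp].
  date {
    match => [ "event_ts", "UNIX_MS" ]
    remove_field => [ "event_ts", "message" ]
  }
}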
