Log files getting accumulated in temporary_directory path while reading logs from s3 buckets

Anusha_Kusanghi · April 12, 2023, 1:18pm

Hi All,

We have a Logstash configuration that reads logs from an S3 bucket.
Here is the configuration:

input {
  s3 {
    access_key_id => "**************"
    secret_access_key => "hjiufaaaa"
    bucket => "testbucket"
    region => "central"
    prefix => "abp"
    endpoint => "https://s3.com"
    delete => true
    include_object_properties => true
    codec => "json"
    temporary_directory => "/var/opt/logstashqueue/"
  }
}

filter {
}

output {
  tcp {
    port => 6367
    codec => "json_lines"
    host => "prod-int"
  }
}

With the above pipeline we see two issues.
Issue 1: Log files are accumulating in the path specified in temporary_directory. There is no loss of logs; they are being read and stored in Elasticsearch.

Our setup: We have multiple Logstash shipper servers that read the logs and send them to a Logstash parser, where parsing is applied before the events are sent to Elasticsearch.

root@svlipca4$ ls 1746939.ABP.* | wc -l
569

I can use a cron job to delete those files, but I want to understand what these files actually are.

Issue 2: We constantly see this warning:

[2023-04-12T12:06:50,555][WARN ][org.logstash.Event ][imperva_shipper] Unrecognized @timestamp value type=class org.jruby.RubyFixnum
On the Logstash shipper server, this message is logged for every log line that is read, and it masks other errors in logstash-plain.log.

Incoming data: {"client":{"ip":"141.53.127.00","domain":"anusha.dyn.orange.be.","geo":{"name":"Orange India","country_iso_code":"IN"}},"imperva":{"abp":{"bot_triggered_condition_names":[]}},"@version":"1","user_agent":{"version":"110.0.0.0","original":"Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/110.0.0.0 Safari/537.36","device_name":"web_browser","name":"Chrome"},"event":{"category":"web","code":"","id":"******a","provider":"abp","dataset":"ABP"},"tags":["_timestampparsefailure"],"_@timestamp":1677229422000,"@timestamp":"2023-02-24T09:04:16.273Z","http":{"request":{"body":{"bytes":6044},"method":"GET"}},"url":{"path":"/gateway/ecomfoodb2c.eshop.wcsbasketcontextsvcv2/v2/store/90004/basketcontent/getbasketcontextdata"},"server":{"domain":"paper.city.be","geo":{"name":"central-1"}}}

How can we resolve the above two issues?

TIA

Badger · April 12, 2023, 4:23pm

The input transfers the file from s3 to the temporary directory before processing it. It always tries to delete the file after processing it. See this thread for more info.

The json codec does not support timestamps in UNIX_MS format (epoch milliseconds). You could remove the codec, use mutate+gsub to change the field name from "@timestamp" to something else, parse the message with a json filter, then use a date filter to parse the renamed field and overwrite [@timestamp].
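
A rough sketch of that approach, not tested against this data. It assumes each line in the S3 objects is one complete JSON document, and the field name source_timestamp is only a placeholder chosen for illustration. The remaining s3 options stay as in the original post.

```
input {
  s3 {
    # ... same bucket, credentials, and other options as in the original post ...
    # Read raw lines instead of decoding JSON in the codec.
    codec => "plain"
  }
}

filter {
  # Rename the key in the raw text before any JSON parsing happens.
  # Single-quoted strings avoid having to escape the double quotes.
  mutate {
    gsub => [ "message", '"@timestamp"', '"source_timestamp"' ]
  }

  # Parse the JSON now that it no longer contains a numeric @timestamp.
  json {
    source => "message"
    remove_field => [ "message" ]
  }

  # Parse epoch milliseconds and overwrite [@timestamp] (the default target).
  date {
    match => [ "source_timestamp", "UNIX_MS" ]
    remove_field => [ "source_timestamp" ]
  }
}
```

Because the rename happens on the raw string, the json filter never sees a non-string @timestamp, so the "Unrecognized @timestamp value type" warning should no longer be triggered for these events. The remove_field on the date filter only applies when parsing succeeds, so a failed parse leaves source_timestamp in place for debugging.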

