Read *.json.gz from AWS S3 bucket

Hi All,

I am new to the ELK Stack and am trying to read data from S3 buckets. The JSON data is gzip-compressed, and the folder structure in the S3 bucket looks like this:

YYYY-MM-DD/*.json.gz

2021-12-01/A.json.gz
2021-12-02/B.json.gz
2021-12-03/C.json.gz

Each folder can contain multiple files.

I'm looking for a code snippet to read these files using Logstash.

Read the file input plugin docs: File input plugin | Logstash Reference [7.16] | Elastic.
Its read mode supports gzip file processing, but I believe you then have to define a gzip codec in your input.
However, try it without the codec first and see if read mode works on its own; I haven't tried that before. Also note the file input reads from the local filesystem, so the objects would need to be downloaded or synced from S3 to the Logstash host first.

input {
  file {
    path => ["/var/log/202*/*.json.gz"]   # local path to the downloaded files
    codec => "gzip_lines"                 # decompress and emit one event per line
    mode => "read"                        # read each file once from start to finish
  }
}
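
One thing to watch with read mode: file_completed_action defaults to delete, so source files are removed after they are read. Here is a sketch of the codec-less variant suggested above that also keeps the files around; the log path is just an example value you would replace:

input {
  file {
    path => ["/var/log/202*/*.json.gz"]
    mode => "read"                        # read mode is documented to handle .gz files itself
    file_completed_action => "log"        # log completed files instead of deleting them
    file_completed_log_path => "/var/log/logstash/completed.log"   # example path
  }
}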

From the S3 input's docs:

Each line from each file generates an event. Files ending in .gz are handled as gzip’ed files.

Since the S3 input is line-oriented, if the contents of your gzip files are not line-oriented (for example, each file is a single JSON blob representing one object), you may need to use the multiline codec to buffer all of the lines into a single event, and then a json filter to parse the contents into a structured object:

input {
  s3 {
    bucket => ""
    access_key_id => "1234"
    secret_access_key => "secret"
    codec => multiline {
      pattern => "." # anything
      what => "previous" # accumulate until EOF
    }
  }
}
filter {
  json {
    source => "message"
  }
}
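
Since your keys are laid out as YYYY-MM-DD/*.json.gz, you can also narrow what the input lists with the s3 input's prefix option, and set the bucket's region. A sketch with placeholder values; the bucket name, region, and prefix below are assumptions you'd replace with your own:

input {
  s3 {
    bucket => "my-bucket"     # placeholder: your bucket name
    region => "us-east-1"     # placeholder: your bucket's region
    prefix => "2021-12-"      # only fetch keys under the December 2021 folders
    codec => multiline {
      pattern => "."          # anything
      what => "previous"      # accumulate until EOF
    }
  }
}

Per the quoted docs line, objects ending in .gz are decompressed automatically, so no extra gzip codec should be needed here.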
