S3 output plugin

brunoof1 · February 19, 2022, 10:57pm

Hello everyone, hope all is well.

I have two questions. What file formats does the s3 output plugin support?

I'm using Oracle, Kafka, and file inputs, and writing to S3 with the s3 output.

I would like to create Parquet or ORC files.

Thanks

yaauie · February 20, 2022, 4:15am

The s3 output saves whatever bytestream is produced by the codec it is configured with. The output receives batches of events, hands each event to the codec, and appends the result to a local file chosen by the filename pattern it is configured with. Those local files are flushed to S3 at (configurable) intervals.

The s3 output's default codec is the line codec, which stringifies each event and produces one line in the file per event. If you configure the s3 output with the json_lines codec, it outputs newline-delimited JSON. If you configure it with the syslog codec, it outputs syslog-encoded lines, and so on.
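To make the codec-then-append flow above concrete, here is a minimal Python sketch. It is not Logstash code: `line_codec`, `json_lines_codec`, and `write_batch` are hypothetical stand-ins, the "line" encoding is simplified to just the message field, and an in-memory buffer plays the role of the local file.

```python
import io
import json


def line_codec(event):
    # Hypothetical stand-in for a line-style codec: one text line per event.
    # (Simplified: only the "message" field is written.)
    return (str(event["message"]) + "\n").encode("utf-8")


def json_lines_codec(event):
    # Hypothetical stand-in for a newline-delimited JSON codec.
    return (json.dumps(event, sort_keys=True) + "\n").encode("utf-8")


def write_batch(buffer, events, codec):
    # Hand each event to the codec and append the resulting bytes.
    for event in events:
        buffer.write(codec(event))


events = [{"message": "first", "id": 1}, {"message": "second", "id": 2}]

plain = io.BytesIO()   # stands in for the local temporary file
write_batch(plain, events, line_codec)

ndjson = io.BytesIO()
write_batch(ndjson, events, json_lines_codec)

print(plain.getvalue().decode("utf-8"))
print(ndjson.getvalue().decode("utf-8"))
```

The point of the sketch is that the writer never looks at the bytes: the file format is entirely decided by whichever codec is plugged in.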

Badger · February 20, 2022, 4:38am

As I understand it, both ORC and Parquet are columnar file formats. Assuming each event is a "row" of data ...

The two formats are paged/striped. I think it is conceivable that you could write a codec with a multi_receive_encoded method (which consumes a batch of events) that transforms them from row-wise to column-wise. But it would be a new development effort. I do not use these formats, so I have no idea whether anyone would be interested if you created a GitHub project and started writing this.

It also depends on whether the pages/stripes contain information about one another. If the output can just append the codec's encoding of a batch to its output (e.g. a file), then it works. If the previous page/stripe has to tell you where the next one is, then it may not work.
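As a rough illustration of the row-wise to column-wise idea, here is a toy Python sketch. It does not produce Parquet or ORC: `batch_to_columns` and `encode_chunk` are hypothetical names, and each batch is encoded as one self-contained JSON line purely to show the "appendable chunk" case described above.

```python
import json


def batch_to_columns(events):
    # Collect the union of field names in first-seen order.
    fields = []
    for event in events:
        for name in event:
            if name not in fields:
                fields.append(name)
    # Pivot: one list of values per field; missing values become None.
    return {name: [event.get(name) for event in events] for name in fields}


def encode_chunk(events):
    # Toy encoding (an assumption, not a real columnar format):
    # one self-contained JSON document per batch, newline-terminated.
    return (json.dumps(batch_to_columns(events)) + "\n").encode("utf-8")


batch_one = [{"id": 1, "host": "a"}, {"id": 2, "host": "b"}]
batch_two = [{"id": 3, "host": "c", "level": "warn"}]

# Because no chunk refers to any other chunk, plain concatenation works.
output = encode_chunk(batch_one) + encode_chunk(batch_two)

chunks = [json.loads(line) for line in output.decode("utf-8").splitlines()]
print(chunks)
```

Real Parquet and ORC files also carry file-level metadata describing where their row groups or stripes live, which this toy ignores; that is exactly the kind of cross-chunk information that would make simple appending insufficient.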

brunoof1 · February 21, 2022, 12:59pm

Thank you, @yaauie and @Badger !

Is there documentation that describes all the available codecs?

Is there any way to change the extension from txt to json?

