S3 output plugin

Hello everyone, I hope everything is going well.

I have two questions. Which file formats does the s3 output plugin support?

I'm using Oracle, Kafka, and file inputs and writing to S3 with the s3 output.

I would like to create Parquet or ORC files.

Thanks

The s3 output writes files containing whatever byte stream is produced by the codec it is configured with. The output receives batches of events, hands each event to the codec, and appends the result to a local file named according to the filename pattern it is configured with. Those local files are flushed to S3 at (configurable) intervals.
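To make the encode-and-append behaviour concrete, here is a minimal Python sketch. It is not the plugin's code (the plugin is written in Ruby); `line_codec`, `write_batch`, and the file name `part0.txt` are hypothetical names chosen for illustration, and the upload to S3 is left out entirely.

```python
import json
import os
import tempfile


def line_codec(event):
    # Hypothetical stand-in for a codec: turn one event into one line of bytes.
    return (json.dumps(event, sort_keys=True) + "\n").encode("utf-8")


def write_batch(path, events, codec):
    # Append the codec's bytes for every event in the batch to the local file.
    with open(path, "ab") as fh:
        for event in events:
            fh.write(codec(event))


def demo():
    # Two batches arrive one after the other and land in the same local file.
    tmp_dir = tempfile.mkdtemp()
    path = os.path.join(tmp_dir, "part0.txt")
    write_batch(path, [{"id": 1}, {"id": 2}], line_codec)
    write_batch(path, [{"id": 3}], line_codec)
    with open(path, "rb") as fh:
        return fh.read()
```

The point of the sketch is that the file's format is decided entirely by what the codec returns; the output itself only concatenates bytes.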

The s3 output's default codec is the line codec, which stringifies the event and results in one line of content in the file per event. If you configure the s3 output with the json_lines codec, it outputs newline-delimited JSON. If you configure it with the syslog codec, it outputs syslog-encoded lines, and so on.

As I understand it, both ORC and Parquet are columnar file formats. Assuming each event is a "row" of data ...

The two formats are paged/striped. I think it is conceivable that you could write a codec with a multi_receive_encoded method (which consumes a batch of events) that transforms them from row-wise to column-wise. But it would be a new development effort. I do not use these formats, so I have absolutely no idea whether anyone would be interested if you created a GitHub project and started writing this.
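As a rough illustration of what "row-wise to column-wise" means for a batch, here is a small Python sketch. It only pivots the data in memory; it does not produce Parquet or ORC, and `rows_to_columns` is a hypothetical helper, not part of any Logstash API.

```python
def rows_to_columns(events):
    # Collect every field name seen in the batch, keeping first-seen order.
    fields = []
    for event in events:
        for name in event:
            if name not in fields:
                fields.append(name)
    # Build one list per field; use None where an event lacks that field.
    return {name: [event.get(name) for event in events] for name in fields}


batch = [
    {"host": "a", "status": 200},
    {"host": "b", "status": 404, "bytes": 512},
]
columns = rows_to_columns(batch)
```

A real codec would then have to serialize each column in the target format's page or stripe layout, which is where the actual development effort lies.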

It also depends on whether the pages/stripes contain information about one another. If the output can just append the codec's encoding of a batch to its output (e.g. a file), then it works. If the previous page/stripe has to tell you where the next one is, then it may not work.
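The "it works" case can be sketched as follows: each batch is encoded as a self-contained block, so blocks can simply be concatenated and read back in order. The block layout here (a 4-byte length followed by a JSON payload) is invented for illustration and is not how Parquet or ORC lay out their data.

```python
import json
import struct


def encode_block(events):
    # Hypothetical self-contained block: 4-byte big-endian length, then payload.
    payload = json.dumps(events).encode("utf-8")
    return struct.pack(">I", len(payload)) + payload


def read_blocks(data):
    # Walk the bytes block by block; no block refers to any other block.
    blocks = []
    offset = 0
    while offset < len(data):
        (length,) = struct.unpack_from(">I", data, offset)
        offset += 4
        blocks.append(json.loads(data[offset:offset + length].decode("utf-8")))
        offset += length
    return blocks


# Appending one block after another yields a file that is still readable.
file_bytes = encode_block([{"id": 1}, {"id": 2}]) + encode_block([{"id": 3}])
```

If a format instead needs each block, or a file-level index, to record where other blocks sit, plain concatenation like this is no longer enough, because the output would have to rewrite or finalize the file rather than only append to it.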

Thank you, @yaauie and @Badger!

Is there documentation that describes all the available codecs?

Is there any way to change the extension from txt to json?
