S3 Output Plugin: Correct Way to Manage the Codec

Hi everyone, I have the following problem:
We have a pipeline in place which consists of:

[PrestoDB Clusters]  ==auditing==> [Kafka] <== [Logstash] ==> Elastic + S3

The auditing messages on Kafka are basically JSON messages composed of various fields, which may contain ANY character a user can type.

The ingestion into Elastic works with almost no problems.
Now I want to write some of the fields (but potentially all of them) to S3 in a text file, or eventually a Parquet file.

So I am using the S3 output plugin. I had to configure it as follows to make it somehow work, but I am obviously facing many problems due to newlines, delimiters, strange characters, etc. in the field values. This also doesn't seem like a good approach, since I have 20-30 more fields to add.

s3 {
        region => "eu-west-1"
        bucket => "my-bucket"
        prefix => "audit/some/sub/folder"
        encoding => "none"
        rotation_strategy => "size_and_time"
        temporary_directory => "/tmp/logstash"
        upload_queue_size => 4
        upload_workers_count => 4
        size_file => 5242880
        time_file => 2
        codec => line {
          format => "%{[CreateDate]}|%{[orgId]}|%{[QueryID]}|%{[Catalog]}|%{[User]}|%{[Query]}|%{[QueryStartTime]}|%{[EventName]}|%{[QueryType]}|%{[QueryEndTime]}"
        }
}

I have also tried the json codec, which does the job pretty well, but I don't want to write the data in JSON format: the files will be read in Presto/Spark clusters by data scientists, and parsing JSON with those tools is not convenient.
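For reference, the JSON attempt was essentially just a codec swap inside the same s3 block. A minimal sketch (json_lines is the newline-delimited variant of the json codec, which is generally what you want when writing files):

```
s3 {
    # ... same region/bucket/rotation settings as above ...
    codec => json_lines {}
}
```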

I have tried the csv codec, but it doesn't work at all, and I couldn't understand why…

Is there something I am missing?

I managed to solve my problem.

I missed it in both the logs and the docs, but this codec plugin (csv) doesn't come preinstalled, so you have to install it first with:

bin/logstash-plugin install logstash-codec-csv
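With the plugin installed, the codec can replace the hand-built line format. A sketch of what the s3 block looks like with it (the columns list here is abbreviated, and the option names — columns, separator, include_headers — should be double-checked against the docs for your installed codec version):

```
s3 {
    region => "eu-west-1"
    bucket => "my-bucket"
    prefix => "audit/some/sub/folder"
    codec => csv {
      # only the listed fields are written, in this order
      columns => ["CreateDate", "orgId", "QueryID", "Catalog", "User", "Query"]
      separator => "|"
      include_headers => false
    }
}
```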

After that, I had to escape newlines in the fields to avoid unwanted line breaks:

mutate {
    gsub => [
      "[Query]", "[\n]", "\\\\n",
      "[PreparedQuery]", "[\n]", "\\\\n"
    ]
}

The plugin takes care of doubling any double quote in the data (that's how you escape double quotes in CSV).
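As a hypothetical example of what that looks like in the output, a field value containing double quotes gets wrapped in quotes and its inner quotes doubled, RFC 4180-style:

```
field value:  select "col" from t
CSV output:   "select ""col"" from t"
```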

For the separator I have opened another thread in the forum, see:
