S3 CSV Codec - Using a non printable character as separator

Hammond95 · August 23, 2021, 4:36pm

Is it possible to use a non-printable character as a separator in the csv codec?

codec => csv {
  columns => ["col1", "col2", "col3"]
  charset => "UTF-8"
  separator => "\\u001F"
}

I have tried the following:
\u001f, \u001F, \\u001f, \\u001F, \u{001F}

Any ideas?

If this is not possible, how would I have to modify the code to make it possible?

github.com

logstash-plugins/logstash-codec-csv/blob/master/lib/logstash/codecs/csv.rb

# encoding: utf-8
require "logstash/codecs/base"
require "logstash/util/charset"
require "logstash/event"

require 'logstash/plugin_mixins/ecs_compatibility_support'
require 'logstash/plugin_mixins/ecs_compatibility_support/target_check'
require 'logstash/plugin_mixins/validator_support/field_reference_validation_adapter'
require 'logstash/plugin_mixins/event_support/event_factory_adapter'
require 'logstash/plugin_mixins/event_support/from_json_helper'

require "csv"

class LogStash::Codecs::CSV < LogStash::Codecs::Base

  include LogStash::PluginMixins::ECSCompatibilitySupport(:disabled, :v1, :v8 => :v1)
  include LogStash::PluginMixins::ECSCompatibilitySupport::TargetCheck

  extend LogStash::PluginMixins::ValidatorSupport::FieldReferenceValidationAdapter

This file has been truncated.

Badger · August 23, 2021, 5:16pm

It's not the codec that you would modify to accept those \u escape sequences; it is the compiler for the Logstash configuration language.

However, there is no need to do that; you can just use the literal character. When logged into a UNIX host from a Windows environment, I would type Ctrl+V followed by Alt+031 to generate it. od shows that as a 'us' (unit separator) character.
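If typing the character by hand is awkward, one option is to generate the config fragment with a small script, so the escape lives in the script and the literal byte lands in the file. The following is only a sketch: the codec settings are copied from the question, and the constant name `US` and the variable `config` are made up for illustration.

```ruby
# Ruby decodes "\u001F" in a double-quoted string, so US holds the
# single unit separator character (byte 0x1F). The name US is illustrative.
US = "\u001F"

# Build the codec block with the literal character between the quotes.
# Logstash would then see the raw byte rather than an escape sequence.
config = <<~CONF
  codec => csv {
    columns => ["col1", "col2", "col3"]
    charset => "UTF-8"
    separator => "#{US}"
  }
CONF

# Show the bytes of the separator line to confirm 0x1F is present.
sep_line = config.lines.find { |line| line.include?("separator") }
puts sep_line.bytes.map { |b| format("%02x", b) }.join(" ")
```

Writing `config` to the pipeline file (for example with `File.write`) keeps the unprintable character out of your editor and clipboard.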

Hammond95 · August 23, 2021, 5:31pm

Yeah, I know that I can copy and paste it, but I wanted to avoid that option and stick with the \u or \x representation.

I have tried, but my YAML is not happy with that character...

Badger · August 24, 2021, 3:30am

An interesting related post is here. In that case \u00xx works inside a string, but not because Logstash decodes it. I would guess that something in Manticore decodes it, or else it is decoded on the receiving end in Elasticsearch. It will not work here.

Hammond95 · August 25, 2021, 12:55am

In the end I made my own fork of the plugin to use that specific character.
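The fork itself is not shown in this thread. As a rough sketch of one way such a change might look, the configured separator could be run through a small decoder before it is handed to the CSV parser. The helper `decode_separator` below is hypothetical and is not part of logstash-codec-csv; it assumes the separator arrives as the raw text typed in the config, with the backslash intact.

```ruby
# Hypothetical helper (not part of logstash-codec-csv): turn the escape
# forms \u{XXXX}, \uXXXX and \xXX in a separator string into the
# characters they name. Any other text is left untouched.
def decode_separator(raw)
  raw.gsub(/\\u\{(\h{1,6})\}|\\u(\h{4})|\\x(\h{2})/) do
    # Exactly one capture group is set per match; read it as a hex code point.
    code_point = ($1 || $2 || $3).hex
    [code_point].pack("U") # "U" packs a code point as a UTF-8 string
  end
end

# Single-quoted Ruby strings keep the backslash, which mimics a value
# that reaches the plugin undecoded.
puts decode_separator('\u001F').bytes.inspect
puts decode_separator('\x1F').bytes.inspect
puts decode_separator('\u{001F}').bytes.inspect
```

A fork would presumably call something like this once at plugin registration, so the decoded separator is reused for every event.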

system · September 22, 2021, 12:56am

This topic was automatically closed 28 days after the last reply. New replies are no longer allowed.
