Bad charset encoding in field names - III

There is a bug in Logstash that causes the problems mentioned here and here.

Access to fields with non-ASCII names fails in any environment that uses the Ruby event API.
Plugins that use the Java API directly work. For example, the dissect plugin works OK with a non-ASCII field name.

Config:

input { stdin { } }

filter {
  mutate {
    add_field => { "Русское название" => "Content of the field" }
  }
  date {
    match => [ "timestamp" , "dd/MMM/yyyy:HH:mm:ss Z" ]
  }
}

output {
  stdout { codec => rubydebug }
}

Output:

[2018-06-22T18:15:11,760][INFO ][logstash.modules.scaffold] Initializing module {:module_name=>"fb_apache", :directory=>"C:/maxirmx/logstash/modules/fb_apache/configuration"}
[2018-06-22T18:15:11,791][INFO ][logstash.modules.scaffold] Initializing module {:module_name=>"netflow", :directory=>"C:/maxirmx/logstash/modules/netflow/configuration"}
[2018-06-22T18:15:12,094][WARN ][logstash.config.source.multilocal] Ignoring the 'pipelines.yml' file because modules or command line options are specified
[2018-06-22T18:15:13,047][INFO ][logstash.runner ] Starting Logstash {"logstash.version"=>"6.2.4"}
[2018-06-22T18:15:13,797][INFO ][logstash.agent ] Successfully started Logstash API endpoint {:port=>9600}
[2018-06-22T18:15:19,885][INFO ][logstash.pipeline ] Starting pipeline {:pipeline_id=>"main", "pipeline.workers"=>4, "pipeline.batch.size"=>125, "pipeline.batch.delay"=>50}
[2018-06-22T18:15:20,119][INFO ][logstash.pipeline ] Pipeline started successfully {:pipeline_id=>"main", :thread=>"#<Thread:0x2e14da1c run>"}
The stdin plugin is now waiting for input:
[2018-06-22T18:15:20,260][INFO ][logstash.agent ] Pipelines running {:count=>1, :pipelines=>["main"]}
Some data
{
"message" => "Some data\r",
"@version" => "1",
"@timestamp" => 2018-06-22T15:15:28.309Z,
"host" => "NSC181278",
"? ?\u0083?\u0081?\u0081?????? ???°?·???°??????" => "Content of the field"
}

It looks like the problem is here:

JrubyEventExtLibrary.java

    @JRubyMethod(name = "get", required = 1)
    public IRubyObject ruby_get_field(ThreadContext context, RubyString reference)
    {
        return Rubyfier.deep(
            context.runtime,
            this.event.getUnconvertedField(FieldReference.from(reference.getByteList()))
        );
    }

    @JRubyMethod(name = "set", required = 2)
    public IRubyObject ruby_set_field(ThreadContext context, RubyString reference, IRubyObject value)
    {
        final FieldReference r = FieldReference.from(reference.getByteList());
        if (r.equals(FieldReference.TIMESTAMP_REFERENCE)) {
            if (!(value instanceof JrubyTimestampExtLibrary.RubyTimestamp)) {
                throw context.runtime.newTypeError("wrong argument type " + value.getMetaClass() + " (expected LogStash::Timestamp)");
            }
            this.event.setTimestamp(((JrubyTimestampExtLibrary.RubyTimestamp)value).getTimestamp());
        } else {
            this.event.setField(r, Valuefier.convert(value));
        }
        return value;
    }

Passing reference.getByteList() in the code above does not look correct. IMHO it should rather be reference.getValue().
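To illustrate the kind of corruption I suspect, here is a minimal standalone sketch in plain Java. It does not use any Logstash or JRuby classes. The class and method names are made up for the demo, and the assumption (not verified against the Logstash source) is that the raw UTF-8 bytes of the field name end up being read one byte per character, as in ISO-8859-1. That would match the output above, where each Cyrillic letter turns into two characters.

```java
import java.nio.charset.StandardCharsets;

// Hypothetical demo class; not part of Logstash.
class FieldNameEncodingDemo {

    // Encode the name as UTF-8, then (wrongly) decode the raw bytes as
    // ISO-8859-1, which maps every single byte to one character.
    static String decodeUtf8BytesAsLatin1(String name) {
        byte[] utf8 = name.getBytes(StandardCharsets.UTF_8);
        return new String(utf8, StandardCharsets.ISO_8859_1);
    }

    // Encode and decode with the same charset; the name survives intact.
    static String roundTripUtf8(String name) {
        byte[] utf8 = name.getBytes(StandardCharsets.UTF_8);
        return new String(utf8, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        // "Русское название" written with Unicode escapes, so the demo does
        // not depend on the encoding of the source file.
        String name = "\u0420\u0443\u0441\u0441\u043a\u043e\u0435 "
                    + "\u043d\u0430\u0437\u0432\u0430\u043d\u0438\u0435";

        String garbled = decodeUtf8BytesAsLatin1(name);

        System.out.println("original length: " + name.length());
        System.out.println("garbled length:  " + garbled.length());
        System.out.println("UTF-8 round trip matches:  " + roundTripUtf8(name).equals(name));
        System.out.println("Latin-1 decode matches:    " + garbled.equals(name));
    }
}
```

If this is what happens inside the Ruby event API path, the event ends up with a differently named field, so a later lookup by the real name would not find it.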
