Kv filter plugin trim_key option seems to have stopped working

Hello experts,

After upgrading Logstash from 1.5 to 5.6, the field names produced by the kv filter started to have leading spaces.

Configuration in 1.5:

  kv {
    source => "headers"
    field_split => "[\r\n]+"
    value_split => ":"
    trimkey => "\s"
    trim => "\s"
    add_tag => [ "got_headers" ]
  }

Configuration in 5.6:

  kv {
    source => "headers"
    field_split => "[\r\n]+"
    value_split => ":"
    trim_key => "\s"
    trim_value => "\s"
    add_tag => [ "got_headers" ]
  }

Resulting in 5.6:

{
" Origin" : "http://site.com",
" X-dev" : "Device1",
" Connection" : "Keep-Alive",
"headers" : " \r\n Origin: http://site.com\r\n X-dev: Device1\r\n Connection: Keep-Alive\r\n",
"tags" : ["multiline", "request", "URI_parsed", "got_api", "got_headers"]
}

There are 2 spaces before every key in the "headers" field ("headers" : " \r\n\s\sOrigin: http://site.com\r\n\s\sX-dev: Device1\r\n\s\sConnection: Keep-Alive\r\n"), and kv.trim_key removes only one of them, whereas I expected all of them to be removed, if I understand the option correctly.

As a result, every generated field name starts with a space, e.g. "\sOrigin" : "http://site.com"

Questions are:

  1. Did the behavior of kv.trim_key change?
  2. How can all leading/trailing characters be trimmed in the kv plugin (without using additional plugins)?

Thanks in advance.

This happens because both trim_key and trim_value are implemented with a regular expression pattern of this form:

Regexp.new("^[#{@trim_key}]|[#{@trim_key}]$")

It removes only the first character of the string if it belongs to the group, and the last character of the string if that matches too. This is most likely not what you want, because "trimming" means removing all matching characters from the start and from the end of the string. So the correct expression would be:

Regexp.new("^[#{@trim_key}]+|[#{@trim_key}]+$")

This behavior changed from version 1 of this plugin, where the expression was:

Regexp.new("[#{@trim_key}]")

It matched every character in the group anywhere in the string. That is wrong too, because only characters at the beginning and at the end should be removed, not those in the middle of the string surrounded by other characters.
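
The difference between the three expressions can be checked in plain Ruby, outside Logstash. This is only a minimal sketch: it assumes the plugin applies the expression with gsub and an empty replacement, and the helper name trim_with and the sample key are made up for illustration.

```ruby
# trim_key as it arrives from the config: a backslash followed by "s".
TRIM = '\s'

V1_RE    = Regexp.new("[#{TRIM}]")                # version 1: matches anywhere
V5_RE    = Regexp.new("^[#{TRIM}]|[#{TRIM}]$")    # 5.6: one character at each end
FIXED_RE = Regexp.new("^[#{TRIM}]+|[#{TRIM}]+$")  # proposed: whole runs at each end

# Hypothetical helper; assumes the plugin strips matches via gsub(re, "").
def trim_with(re, str)
  str.gsub(re, "")
end

key = "  key2 with spaces  "
puts trim_with(V1_RE, key).inspect     # inner spaces are removed as well
puts trim_with(V5_RE, key).inspect     # one space is left on each side
puts trim_with(FIXED_RE, key).inspect  # only the surrounding spaces are removed
```

With a key that has two spaces on each side, only the third expression leaves the key clean while keeping its inner spaces.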

The unit tests in the plugin only check examples with a single space, so this problem was not detected.
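
To illustrate why single-space test data hides the problem, here is a rough sketch that runs the old and the new spec messages through both expressions. It is not the plugin's parser: the parse helper is hypothetical, and it assumes fields are split on "|", key and value on "=", and that a plain space is the trim character.

```ruby
OLD_RE = Regexp.new("^[ ]|[ ]$")    # 5.6 form: one character at each end
NEW_RE = Regexp.new("^[ ]+|[ ]+$")  # patched form: whole runs at each end

# Hypothetical stand-in for the kv filter: split on "|" and "=", then trim.
def parse(message, trim_re)
  message.split("|").map { |pair|
    key, value = pair.split("=", 2)
    [key.gsub(trim_re, ""), value.gsub(trim_re, "")]
  }.to_h
end

single = "key1= value1 with spaces | key2 with spaces =value2"
double = "key1=  value1 with spaces    |  key2 with spaces  =value2"

# With single spaces both expressions agree, so the test passes either way.
puts parse(single, OLD_RE) == parse(single, NEW_RE)
# With repeated spaces the results differ, which exposes the bug.
puts parse(double, OLD_RE) == parse(double, NEW_RE)
```

This is why the patch also changes the spec message to use runs of several spaces.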

diff --git a/lib/logstash/filters/kv.rb b/lib/logstash/filters/kv.rb
index c8b5ca8..1e2ec44 100644
--- a/lib/logstash/filters/kv.rb
+++ b/lib/logstash/filters/kv.rb
@@ -286,8 +286,8 @@ class LogStash::Filters::KV < LogStash::Filters::Base
       )
     end
 
-    @trim_value_re = Regexp.new("^[#{@trim_value}]|[#{@trim_value}]$") if @trim_value
-    @trim_key_re = Regexp.new("^[#{@trim_key}]|[#{@trim_key}]$") if @trim_key
+    @trim_value_re = Regexp.new("^[#{@trim_value}]+|[#{@trim_value}]+$") if @trim_value
+    @trim_key_re = Regexp.new("^[#{@trim_key}]+|[#{@trim_key}]+$") if @trim_key
 
     @remove_char_value_re = Regexp.new("[#{@remove_char_value}]") if @remove_char_value
     @remove_char_key_re = Regexp.new("[#{@remove_char_key}]") if @remove_char_key
diff --git a/spec/filters/kv_spec.rb b/spec/filters/kv_spec.rb
index 7aae281..8bef1e7 100644
--- a/spec/filters/kv_spec.rb
+++ b/spec/filters/kv_spec.rb
@@ -709,7 +709,7 @@ describe LogStash::Filters::KV do
       plugin
     end
 
-    let(:message) { "key1= value1 with spaces | key2 with spaces =value2" }
+    let(:message) { "key1=  value1 with spaces    |  key2 with spaces  =value2" }
     let(:data) { {"message" => message} }
     let(:event) { LogStash::Event.new(data) }
     let(:options) {
