Logstash unable to parse with kv filter

My Logstash configuration works fine for most log entries, but the exceptions below are not parsed correctly.
My log has the following format.

The json part consists of the usual format: keys and values are both wrapped in "" and separated by spaces.

In the normal case the json part of the log looks like this:
"queue" "reg" "resReq" " rusage[mem=32000] select[rh70] span[ptile=2]" "numExHosts" "2" "execHosts" "dlhsx00176"

The correct output is
queue => reg
resReq => rusage[mem=32000] select[rh70] span[ptile=2]
numExHosts => 2
execHosts => dlhsx00176

Here is the exception:
"queue" "gui" "resReq" """select[ (rh70||rh60) && cpuf >= 1] rusage[mem=1024,eldo=1:duration=30s] """ "numExHosts
" "1" "execHosts" "gnbsx51022"

Expected output

queue => gui
resReq => select[ (rh70||rh60) && cpuf >= 1] rusage[mem=1024,eldo=1:duration=30s]
numExHosts => 1
execHosts => gnbsx51022

My kv filter looks like this:
kv {
  source => "json_part"
  field_split => " "
  value_split => " "
  trim_key => "\""
}

The output for the exception case is

"resReq" => """"select["
"(rh70||rh60)" => "&&"
"cpuf" => ">="

If I go to the log supplier, it will take ages for them to fix it, as it is not important to them.
One thing that is standardized in the log is that the resReq field has a fixed syntax: it always starts with select, rusage or span and ends with "]". But I think the problem is the double quote plus space, which acts as both the field separator and the value separator.
Please suggest what I can do to quickly get rid of the problem.

I think that you could use "+ "+ as your field_split_pattern and value_split_pattern?


If there might be escaped quotes within the values that we don't want to lose, the safer method would probably be to split at " ", and then gsub all unescaped quotes ((?<!\\)") with nothing afterwards. (Or split at (?<!\\)"+ "+, but the KV documentation tells me to be cautious with lookbehinds.)
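As a rough, untested sketch (resReq and command in the mutate are only example field names from your log; list whichever fields you expect):

kv {
  source => "json_part"
  field_split_pattern => '" "'
  value_split_pattern => '" "'
}
mutate {
  # afterwards, remove any unescaped quotes that are left inside the parsed values
  gsub => ["resReq", '(?<!\\)"', "", "command", '(?<!\\)"', ""]
}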

Thanks for your suggestion. Using "+ "+ as the field split is fine, but values can be blank or null, and I don't think it will handle those.

So there are not always quotes?

There are always quotes, but
a blank value appears as " "
or
a null value as ""

Does "+ "+ will match them?

I tried to use "+ "+ as the value and field splitter, but the config fails with this error:

kv {
  source => "json_part"
  field_split => "+ "+
  value_split => "+ "+
  trim_key => "\""
}

field_split => "+ "", :backtrace=>["/ELK/sw/logstash-6.4.2/logstash-core/lib/logstash/compiler.rb:41:in compile_imperative'", "/ELK/sw/logstash-6.4.2/logstash-core/lib/logstash/compiler.rb:49:in compile_graph'", "/ELK/sw/logstash-6.4.2/logstash-core/lib/logstash/compiler.rb:11:in block in compile_sources'", "org/jruby/RubyArray.java:2486:in map'", "/ELK/sw/logstash-6.4.2/logstash-core/lib/logstash/compiler.rb:10:in compile_sources'", "org/logstash/execution/AbstractPipelineExt.java:149:in initialize'", "/ELK/sw/logstash-6.4.2/logstash-core/lib/logstash/pipeline.rb:22:in initialize'", "/ELK/sw/logstash-6.4.2/logstash-core/lib/logstash/pipeline.rb:90:in initialize'", "/ELK/sw/logstash-6.4.2/logstash-core/lib/logstash/pipeline_action/create.rb:38:in execute'", "/ELK/sw/logstash-6.4.2/logstash-core/lib/logstash/agent.rb:309:in block in converge_state'"]}
[2020-07-12T15:43:16,717][INFO ][logstash.agent ] Successfully started Logstash API endpoint {:port=>9600}

You still need to wrap your parameters in quotes. Does this work for you?

kv {
  source => "json_part"
  field_split => '"+ "+' 
  value_split => '"+ "+' 
  trim_key => "\""
}

After putting in the quotes, Logstash didn't fail, but the suggested "+ "+ split still didn't work in my case.

"resReq" """select[ (rh70||rh60) && cpuf >= 1] rusage[mem=1024,eldo=1:duration=30s] """

The above should be parsed as
"resReq"=> "select[ (rh70||rh60) && cpuf >= 1] rusage[mem=1024,eldo=1:duration=30s] "

However, with "+ "+ it came out as
">=" => "1]"

I just noticed that you did not use field_split_pattern and value_split_pattern like I had suggested, so this couldn't work even if my regex was correct. field_split and value_split can only be used to define a list of single-character delimiters.
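To illustrate the difference with a minimal (untested) fragment:

# field_split is a list of single characters: this would split on every " and on every space
kv { field_split => '" ' }

# field_split_pattern is a regex: this splits only on the exact quote-space-quote sequence
kv { field_split_pattern => '" "' }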


Edit: Working example (with the assumption that the values could contain escaped quotes that we don't want to ruin. If they won't, you can simplify it of course.):

input {
  stdin{}
}
filter {
  mutate {
    add_field => { "json_part" => '"queue" "gui" "resReq" """select[ (rh70||rh60) && cpuf >= 1] rusage[mem=1024,eldo=1:duration=30s] """ "numExHosts
" "1" "execHosts" "gnbsx51022"'}
  }
  mutate {
    # strip the line break and the outer quotes so that only key/value pairs remain
    gsub => ["json_part", "\n", "", "json_part", '^"+', "", "json_part", '(?<!\\)"+$', ""]
  }
  kv {
    source => "json_part"
    field_split_pattern => '(?<!\\)"+ "+'
    value_split_pattern => '(?<!\\)"+ "+'
    remove_field => "json_part"
  }
}
output {
  stdout{}
}

{
         "queue" => "gui",
        "resReq" => "select[ (rh70||rh60) && cpuf >= 1] rusage[mem=1024,eldo=1:duration=30s] ",
       "message" => "test",
    "numExHosts" => "1",
     "execHosts" => "gnbsx51022",
      "@version" => "1",
    "@timestamp" => 2020-07-13T10:11:20.360Z,
          "host" => "#####"
}

Simplified:

  mutate {
    gsub => ["json_part", "\n", ""]
  }
  kv {
    source => "json_part"
    field_split_pattern => '" "'
    value_split_pattern => '" "'
    remove_field => "json_part"
    trim_key => '"'
    trim_value => '"'
  }

I tried your solution, and it worked in the case below.

Log
"resReq" """select[ (rh70||rh60) && cpuf >= 1] rusage[mem=1024,eldo=1:duration=30s] """

parsed correctly with
field_split_pattern => '"+ "+'
value_split_pattern => '"+ "+'
"resReq" => "select[ (rh70||rh60) && cpuf >= 1] rusage[mem=1024,eldo=1:duration=30s] "

However, with this log
"command" "time ./SCRIPTS/placeandroute/sequencer/load.sh -block_name ""DEFAULT_BLOCK_NAME"" -shel
l_mode ""-no_gui"""

it parsed as shown below, but the escaped characters in the original value got modified:
"command" => "time ./SCRIPTS/placeandroute/sequencer/load.sh -block_name ""DEFAULT_BLOCK_NAME"" -shell_mode ""-no_gui"

How can this be addressed?

In the parsed result I pasted above, the escape characters got removed.

Even using your config, the result shows the same issue with the escape characters for
"command" "time ./SCRIPTS/placeandroute/sequencer/load.sh -block_name ""DEFAULT_BLOCK_NAME"" -shell_mode ""-no_gui"""

"json_part" => "time ./SCRIPTS/placeandroute/sequencer/load.sh -block_name ""DEFAULT_BLOCK_NAME"" -shell_mode ""-no_gui",

So field names and values are wrapped in ", strings within them are wrapped in "" and you want to delete the "" if the value consists only of that string (the resReq example), but keep them otherwise (the command example)?

The following configuration looks a little bit stupid. So maybe someone can post a better solution. But this gives you:
"resReq" => "select[ (rh70||rh60) && cpuf >= 1] rusage[mem=1024,eldo=1:duration=30s] "
and
"command" => "time ./SCRIPTS/placeandroute/sequencer/load.sh -block_name \"\"DEFAULT_BLOCK_NAME\"\" -shell_mode \"\"-no_gui\"\"",
if there aren't any other quotes in your messages that you haven't yet told us about. This is my last try :upside_down_face:

mutate {
  # remove line breaks and the outer quotes, and turn the inner "" pairs into \" so they survive the field split
  gsub => ["json_part", "\n", "", "json_part", '^"', "", "json_part", '"$', "", "json_part", '""([^"]*)""', '\"\1\"']
}
kv {
  source => "json_part"
  field_split_pattern => '" "'
  value_split_pattern => '" "'
  target => "json_part"
}
# afterwards: turn the \" back into "" and strip the surrounding "" when the whole value was one quoted string
ruby {
  code => '
    event.get("json_part").each { |key, value|
      event.set(key, value.gsub(/\\"/, "\"\"").gsub(/^""([^"]*)""$/, "\\1"))
    }
    event.remove("json_part")
  '
}

Jenni,

Thanks for your support, but with more and more cases it's getting complicated. See another one below. Is it possible to put a list of all the possible key names in the Logstash config for the kv filter, so that it could recognize where a value ends and a new key begins?

"jobFile" "1/1594150835.8161101" "numExHosts" "1" "execHosts" "gnx10084" "slotUsages" "1" "cpuTime" "857.000000" "command" "eval 'eval setenv UK_PATH ""/sw/unicad/ucdtools:/sw/unicad/edatools:/sw/beta/edatools:/sw/unicad/freetools:/work/uptpluscache/areas/services/F9V-F9P/products"";' 'setenv UCDPRJDIR ""/prj/m24256k/users/goncalvc/work/VERILOG/toplog_cut12"";' ""source /sw/unicad/UnicadKernel/5.0.a-00/lib/csh/uk-header.csh;"" ""setenv PRODUCT_ROOT /sw/cadence/ic/06.17.712;"" ""source /sw/cadence/ic/06.17.712/.uk/ic.csh;"" ""unsetenv PRODUCT_ROOT;"" ""setenv PRODUCT_ROOT /sw/mentor/amsv/2017.2_2;"" ""source /sw/mentor/amsv/2017.2_2/.uk/amsv.csh;"" ""unsetenv PRODUCT_ROOT;"" ""setenv PRODUCT_ROOT /sw/mentor/calibre/2017.3_29.23;"" ""source /sw/mentor/calibre/2017.3_29.23/.uk/calibre.csh;"" ""unsetenv PRODUCT_ROOT;"" ""setenv PRODUCT_ROOT /sw/synopsys/star-rcxt/l-2016.06;"" ""source /sw/synopsys/star-rcxt/l-2016.06/.uk/star-rcxt.csh;"" ""unsetenv PRODUCT_ROOT;"" ""setenv PRODUCT_ROOT /sw/synopsys/customsim/m-2017.03-sp5;"" ""source /sw/synopsys/customsim/m-2017.03-sp5/.uk/customsim.csh;"" ""unsetenv PRODUCT_ROOT;"" ""setenv PRODUCT_ROOT /sw/unicad/STECCK/4.1-04;"" ""source /sw/unicad/STECCK/4.1-04/packaging/STECCK.csh;"" ""unsetenv PRODUCT_ROOT;"" ""setenv PRODUCT_ROOT /sw/synopsys/cosmosscope/l-2016.03-sp1;"" ""source /sw/synopsys/cosmosscope/l-2016.03-sp1/.uk/cosmosscope.csh;"" ""unsetenv PRODUCT_ROOT;"" ""setenv PRODUCT_ROOT /sw/unicad/DK_cmosf9v/1.5-04;"" ""source /sw/unicad/DK_cmosf9v/1.5-04/DK_cmosf9v.csh;"" ""unsetenv PRODUCT_ROOT;"" ""setenv PRODUCT_ROOT /sw/unicad/ArtistKit/6.4.b-01;"" ""source /sw/unicad/ArtistKit/6.4.b-01/ArtistKit.csh;"" ""unsetenv PRODUCT_ROOT;"" ""setenv PRODUCT_ROOT /sw/unicad/ArtistKit_addon/1.3.a-00;"" ""source /sw/unicad/ArtistKit_addon/1.3.a-00/ArtistKit_addon.csh;"" ""unsetenv PRODUCT_ROOT;"" ""setenv PRODUCT_ROOT /sw/unicad/PLSKit/3.2.b-04;"" ""source /sw/unicad/PLSKit/3.2.b-04/PLSKit.csh;"" ""unsetenv PRODUCT_ROOT;"" ""setenv PRODUCT_ROOT /sw/unicad/MosSelectKit/2.2.c-35;"" ""source /sw/unicad/MosSelectKit/2.2.c-35/MosSelectKit.csh;"" ""unsetenv PRODUCT_ROOT;"" ""setenv PRODUCT_ROOT /sw/cadence/incisive/15.20.030;"" ""source /sw/cadence/incisive/15.20.030/.uk/incisive.csh;"" ""unsetenv PRODUCT_ROOT;"" 'setenv UK_LOAD_PRODS ""ic amsv calibre star-rcxt customsim STECCK cosmosscope DK_cmosf9v ArtistKit ArtistKit_addon PLSKit MosSelectKit incisive"";' ""source /sw/unicad/UnicadKernel/5.0.a-00/lib/csh/uk-tailer.csh;;ncsim -gui work.simu_toplog""" "ru_utime" "845.000000" "ru_stime" "12.000000"

In the above case it would help if Logstash could somehow validate the key names against a list before parsing. Here the key "command" is followed by the key "ru_utime", and a value can never contain a key name.
The log does not generate the key names in a fixed sequence.
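Something like this is what I have in mind, as a rough untested sketch with a few of the key names (I don't know whether the kv filter can work this way, and the leading/trailing quotes would still need the gsub/trim handling from before):

kv {
  source => "json_part"
  # only treat quote-space-quote as a field boundary when the next token is a known key name
  field_split_pattern => '" "(?=(?:queue|resReq|numExHosts|execHosts|jobFile|slotUsages|cpuTime|command|ru_utime|ru_stime)" ")'
  value_split_pattern => '" "'
}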

Or maybe the solution is to exclude the known complicated key-value pairs which are almost impossible to parse? As my last option, I could exclude the complicated key-values of "command", "resReq", etc.
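For example, something like this before the kv filter is what I mean by excluding (untested sketch; it assumes the command value never contains another key name and that the next key is always ru_utime or ru_stime):

mutate {
  # drop the whole "command" "...value..." pair before parsing the rest
  gsub => ["json_part", '"command" ".*?" (?="(?:ru_utime|ru_stime)" ")', ""]
}

A similar gsub could drop the "resReq" pair if needed.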

Regarding your question :
So field names and values are wrapped in " , strings within them are wrapped in "" and you want to delete the "" if the value consists only of that string (the resReq example), but keep them otherwise (the command example)?

Yes. Field & values wrapped in "
Strings within fields & values in "" and ' as well.
See like this :
"command" "eval 'eval setenv UK_PATH ""/sw/unicad/ucdtools:/sw/unicad/edatools:/sw/beta/edatools:/sw/unicad/freetools:/work/uptpluscache/areas/services/F9V-F9P/products"";' 'setenv UCDPRJDIR ""/prj/m24256k/users/goncalvc/work/VERILOG/toplog_cut12"";'"

My expectation is to remove all the escaping so that the command value keeps its original form: ' is retained as it is, and strings within keys and values go from "" (two quotes) back to " (one quote). Like this: "/sw/unicad/ucdtools:/sw/unicad/edatools:/sw/beta/edatools:/sw/unicad/freetools:/work/uptpluscache/areas/services/F9V-F9P/products";' 'setenv UCDPRJDIR "/prj/m24256k/users/goncalvc/work/VERILOG/toplog_cut12";'

I hope I am clear!

Does the separation of your fields work correctly with my last code? Transforming "" to " in the finished fields would be an easy modification if the basic idea works. If it doesn't, I'm honestly out of time and ideas.
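For example, the line inside the ruby block could simply get one more gsub appended (untested):

      event.set(key, value.gsub(/\\"/, "\"\"").gsub(/^""([^"]*)""$/, "\\1").gsub(/""/, "\""))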

Your last solution was able to parse the "command" value, but it started to fail with other ones.
So far I think the one best suited to my data is

mutate {
  gsub => ["json_part", "\n", "", "json_part", '^"+', "", "json_part", '(?<!\\)"+$', ""]
}
kv {
  source => "json_part"
  field_split_pattern => '(?<!\\)"+ "+'
  value_split_pattern => '(?<!\\)"+ "+'
}

However, it is failing on the given value of the "command" key.

This topic was automatically closed 28 days after the last reply. New replies are no longer allowed.