My Logstash configuration works fine for most log lines, but it fails on the exceptions below.
My log has the following format:
The JSON part uses the usual "key" "value" format; a space separates the fields and also separates each key from its value.
In the normal case the JSON part of the log looks like this:
"queue" "reg" "resReq" " rusage[mem=32000] select[rh70] span[ptile=2]" "numExHosts" "2" "execHosts" "dlhsx00176"
If I go to the log supplier, a fix will take ages because it is not important to them.
One thing that is standardized in the log is that the resReq field has a fixed syntax: it starts with one of "select", "rusage", or "span" and ends with "]". I think the problem is the double quotes and the space, which acts as both the field separator and the value separator.
Please suggest what I can do to quickly get rid of the problem.
I think that you could use "+ "+ as your field_split_pattern and value_split_pattern?
If there might be escaped quotes within the values that we don't want to lose, the safer method would probably be to split at " ", and then gsub all unescaped quotes ((?<!\\)") with nothing afterwards. (Or split at (?<!\\)"+ "+, but the kv documentation tells me to be cautious with lookbehinds.)
Just noticed that you did not use field_split_pattern and value_split_pattern like I had suggested. So this couldn't work even if my regex was correct. field_split and value_split can only be used to define a list of single-character field delimiters.
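To illustrate the difference, here is a minimal sketch. It assumes the key-value text has already been extracted into a field named json_part (the name used in the configuration quoted later in this thread) and that values contain no embedded quotes. It has not been tested against the real logs.

```
filter {
  mutate {
    # Strip the quotes that open the first key and close the last value.
    gsub => ["json_part", '^"+', "", "json_part", '"+$', ""]
  }
  kv {
    source => "json_part"
    # field_split / value_split treat their string as a set of single
    # characters, so '" "' there would split on every quote and every space.
    # The *_pattern options take a regular expression instead:
    field_split_pattern => '"+ "+'
    value_split_pattern => '"+ "+'
  }
}
```

Because the same pattern separates fields and values, the text is consumed as alternating key, value, key, value, which only holds as long as no value contains a quote-space-quote sequence of its own.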
Edit: Working example (with the assumption that the values could contain escaped quotes that we don't want to ruin. If they won't, you can simplify it of course.):
It parsed as shown below, but the original value was modified with escape characters:
"command" => "time ./SCRIPTS/placeandroute/sequencer/load.sh -block_name ""DEFAULT_BLOCK_NAME"" -shell_mode ""-no_gui"
Even with your config, the result is the same, with escape characters, for:
"command" "time ./SCRIPTS/placeandroute/sequencer/load.sh -block_name ""DEFAULT_BLOCK_NAME"" -shell_mode ""-no_gui"""
So field names and values are wrapped in ", strings within them are wrapped in "" and you want to delete the "" if the value consists only of that string (the resReq example), but keep them otherwise (the command example)?
The following configuration looks a little bit stupid. So maybe someone can post a better solution. But this gives you: "resReq" => "select[ (rh70||rh60) && cpuf >= 1] rusage[mem=1024,eldo=1:duration=30s] "
and "command" => "time ./SCRIPTS/placeandroute/sequencer/load.sh -block_name \"\"DEFAULT_BLOCK_NAME\"\" -shell_mode \"\"-no_gui\"\"",
if there aren't any other quotes in your messages that you haven't yet told us about. This is my last try
Thanks for your support, but with more and more cases it's getting complicated. See another example below. Would it be possible to put a list of all possible key names in the Logstash config for the kv filter, so that it can recognize where a value ends and a new key begins?
In the case above, Logstash would validate key names against a list before parsing: here it would recognize the key "command" and then "ru_utime", because a value cannot contain key names.
The log does not emit the keys in a fixed order.
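As far as I know the kv filter has no option that uses a key list to decide where a value ends, but the idea can be sketched with a ruby filter. This is an untested sketch under several assumptions: the text is in a field named json_part, the key list below is only illustrative (taken from the keys mentioned in this thread), no value contains a quoted key name, and Logstash runs with its default config.support_escapes: false.

```
filter {
  ruby {
    # Single-quoted so that backslashes reach Ruby unchanged
    # (assumes config.support_escapes is left at its default, false).
    code => '
      # Hypothetical key list; replace with the keys your logs really use.
      keys = ["queue", "resReq", "numExHosts", "execHosts", "command", "ru_utime"]
      src = event.get("json_part").to_s
      # Matches "<known key>" followed by the opening quote of its value.
      key_re = /"(#{Regexp.union(keys)})" "/
      hits = []
      pos = 0
      while (m = key_re.match(src, pos))
        hits << m
        pos = m.end(0)
      end
      hits.each_with_index do |hit, i|
        # A value runs up to the next known key, or to the end of the text.
        stop = i + 1 < hits.size ? hits[i + 1].begin(0) : src.length
        # Drop only the single closing quote and any trailing whitespace.
        value = src[hit.end(0)...stop].sub(/"\s*\z/, "")
        event.set(hit[1], value)
      end
    '
  }
}
```

Since only known keys act as boundaries, the order of the keys does not matter, and quotes or spaces inside a value such as "command" are kept as they are. The price is that any key missing from the list is swallowed into the preceding value.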
Or maybe a solution would be to exclude the known complicated key-value pairs that are almost impossible to parse? As a last option, can I exclude the complicated keys such as "command" and "resReq"?
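The kv filter does have exclude_keys and include_keys options for this. A minimal sketch, assuming the same json_part field and split patterns discussed earlier in this thread:

```
filter {
  kv {
    source => "json_part"
    field_split_pattern => '"+ "+'
    value_split_pattern => '"+ "+'
    # Drop these pairs from the event after the text has been split.
    exclude_keys => ["command", "resReq"]
  }
}
```

Note that, as far as I understand it, the filtering happens after the text has been split into pairs. If a complicated value breaks the splitting itself, the pairs that follow it can still come out wrong even though the excluded key is gone.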
Regarding your question :
So field names and values are wrapped in " , strings within them are wrapped in "" and you want to delete the "" if the value consists only of that string (the resReq example), but keep them otherwise (the command example)?
Yes. Fields and values are wrapped in ".
Strings within fields and values are wrapped in "" and also in '.
For example:
"command" "eval 'eval setenv UK_PATH ""/sw/unicad/ucdtools:/sw/unicad/edatools:/sw/beta/edatools:/sw/unicad/freetools:/work/uptpluscache/areas/services/F9V-F9P/products"";' 'setenv UCDPRJDIR ""/prj/m24256k/users/goncalvc/work/VERILOG/toplog_cut12"";'"
My expectation is to remove all escape \ characters so that the command keeps its original form: ' is kept as it is, and strings wrapped in "" (two quotes) within keys and values are kept with " (one quote). Like this: "/sw/unicad/ucdtools:/sw/unicad/edatools:/sw/beta/edatools:/sw/unicad/freetools:/work/uptpluscache/areas/services/F9V-F9P/products";' 'setenv UCDPRJDIR "/prj/m24256k/users/goncalvc/work/VERILOG/toplog_cut12";'
Does the separation of your fields work correctly with my last code? Transforming "" to " in the finished fields would be an easy modification if the basic idea would work. If it doesn't, I'm honestly out of time and ideas.
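For reference, that modification could look like the sketch below, placed after the filter that creates the field. It assumes the parsed field is named command. The backslashes seen in the printed output are most likely added by the rubydebug output when it displays a string that contains quotes, rather than being stored in the field; that is an assumption worth checking against the indexed document.

```
filter {
  mutate {
    # Collapse each doubled quote to a single quote in the parsed field.
    gsub => ["command", '""', '"']
  }
}
```

Single quotes inside the value are not touched by this substitution.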
Your last solution was able to parse the "command" value, but it started to fail on other ones.
So far I think the one best suited to my data is:
mutate {
  gsub => ["json_part", "\n", "", "json_part", '^"+', "", "json_part", '(?<!\\)"+$', ""]
}
kv {
  source => "json_part"
  field_split_pattern => '(?<!\\)"+ "+'
  value_split_pattern => '(?<!\\)"+ "+'
}
However, it fails on the given value of the "command" key.