Hi Team,
In our production ELK stack, the message below contains a cn field whose value is GRIFOLS, S.A.
2019-09-11 17:29:30,930 module=SCM fa=TS at=SCM.TS.LIST_SAVED_SEARCH si=566F57EC24877AC67B8D36655D69D490.vsa3029283 ci=Grifols cn=GRIFOLS, S.A. cs=dc5prd_STOCKPM8601. pi=dbPool2 ui=bcwld locale=en_US ktf1=E
But it was parsed as shown below, and the string 'S.A.' was lost.
`
cn=GRIFOLS,
`
How can I correct this?
Below is the filter section of the Logstash configuration:
filter {
  grok {
    match => {
      "message" => "%{TIMESTAMP_ISO8601:logdate} "
    }
  }
  kv {
    field_split => " "
    value_split => "="
    trim_key => " "
    trim_value => " "
    include_brackets => false
  }
  ruby {
    code => "
      hash = event.to_hash
      hash.each do |k, v|
        if(v != nil && v.kind_of?(String) && v.length > 2 && v[0,1] == '[' && v[v.length-1,1] == ']')
          event.set(k, v[1, v.length-2].split(','))
        end
      end
    "
  }
  date {
    match => ["logdate", "yyyy-MM-dd HH:mm:ss,SSS"]
    target => "@timestamp"
  }
}
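The lost 'S.A.' is consistent with splitting on every space: the token S.A. contains no "=", so it cannot form a key/value pair on its own. The Ruby sketch below is only a rough approximation of that behaviour, not the kv filter's actual implementation; the helper name naive_kv is hypothetical and the sample message is shortened from the one above.

```ruby
# Shortened sample from the message above.
message = "ci=Grifols cn=GRIFOLS, S.A. cs=dc5prd_STOCKPM8601. pi=dbPool2"

# Hypothetical helper: split on every space, then on the first "=".
# Tokens without "=" (such as "S.A.") are dropped.
def naive_kv(message)
  message.split(" ").each_with_object({}) do |token, fields|
    key, value = token.split("=", 2)
    fields[key] = value unless value.nil?
  end
end

parsed = naive_kv(message)
puts parsed["cn"] # prints "GRIFOLS,"
```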
Badger
September 16, 2019, 3:39pm
2
In that specific case you can fix it by replacing field_split with
field_split_pattern => "[^,] "
Hi Badger,
That doesn't work for all scenarios. Here are some source messages containing the kv pairs.
module=SCM fa=PP at=SCM.PP.GENERATE si=E02F07592081B6A58F7FFC3DE035D62B ci=JShenSCM **cn=GRIFOLS, S.A.** cs=SCM_JShenSCM. pi=dbPool1 ui=new_user_id locale=en_US ktf1=[EM,EX,EH] if1=2950 ktf3=3x3 ktf4=TEAMVIEW ktf5=MYSELF ktf6=ALL ktf8=[gender,department,division,location] if2=0 if3=1 if4=0
module=SCM fa=PP at=SCM.PP.GENERATE si=E02F07592081B6A58F7FFC3DE035D62B ci=JShenSCM **cn=Deutsche Telekom AG** cs=SCM_JShenSCM. pi=dbPool1 ui=new_user_id locale=en_US ktf1=[] if1=2950 ktf3=3x3 ktf4=TEAMVIEW ktf5=MYSELF ktf6=ALL ktf8=[gender,department,division,location] if2=0 if3=1 if4=0
module=SCM fa=PP at=SCM.PP.GENERATE si=E02F07592081B6A58F7FFC3DE035D62B ci=JShenSCM **cn=Deutsche Bank** cs=SCM_JShenSCM. pi=dbPool1 ui=new_user_id locale=en_US ktf1=[EM,EX,EMM] if1=2950 ktf3=3x3 ktf4=TEAMVIEW ktf5=MYSELF ktf6=ALL ktf8=[] if2=0 if3=1 if4=0
If field_split_pattern => "[^,] " is applied as below, then cn=Deutsche Telekom AG and cn=Deutsche Bank are not parsed correctly; both are parsed as "cn=Deutsche", which is unexpected.
kv {
  field_split => " "
  field_split_pattern => "[^,] "
  value_split => "="
  trim_key => " "
  trim_value => " "
  include_brackets => false
}
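To see why this pattern helps with one value but not the others, it can be checked against the sample values in plain Ruby. This sketch only shows which spaces the regular expression [^,] followed by a space matches; it does not reproduce how the kv filter applies the pattern internally, and the sample strings are shortened from the messages above.

```ruby
# The pattern from the suggestion: any character except a comma, then a space.
pattern = /[^,] /

grifols = "ci=Grifols cn=GRIFOLS, S.A. cs=dc5prd_STOCKPM8601."
telekom = "ci=JShenSCM cn=Deutsche Telekom AG cs=SCM_JShenSCM."

# Each match is the character before a space plus the space itself.
grifols_matches = grifols.scan(pattern)
telekom_matches = telekom.scan(pattern)

# The space after "GRIFOLS," is skipped because it follows a comma.
p grifols_matches # prints ["s ", ". "]

# The spaces inside "Deutsche Telekom AG" still match, so the value is split there.
p telekom_matches # prints ["M ", "e ", "m ", "G "]
```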
Expected Result for cn field after parsing
cn=GRIFOLS, S.A.
cn=Deutsche Telekom AG
cn=Deutsche Bank
Badger
September 18, 2019, 12:46pm
4
I don't see a way to fix that. You would need to tell the kv filter to treat " " as the field separator only some of the time, and there is no way to do that.
@Badger,
As I mentioned in another topic, gsub might be a solution, but I can't make it work.
I added the following to the Logstash configuration:
`
mutate { gsub => [ "message", "(\s)\w+=", "?" ] }
`
In regex101, it matched exactly the spaces that separate the fields.
But when I configure it in Logstash, the result is still not what I expect:
module=SCM?PP?SCM.PP.GENERATE?E02F07592081B6A58F7FFC3DE035D62B?JShenSCM?Deutsche Telekom AG?SCM_JShenSCM.?dbPool1?new_user_id?en_US?[]?2950?3x3?TEAMVIEW?MYSELF?ALL?[gender,department,division,location]?0?1?0\r
It replaces the whole match of (\s)\w+= with '?'. Shouldn't gsub replace only the \s inside the parentheses?
Thanks,
Cherie
Badger
September 20, 2019, 1:56pm
6
No. That's not the way capture groups work. You need to use a lookahead assertion, which I showed in my answer in the other thread.
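The difference can be reproduced in plain Ruby, whose regex syntax is what Logstash's gsub uses. This is a minimal sketch with a shortened sample line; the exact pattern from the other thread is not shown here, so \s(?=\w+=) is an assumed form of the lookahead, and "?" is kept as the delimiter only because it was used in the attempt above.

```ruby
# Shortened sample from the thread.
line = "ci=JShenSCM cn=Deutsche Telekom AG cs=SCM_JShenSCM. pi=dbPool1"

# Capture group: gsub replaces the entire match, so " cn=", " cs=" and " pi=" are consumed.
whole_match = line.gsub(/(\s)\w+=/, "?")

# Lookahead (assumed pattern): only the whitespace is matched;
# "\w+=" must follow but is not part of the match, so the keys survive.
space_only = line.gsub(/\s(?=\w+=)/, "?")

# Split on the new delimiter, then on the first "=", to recover full values.
fields = space_only.split("?").map { |pair| pair.split("=", 2) }.to_h

puts whole_match  # prints "ci=JShenSCM?Deutsche Telekom AG?SCM_JShenSCM.?dbPool1"
puts space_only   # prints "ci=JShenSCM?cn=Deutsche Telekom AG?cs=SCM_JShenSCM.?pi=dbPool1"
puts fields["cn"] # prints "Deutsche Telekom AG"
```

In Logstash terms, this would presumably mean using the lookahead pattern in mutate gsub and then pointing the kv filter's field_split at the new delimiter. That only works if the delimiter never occurs in a value and no value contains a space followed by word= text.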
@Badger Thanks for the solution