[ES7.0.1] Field 'cn'(keyword&text type) is not extracted as expected from message field

Hi Team,

In our production ELK stack, the message below contains a cn field whose value is GRIFOLS, S.A.

2019-09-11 17:29:30,930 module=SCM fa=TS at=SCM.TS.LIST_SAVED_SEARCH si=566F57EC24877AC67B8D36655D69D490.vsa3029283 ci=Grifols cn=GRIFOLS, S.A. cs=dc5prd_STOCKPM8601. pi=dbPool2 ui=bcwld locale=en_US ktf1=E

But it was parsed as shown below, and the string 'S.A.' was lost.

`cn=GRIFOLS,`

How can I correct this?

Below is the filter section of the Logstash configuration:

filter {
  grok {
    match => {
      "message" => "%{TIMESTAMP_ISO8601:logdate} "
    }
  }

  kv {
    field_split => " "
    value_split => "="
    trim_key => " "
    trim_value => " "
    include_brackets => false
  }

  ruby {
    code => "
      hash = event.to_hash
      hash.each do |k, v|
        if (v != nil && v.kind_of?(String) && v.length > 2 && v[0,1] == '[' && v[v.length-1,1] == ']')
          event.set(k, v[1, v.length-2].split(','))
        end
      end
    "
  }

  date {
    match => ["logdate", "yyyy-MM-dd HH:mm:ss,SSS"]
    target => "@timestamp"
  }
}
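
To make the symptom concrete, here is a rough Ruby sketch of what splitting on " " and then on "=" does to this message. It is a simplified model for illustration only, not the kv plugin's actual implementation, and `naive_kv` is a made-up helper name.

```ruby
# Simplified model (an assumption, NOT the kv plugin's real code):
# split the text on spaces, then split each token on the first "=".
def naive_kv(text)
  text.split(' ').each_with_object({}) do |token, fields|
    key, value = token.split('=', 2)
    # Tokens without "=" (such as "S.A.") have no value and are dropped.
    fields[key] = value unless value.nil?
  end
end

sample = 'ci=Grifols cn=GRIFOLS, S.A. cs=dc5prd_STOCKPM8601. pi=dbPool2'
fields = naive_kv(sample)
puts fields['cn']
```

Under this model the token "S.A." has no "=" and so never becomes part of cn, which matches the truncated value seen in the index.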

In that specific case you can fix it by replacing field_split with

field_split_pattern => "[^,] "

Hi Badger,

That doesn't actually work for all scenarios. Here are the source messages containing the kv pairs:

module=SCM fa=PP at=SCM.PP.GENERATE si=E02F07592081B6A58F7FFC3DE035D62B ci=JShenSCM **cn=GRIFOLS, S.A.** cs=SCM_JShenSCM. pi=dbPool1 ui=new_user_id locale=en_US ktf1=[EM,EX,EH] if1=2950 ktf3=3x3 ktf4=TEAMVIEW ktf5=MYSELF ktf6=ALL ktf8=[gender,department,division,location] if2=0 if3=1 if4=0
module=SCM fa=PP at=SCM.PP.GENERATE si=E02F07592081B6A58F7FFC3DE035D62B ci=JShenSCM **cn=Deutsche Telekom AG** cs=SCM_JShenSCM. pi=dbPool1 ui=new_user_id locale=en_US ktf1=[] if1=2950 ktf3=3x3 ktf4=TEAMVIEW ktf5=MYSELF ktf6=ALL ktf8=[gender,department,division,location] if2=0 if3=1 if4=0
module=SCM fa=PP at=SCM.PP.GENERATE si=E02F07592081B6A58F7FFC3DE035D62B ci=JShenSCM **cn=Deutsche Bank** cs=SCM_JShenSCM. pi=dbPool1 ui=new_user_id locale=en_US ktf1=[EM,EX,EMM] if1=2950 ktf3=3x3 ktf4=TEAMVIEW ktf5=MYSELF ktf6=ALL ktf8=[] if2=0 if3=1 if4=0

If field_split_pattern => "[^,] " is applied as shown below, then cn=Deutsche Telekom AG and cn=Deutsche Bank are not parsed correctly. Both are parsed as "cn=Deutsche", which is unexpected.

kv {
  field_split => " "
  field_split_pattern => "[^,] "
  value_split => "="
  trim_key => " "
  trim_value => " "
  include_brackets => false
}

Expected result for the cn field after parsing:
cn=GRIFOLS, S.A.
cn=Deutsche Telekom AG
cn=Deutsche Bank

I don't see a way to fix that. You need to be able to tell the kv filter to use " " for field_split sometimes, and there is no way to do that.

@Badger,

As I mentioned in another topic, gsub might be a solution, but I can't make it work.
I added the following to the Logstash configuration:

`mutate { gsub => [ "message", "(\s)\w+=", "?" ] }`

In regex101, it matched exactly the spaces that act as field separators; see https://regex101.com/r/ygsL8o/1

But when I configure it in Logstash, the result is still not what I expect:

module=SCM?PP?SCM.PP.GENERATE?E02F07592081B6A58F7FFC3DE035D62B?JShenSCM?Deutsche Telekom AG?SCM_JShenSCM.?dbPool1?new_user_id?en_US?[]?2950?3x3?TEAMVIEW?MYSELF?ALL?[gender,department,division,location]?0?1?0\r

It replaces the whole match of the pattern (\s)\w+= with '?'. Shouldn't gsub replace only the \s inside the parentheses?

Thanks,
Cherie

No. That's not the way capture groups work. You need to use a lookahead assertion, which I showed in my answer in the other thread.
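
The difference can be sketched in plain Ruby, whose regular expressions are, as far as I know, what the mutate filter's gsub uses. This is a minimal illustration and not a copy of the answer in the other thread; the pattern `\s(?=\w+=)` and the helper `parse_kv` are assumptions made for the example.

```ruby
line = 'ci=JShenSCM cn=Deutsche Telekom AG cs=SCM_JShenSCM. pi=dbPool1'

# Capture group: gsub replaces the ENTIRE match (space + key + "="),
# so the keys disappear along with the separators.
with_group = line.gsub(/(\s)\w+=/, '?')

# Lookahead: only the whitespace is consumed; the "key=" that follows
# is checked but left in place.
with_lookahead = line.gsub(/\s(?=\w+=)/, '?')

# Hypothetical helper: mark field boundaries with '?', then split
# each field on its first "=".
def parse_kv(text)
  marked = text.gsub(/\s(?=\w+=)/, '?')
  marked.split('?').each_with_object({}) do |pair, fields|
    key, value = pair.split('=', 2)
    fields[key] = value unless value.nil?
  end
end

fields = parse_kv(line)
grifols = parse_kv('ci=Grifols cn=GRIFOLS, S.A. cs=dc5prd_STOCKPM8601. pi=dbPool2')
puts with_group
puts with_lookahead
puts fields['cn']
puts grifols['cn']
```

In Logstash terms this would presumably correspond to a gsub pattern such as `\s(?=\w+=)` followed by a kv filter with field_split => "?". That pairing is an assumption, and it would still break on a value that itself contains a space followed by something that looks like `word=`, or on a value that contains a "?".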

@Badger Thanks for the solution