Regex to match a specific key value pair in Logstash

I'm looking to remove specific key=value pairs that are inside a string field.

Say the input event is as follows:

{
	"ABC": "10119707",
	"Request_StartTime": "1558952196175",
	"Severity": "INFO",
	"UUID": "481e8cfa-399c-4996-a4d3-7e9b7ec866fa",
	"Src_LogMsg": "type=abc, user=abc.def, vid=1111, api=fooapi, email=abc.def@gmail.com, cat=1",
	"@version": "1",
	"@timestamp": "2019-05-27T10:16:36.180Z",
	"Src_Host": "Hostname",
	"Request_IpAddress": "1.1.1.1"
}

I want to remove the key=value pair user=abc.def from the Src_LogMsg string. The following works:

filter {
  if [Src_LogMsg] =~ /.+/ {
    mutate {
      gsub => ["Src_LogMsg", "(user=(.+?)\s)", ""]
    }
  }
}

But if user=abc.def is at the end of Src_LogMsg rather than in the middle, the above doesn't work. Here are the two cases (originally posted as screenshots):

With user=abc.def in the middle (cat=1 being the last k=v pair), it is removed.

With user=abc.def at the end of the Src_LogMsg string, it is not removed.
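To see why the end-of-string case fails, here is a minimal Python sketch that mimics the gsub above with `re.sub`. It assumes Python's `re` treats this simple pattern the same way as the Ruby-based regex engine Logstash uses. The helper name `strip_user` is hypothetical. The pattern demands a whitespace character after the value, and at the end of the string there is none.

```python
import re

# The pattern from the question: "user=", a lazy run of characters, then one
# whitespace character. The lazy .+? stops at the first whitespace it finds.
QUESTION_PATTERN = r"(user=(.+?)\s)"

def strip_user(msg: str) -> str:
    """Hypothetical helper emulating gsub => ["Src_LogMsg", QUESTION_PATTERN, ""]."""
    return re.sub(QUESTION_PATTERN, "", msg)

middle = "type=abc, user=abc.def, vid=1111, api=fooapi, email=abc.def@gmail.com, cat=1"
end = "type=abc, vid=1111, api=fooapi, email=abc.def@gmail.com, cat=1, user=abc.def"

# In the middle, "user=abc.def," is followed by a space, so the match succeeds
# and removes "user=abc.def, " (the comma is swallowed by .+?).
print(strip_user(middle))
# At the end nothing follows the value, so \s never matches and the string
# comes back unchanged.
print(strip_user(end))
```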

Test string from which user=abc.def is successfully removed:

{"ABC": "10119707", "Request_StartTime": "1558952196175", "Severity": "INFO", "UUID": "481e8cfa-399c-4996-a4d3-7e9b7ec866fa", "Src_LogMsg": "type=abc, user=abc.def, vid=1111, api=fooapi, email=abc.def@gmail.com, cat=1", "@version": "1", "@timestamp": "2019-05-27T10:16:36.180Z", "Src_Host": "Hostname","Request_IpAddress": "1.1.1.1"}

Test string from which user=abc.def is NOT removed:

{"ABC": "10119707", "Request_StartTime": "1558952196175", "Severity": "INFO", "UUID": "481e8cfa-399c-4996-a4d3-7e9b7ec866fa", "Src_LogMsg": "type=abc, vid=1111, api=fooapi, email=abc.def@gmail.com, cat=1, user=abc.def", "@version": "1", "@timestamp": "2019-05-27T10:16:36.180Z", "Src_Host": "Hostname","Request_IpAddress": "1.1.1.1"}

Can someone please help me write a regex that removes the user=abc.def k=v pair regardless of where it appears in the Src_LogMsg field?

Logstash.conf:

input {
  stdin {
    codec => json
  }
}

filter {
  if [Src_LogMsg] =~ /.+/ {
    mutate {
      gsub =>  ["Src_LogMsg","(user=(.+?)\s)",""]
    }
  }
}

output {
  stdout { codec => rubydebug { metadata => true } }
}

Logstash Version: 5.5.1

mutate { gsub => [ "Src_LogMsg", "user=[^,]+(, |$)", "" ] }
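A quick way to check this pattern against all three positions is a Python sketch using `re.sub` as a stand-in for gsub. It assumes Python's `re` handles `[^,]+` and `$` like Logstash's Ruby-based engine does on single-line strings. The shortened sample messages are illustrative. The start and middle cases come out clean. In the end case, the ", " that preceded the pair is left behind.

```python
import re

# "user=", one or more non-comma characters (the value), then either the
# ", " separator or the end of the string.
PATTERN = r"user=[^,]+(, |$)"

cases = {
    "start": "user=abc.def, type=abc, vid=1111, cat=1",
    "middle": "type=abc, user=abc.def, vid=1111, cat=1",
    "end": "type=abc, vid=1111, cat=1, user=abc.def",
}

# re.sub replaces every match, like mutate's gsub.
results = {name: re.sub(PATTERN, "", msg) for name, msg in cases.items()}
for name, out in results.items():
    print(f"{name:>6}: {out!r}")
```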

Thank you Badger. Very helpful. However, with this pattern, a trailing comma and space remain in the second test case, where user=abc.def is the last pair.

Test Case:

{"ABC": "10119707", "Request_StartTime": "1558952196175", "Severity": "INFO", "UUID": "481e8cfa-399c-4996-a4d3-7e9b7ec866fa", "Src_LogMsg": "type=abc, vid=1111, api=fooapi, email=abc.def@gmail.com, cat=1, user=abc.def", "@version": "1", "@timestamp": "2019-05-27T10:16:36.180Z", "Src_Host": "Hostname","Request_IpAddress": "1.1.1.1"}

In all, I need to handle 3 cases:

  1. user=abc.def is at the start of the Src_LogMsg string
  2. user=abc.def is in the middle of the Src_LogMsg string
  3. user=abc.def is at the end of the Src_LogMsg string

You can use a second regexp to remove the trailing comma and space.

mutate { gsub => [ "Src_LogMsg", "user=[^,]+(, |$)", "", "Src_LogMsg", ", $", "" ] }
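The two passes can be sketched in Python, with `re.sub` standing in for gsub. This assumes mutate applies the gsub triples in the order listed, so the second pass sees the output of the first. It also assumes Python's `re` matches these patterns like Logstash's engine. `scrub` is a hypothetical helper name. All three positions from the list above end up identical.

```python
import re

# The two (pattern, replacement) pairs from the gsub array, applied in order.
PASSES = [
    (r"user=[^,]+(, |$)", ""),  # drop the pair plus its ", " separator
    (r", $", ""),               # drop a leftover ", " at the very end
]

def scrub(msg: str) -> str:
    """Hypothetical helper applying each gsub pass in sequence."""
    for pattern, replacement in PASSES:
        msg = re.sub(pattern, replacement, msg)
    return msg

for msg in (
    "user=abc.def, type=abc, vid=1111, cat=1",   # start
    "type=abc, user=abc.def, vid=1111, cat=1",   # middle
    "type=abc, vid=1111, cat=1, user=abc.def",   # end
):
    print(repr(scrub(msg)))
```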

Please do not post pictures of text. Just post the text. Thanks!


It'd be easier to simply overwrite the value with the text REDACTED or something, instead of doing multiple passes and accounting for all of the edge cases.

filter {
  mutate {
    gsub => ["Src_LogMsg", "(?<=\buser=)[^,]+", "REDACTED"]
  }
}

The pattern (?<=\buser=)[^,]+ literally means "any run of non-comma characters that is immediately preceded by (a word boundary (\b) followed by the character sequence user=)". Because (?<=...) is a lookbehind, only the value is replaced; user= itself is kept.
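Here is a Python sketch of the redaction, with `re.sub` standing in for mutate's gsub. It assumes Python's `re` handles this fixed-width lookbehind and `\b` like the Ruby-based regex engine Logstash uses. The third sample uses a made-up `superuser=` key to show what the `\b` guards against. The separators are untouched whether the pair is in the middle or at the end.

```python
import re

# Lookbehind version: only the value after "user=" is matched, so the key,
# the separators and the rest of the message are left alone.
REDACT = r"(?<=\buser=)[^,]+"

msgs = [
    "type=abc, user=abc.def, vid=1111, cat=1",
    "type=abc, vid=1111, cat=1, user=abc.def",
    "type=abc, superuser=root, cat=1",  # hypothetical key; \b prevents a match
]
redacted = [re.sub(REDACT, "REDACTED", m) for m in msgs]
for line in redacted:
    print(line)
```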


Excellent suggestion, and thanks for the working example. We did consider this at the start, but it means storing dummy values in ES for billions of records. And since this is not a separate field, we can't remove it using prune. Thoughts?

Excellent. Thank you Badger. I also want to remove the email=abc.def@gmail.com pair, so I did the following:

 gsub => [ "Src_LogMsg", "(email=[^,]+(, |$))|(user=[^,]+(, |$))", "", "Src_LogMsg", ", $", "" ]

Not sure if this is the most efficient way to do it.
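The combined array can be checked against both test strings from this thread with a Python sketch. It uses `re.sub` in place of gsub and assumes the triples run in order, as in Logstash's mutate. `scrub` is a hypothetical helper. The alternation removes both pairs in one scan of the string. The second pass then tidies the trailing separator left when a removed pair was last.

```python
import re

# The gsub array from the post above, as ordered (pattern, replacement) passes.
PASSES = [
    (r"(email=[^,]+(, |$))|(user=[^,]+(, |$))", ""),  # remove either pair
    (r", $", ""),                                     # tidy a trailing ", "
]

def scrub(msg: str) -> str:
    """Hypothetical helper: applies each gsub pass in order."""
    for pattern, replacement in PASSES:
        msg = re.sub(pattern, replacement, msg)
    return msg

t1 = "type=abc, user=abc.def, vid=1111, api=fooapi, email=abc.def@gmail.com, cat=1"
t2 = "type=abc, vid=1111, api=fooapi, email=abc.def@gmail.com, cat=1, user=abc.def"
print(scrub(t1))
print(scrub(t2))
```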

And thank you for the note about posting text rather than screenshots. Agreed.

This pattern is a little more succinct (and formatted so the two passes are easy to see separately):

gsub => [
  "Src_LogMsg", "(\b(email|user)=[^,]+(, |$))", "",
  "Src_LogMsg", ", $", ""
]
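The `\b` also makes this version stricter than the earlier alternation. Here is a Python sketch comparing the two, using `re.sub` as a stand-in for gsub. It assumes Python's `re` treats `\b` the same way as Logstash's Ruby-based engine. `sender_email` is a made-up key whose name ends in "email". `build_pattern` and `apply_passes` are hypothetical helpers. The sketch also shows how more keys can be added by extending the list.

```python
import re

def build_pattern(keys):
    """Hypothetical helper: builds the word-boundary alternation from a key list."""
    alternation = "|".join(re.escape(k) for k in keys)
    return r"(\b(" + alternation + r")=[^,]+(, |$))"

def apply_passes(msg, pattern):
    # Same two passes as the gsub array: drop the pairs, then any trailing ", ".
    msg = re.sub(pattern, "", msg)
    return re.sub(r", $", "", msg)

with_boundary = build_pattern(["email", "user"])
without_boundary = r"(email=[^,]+(, |$))|(user=[^,]+(, |$))"

msg = "type=abc, sender_email=x@y.com, user=abc.def, cat=1"
# \b keeps "sender_email" intact because "_" and "e" are both word characters.
print(apply_passes(msg, with_boundary))
# Without \b, the tail of "sender_email=..." is matched and the key is mangled.
print(apply_passes(msg, without_boundary))
```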

This is great. Thank you very much! It makes the code much more succinct and easier to read, and more fields can easily be added.
