Filtering all Fields/Events

Hello,
We have a centralized log stream running through Logstash, with hundreds of field and event types.
I'm trying to apply a filter to all events and fields that replaces anything matching a regex, in this case email addresses (to remove PII).
What would be the best way to apply such a filter to every field of every event, without having to identify the field names?

Currently we are trying the following; however, it does not work with JSON/XML that contains multiple nested elements, objects, or arrays.

filter {
  ruby {
    code => '
      event.to_hash.each { |k, v|
        # gsub (not gsub!) is used because gsub! returns nil when nothing is replaced
        if v.is_a?(String) && v.include?("@")
          event.set(k, v.gsub(/\b[A-Za-z0-9._%+-]+@(?:[A-Za-z0-9-]+\.)+[A-Za-z]{2,}/, "-"))
        end
      }
    '
  }
}

Thanks again

If you need to iterate over all the fields in an event, including the contents of hashes and arrays, then this may give you some ideas.

You probably do not need to do it as a Ruby script file; I expect it could be rewritten as a ruby filter that uses the code option rather than the path option, which loads the code from a file.
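To make the idea of iterating over hashes and arrays concrete, here is a minimal sketch in plain Ruby of a recursive walk that masks email-like strings at any depth. The names EMAIL and mask_emails are hypothetical, and the pattern is deliberately simple rather than a full email validator.

```ruby
# Hypothetical constant and helper names, chosen for illustration only.
# The pattern is a simple sketch and will miss some valid addresses.
EMAIL = /\b[A-Za-z0-9._%+-]+@(?:[A-Za-z0-9-]+\.)+[A-Za-z]{2,}/

# Return a copy of value with every email-like substring replaced,
# descending into nested hashes and arrays.
def mask_emails(value, replacement = "-")
  case value
  when String
    value.gsub(EMAIL, replacement)
  when Hash
    value.each_with_object({}) { |(k, v), out| out[k] = mask_emails(v, replacement) }
  when Array
    value.map { |v| mask_emails(v, replacement) }
  else
    value # numbers, booleans, nil and other objects are left untouched
  end
end

sample = {
  "user"  => { "email" => "jane.doe@example.com",
               "tags"  => ["ok", "cc: bob@mail.example.org"] },
  "count" => 3
}
masked = mask_emails(sample)
```

Inside Logstash, something like event.to_hash.each { |k, v| event.set(k, mask_emails(v)) } in the code option should apply the same walk to every top-level field, with the helper defined in the ruby filter's init option. That wiring is an assumption I have not tested here.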

Writing a regexp to match any email address that is "valid" is a really tough problem. There are valid domains that contain characters from non-English scripts such as Chinese or Cyrillic. Of course a lot of email programs will not handle such addresses even though they follow the "rules". So exotic email addresses may be "valid" but unusable.

For PII masking I would lean towards being inclusive and masking things that may not be emails. This page discusses some of the options for being more or less inclusive.

Personally I would lean towards using a POSIX class like [[:alnum:]] instead of [A-Za-z0-9], since in Ruby it also matches letters and digits outside ASCII.
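As a rough illustration of the difference, the sketch below compares an ASCII-only pattern with one built on POSIX classes. Both patterns and their names (ASCII_EMAIL, POSIX_EMAIL) are my own assumptions for the example, not a recommended production regex.

```ruby
# Hypothetical patterns for comparison; neither is a complete email matcher.
ASCII_EMAIL = /[A-Za-z0-9._%+-]+@(?:[A-Za-z0-9-]+\.)+[A-Za-z]{2,}/
POSIX_EMAIL = /[[:alnum:]._%+-]+@(?:[[:alnum:]-]+\.)+[[:alpha:]]{2,}/

# "\u00E9" is an accented e, so the local part ends in a non-ASCII letter.
line = "login failed for jos\u00E9@example.com from 10.0.0.1"

ascii_masked = line.gsub(ASCII_EMAIL, "-") # the address is left in place
posix_masked = line.gsub(POSIX_EMAIL, "-") # the address is replaced by "-"
```

The ASCII-only class cannot match the accented letter directly before the @, so the whole address is left unmasked, while the POSIX version catches it. That fits the preference for being inclusive when the goal is removing PII.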
