I currently have a situation where an application sometimes logs sensitive data to a file. I want to ship these logs using Filebeat but would prefer to sanitize the sensitive fields before they reach Elasticsearch. I can't modify the application. Is there any way to do this with Filebeat? If it isn't currently possible, will this feature be added any time soon? Thanks a lot.
This is currently not possible in Filebeat. For these use cases we recommend adding Logstash and doing the sanitizing of events in LS before forwarding them to ES.
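For illustration, here is a minimal sketch of what such a Logstash pipeline could look like. The field name `credit_card`, the port, and the hosts are placeholders, not something from this thread, and the commented-out fingerprint filter is just one way to do the hash-based variant discussed below:

```
input {
  beats {
    port => 5044
  }
}

filter {
  # Parse the JSON log line into structured fields.
  json {
    source => "message"
  }

  # Mask the sensitive value with a constant string.
  mutate {
    replace => { "credit_card" => "*****" }
  }

  # Alternative: replace the value with a keyed hash, so events
  # carrying the same sensitive value can still be correlated.
  # fingerprint {
  #   source => "credit_card"
  #   target => "credit_card"
  #   method => "SHA256"
  #   key    => "some-secret"
  # }
}

output {
  elasticsearch {
    hosts => ["localhost:9200"]
  }
}
```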
Out of curiosity: What exactly do you mean by sanitize fields?
What exactly is a field in your case? Do you have JSON, or are you thinking of some regular expression to match a sub-string in your log?
By sanitizing, do you want to: a) remove the field, b) replace it with a custom string (e.g. ******* of the same or a constant length), or c) compute a hash (or a randomly generated value) and replace the value with it, so that events with the same sensitive field can still be correlated?
The output is currently JSON, and I wanted to replace the sensitive values with a custom string (*****). But the other option of replacing them with a hash looks interesting as well.
If you already have JSON, you can use the drop_fields processor to remove fields. Otherwise you will have to use an Ingest Node pipeline or Logstash to redact the contents (make sure you have TLS configured, so that events are encrypted while being sent).
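A minimal sketch of the drop_fields approach, assuming the log lines are decoded as JSON and a hypothetical `password` field should be dropped. Paths, field names, and hosts are placeholders, and the exact top-level keys vary by Filebeat version (older releases use `filebeat.prospectors` instead of `filebeat.inputs`):

```
filebeat.inputs:
  - type: log
    paths:
      - /var/log/app/*.log
    # Decode each line as JSON and lift the keys to the top level.
    json.keys_under_root: true

processors:
  # Remove the sensitive field before the event leaves the host.
  - drop_fields:
      fields: ["password"]

output.elasticsearch:
  hosts: ["https://elasticsearch:9200"]
  # TLS, so events are encrypted in transit.
  ssl.certificate_authorities: ["/etc/pki/root/ca.pem"]
```

One advantage of this approach over Logstash is that the sensitive value is stripped on the edge host itself and never travels over the network at all.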