Anonymize some fields in a specific index

Hi all,

I'd like to know whether it is possible in Elasticsearch to store some fields of a specific index in anonymized form.

For example, for the index named "index_secret", I'd like the fields named "gender" and "religion" to be stored anonymized in Elasticsearch.

I found this plugin, but it's not clear to me whether I can tell it to anonymize fields in one specific index only.

Thank you for your time.


It depends on what you mean by anonymized. If you mean pseudonymized, then there is a great Elastic blog post today that covers it. But a field like gender could have a cardinality as low as two, so every document with the same value gets the same pseudonym, and unmasking a single value unmasks everything. Pseudonymization therefore does not look like a good approach for such fields.

Be sure to check the note at the top of the documentation you linked to, which points out that the fingerprint filter is preferred over the anonymize filter.
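If you do go the fingerprint route, a minimal sketch of restricting it to one index could look like the following. It rests on assumptions that are not part of your setup as described: that an earlier stage of the pipeline sets a hypothetical `[@metadata][target_index]` field on each event, that the output uses that field as the index name, and that the key comes from a hypothetical `FINGERPRINT_KEY` environment or keystore variable. The host is a placeholder too.

```
filter {
  # Assumption: [@metadata][target_index] is a hypothetical field set earlier
  # in the pipeline to the name of the index the event is bound for.
  if [@metadata][target_index] == "index_secret" {
    # One fingerprint filter per field, each overwriting its source field
    # with a keyed SHA-256 hash of the original value.
    fingerprint {
      source => "gender"
      target => "gender"
      method => "SHA256"
      # Assumption: FINGERPRINT_KEY is a hypothetical secret you provide.
      key    => "${FINGERPRINT_KEY}"
    }
    fingerprint {
      source => "religion"
      target => "religion"
      method => "SHA256"
      key    => "${FINGERPRINT_KEY}"
    }
  }
}

output {
  elasticsearch {
    # Placeholder host, adjust to your cluster.
    hosts => ["http://localhost:9200"]
    index => "%{[@metadata][target_index]}"
  }
}
```

Events bound for any other index skip the conditional and keep their original values. The low-cardinality caveat still applies: a field with two possible values yields only two distinct hashes.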

Another approach would be randomization. A simple ruby filter could randomly set the gender field to one of two values. But I am struggling to think of a use case, even for testing, where visualizing or analyzing random noise adds any value.

ruby { code => 'if rand <= 0.5 then event.set("gender", "M") else event.set("gender", "F") end ' }

So the third approach would be to delete the field :smiley:
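A minimal sketch of the deletion approach, limited to one index, might look like this. It assumes each event carries a hypothetical `[@metadata][target_index]` field naming the index it is bound for; substitute whatever field or tag your pipeline actually uses to tell events apart.

```
filter {
  # Assumption: [@metadata][target_index] is a hypothetical field that
  # identifies the destination index of the event.
  if [@metadata][target_index] == "index_secret" {
    mutate {
      # Drop the sensitive fields before the event reaches the output.
      remove_field => ["gender", "religion"]
    }
  }
}
```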

All of this assumes you are doing the "anonymization" in Logstash before ingesting the data into Elasticsearch. If you want to do it for data that is already in an index, that is a very different problem.
