It depends on what you mean by anonymized. If you mean pseudonymized, then there is a great Elastic blog post today that covers it. But for a field like gender, which could have a cardinality as low as two, unmasking a single value unmasks everything, so pseudonymization does not look like a good approach.
Be sure to check the note at the top of the documentation you linked to, which points out that the fingerprint filter is preferred over the anonymize filter.
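For a field with enough distinct values to make pseudonymization worthwhile, a fingerprint filter might look roughly like the sketch below. The field name "username" and the key value are placeholders I made up for illustration, so adjust them to your data.

```
filter {
  fingerprint {
    source => "username"                  # hypothetical field to pseudonymize
    target => "username"                  # overwrite the original value with the hash
    method => "SHA256"
    key    => "replace-with-a-secret-key" # placeholder; with a key set, the filter computes a keyed hash (HMAC)
  }
}
```

The same input always produces the same hash, which is exactly why this does not help for a two-valued field like gender.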
Another approach would be randomization. A simple ruby filter could randomly set the gender field to one of two values. But I am struggling to think of a use case, even for testing, where visualizing or analyzing random noise adds any value.
ruby { code => 'event.set("gender", rand < 0.5 ? "M" : "F")' }
So the third approach would be to delete the field.
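A minimal sketch of dropping the field with a mutate filter, assuming the field is called "gender" at the top level of the event:

```
filter {
  mutate {
    # remove the field from every event before it is sent to the output
    remove_field => [ "gender" ]
  }
}
```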
All of this assumes you are doing the "anonymization" in Logstash before ingesting the data into Elasticsearch. If you want to do it for data already in an index, that is a very different problem.