Anonymise data

Hi all,

I have a question about whether something is possible. What I am trying to achieve is to take a data set already in Elasticsearch and re-index it, anonymising the IP address fields as I go.

So for example,

123.123.123.123 would anonymise to 10.0.0.1, and so forth.

Ideally I would also be able to query 10.0.0.1 and know that it relates to 123.123.123.123, so a separate mapping could be created showing which real IP 10.0.0.1 relates to (perhaps as a separate index for quick querying, or as a CSV dump).
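To make the idea concrete, here is a minimal sketch of the mapping logic in Python, independent of any Elasticsearch tooling: each new real IP is assigned the next free address in 10.0.0.0/8, and a reverse dictionary preserves the anonymous-to-real lookup. The class and field names are illustrative, not from any existing tool.

```python
import ipaddress


class IpAnonymiser:
    """Assigns each real IP a stable pseudonym from a private range
    and keeps a reverse mapping for de-anonymisation."""

    def __init__(self, network: str = "10.0.0.0/8"):
        # Generator over usable host addresses, starting at 10.0.0.1.
        self._hosts = ipaddress.ip_network(network).hosts()
        self.forward = {}  # real IP  -> anonymous IP
        self.reverse = {}  # anon IP  -> real IP

    def anonymise(self, real_ip: str) -> str:
        # Reuse the existing pseudonym so the same real IP always
        # maps to the same anonymous IP across the whole data set.
        if real_ip not in self.forward:
            anon = str(next(self._hosts))
            self.forward[real_ip] = anon
            self.reverse[anon] = real_ip
        return self.forward[real_ip]


anonymiser = IpAnonymiser()
print(anonymiser.anonymise("123.123.123.123"))  # 10.0.0.1
print(anonymiser.anonymise("123.123.123.123"))  # 10.0.0.1 again (stable)
print(anonymiser.reverse["10.0.0.1"])           # 123.123.123.123
```

The `reverse` dict is exactly the "separate mapping" you describe; dumping it to CSV or indexing it into a lookup index is then trivial.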

I was thinking of using Painless for this. Would something like this be possible with Painless, or would I be better served spending my effort on Bash or something similar?

Any suggestions on how to approach such a task would be greatly appreciated!

Obviously it would be best to sanitise the data before it even gets to Elasticsearch.

I would suggest just using Logstash: read your Elasticsearch index, write filters to anonymise the data, then output the processed data to the new index and save the "secure" mapping to another index or file, as you prefer. That way you can use the same filters on new data and on your currently indexed data.
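A rough sketch of what such a pipeline could look like, assuming a field called `client_ip` and index names `logs-original` / `logs-anon` (all hypothetical), using a `ruby` filter to keep the real-to-anonymous mapping in memory:

```
input {
  elasticsearch {
    hosts => ["localhost:9200"]
    index => "logs-original"
  }
}

filter {
  ruby {
    # In-memory mapping; state lives per worker, so run with -w 1.
    init => "@map = {}; @next = 1"
    code => "
      ip = event.get('client_ip')
      unless @map.key?(ip)
        @map[ip] = '10.0.0.%d' % @next
        @next += 1
      end
      event.set('real_ip', ip)          # kept for the mapping dump
      event.set('client_ip', @map[ip])  # anonymised value
    "
  }
}

output {
  # Anonymised copy. In a real pipeline you would clone the event and
  # strip real_ip before indexing, so the real IP never lands here.
  elasticsearch {
    hosts => ["localhost:9200"]
    index => "logs-anon"
  }
  # The 'secure' real-to-anonymous mapping as a CSV dump.
  csv {
    path   => "/tmp/ip-mapping.csv"
    fields => ["real_ip", "client_ip"]
  }
}
```

This is only a sketch: the per-worker state and the event cloning/field stripping are details you would need to work out for your own data.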

Painless, of course, can do this too on the existing index.

I think you probably want to use Update By Query, but I am not sure it can write a new record to a new index; I never looked into that.

https://www.elastic.co/guide/en/elasticsearch/reference/current/docs-update-by-query.html

This topic was automatically closed 28 days after the last reply. New replies are no longer allowed.