I want to pseudonymize a specific field with a beat

What is the recommended approach to handling logs so that they are as GDPR-compliant as possible?
I need to pseudonymize user data. The only resource I found dates back to March 2018, in this blog post: https://www.elastic.co/de/blog/gdpr-personal-data-pseudonymization-part-1

Is this still best practice? Can I use the fingerprint method within a Beat instead of Logstash? I'm not using Logstash right now and am trying to keep my stack as slim as possible.

The Ruby script in the Logstash config file creates a new index 'identities'. Is there any similar processor that can achieve the same? Can I recreate this behavior with specific config settings in my filebeat.yml/metricbeat.yml?

    input {
      tcp {
        port => 5000
        codec => json_lines
      }
    }

    filter {

      ruby {
        code => "event.set('identities',[])"
      }

      # pseudonymise ip field

      #fingerprint ip
      fingerprint {
        method => "SHA256"
        source => ["ip"]
        key => "${FINGERPRINT_KEY}"
      }

      #create sub document under identities field
      mutate { add_field => { '[identities][0][key]' => "%{fingerprint}" '[identities][0][value]' => "%{ip}" } }
      #overwrite ip field with fingerprint
      mutate { replace => { "ip" => "%{fingerprint}" } }

      # pseudonymise username field

      #fingerprint username
      fingerprint {
        method => "SHA256"
        source => ["username"]
        key => "${FINGERPRINT_KEY}"
      }

      #create sub document under identities field
      mutate { add_field => { '[identities][1][key]' => "%{fingerprint}" '[identities][1][value]' => "%{username}" } }
      #overwrite username field with fingerprint
      mutate { replace => { "username" => "%{fingerprint}" } }

      #extract sub documents and yield a new document for each one into the LS pipeline. See https://www.elastic.co/guide/en/logstash/current/plugins-filters-ruby.html#_inline_ruby_code
      ruby {
        code => "event.get('identities').each { |p| e = LogStash::Event.new(p); e.tag('identities'); new_event_block.call(e); }"
      }

      #remove fields on original doc
      mutate {
        remove_field => ["fingerprint", "identities"]
        add_field => { "source" => "fingerprint_pipeline" }
      }
    }

    output {

      if "identities" in [tags] {
        #route identities to a new index
        elasticsearch {
          index => "identities"
          #use the key as the id to minimise number of docs and to allow easy lookup
          document_id => "%{[key]}"
          hosts => ["elasticsearch:9200"]
          #create action to avoid unnecessary deletions of existing identities
          action => "create"
          user => "elastic"
          password => "${ELASTIC_PASSWORD}"
          #don't log messages for identity docs which already exist
          failure_type_logging_whitelist => ["version_conflict_engine_exception"]
        }

      } else {
        #route events to a different index
        elasticsearch {
          index => "events"
          hosts => ["elasticsearch:9200"]
          user => "elastic"
          password => "${ELASTIC_PASSWORD}"
        }
      }

    }

Really looking forward to your answer and thanks a lot,
Nils

Hi!

If you just want to replace what Logstash does, I think you can leverage the script processor to handle usernames manually. You can also define conditional indices: https://www.elastic.co/guide/en/beats/metricbeat/current/elasticsearch-output.html#indices-option-es
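An untested sketch of what that could look like in metricbeat.yml (processor names and options may vary by Beats version; newer Beats also ship a fingerprint processor, though as far as I know it has no key option, so the hash is unkeyed, unlike the keyed SHA256 fingerprint in your Logstash config):

    processors:
      # hash the username field (unkeyed SHA256, unlike the Logstash fingerprint filter)
      - fingerprint:
          fields: ["username"]
          target_field: "username_hash"
          method: "sha256"
      # example of manual handling with the script processor:
      # overwrite the original field with its hash
      - script:
          lang: javascript
          source: >
            function process(event) {
              var hash = event.Get("username_hash");
              if (hash != null) {
                event.Put("username", hash);
                event.Delete("username_hash");
              }
            }

    output.elasticsearch:
      hosts: ["elasticsearch:9200"]
      index: "events"
      # route tagged events to a separate index
      indices:
        - index: "identities"
          when.contains:
            tags: "identities"

One important difference: as far as I know a Beats processor can only modify or drop the event it sees, not emit additional ones, so the step where the Ruby filter yields a separate identities document per event has no direct equivalent in a Beat. You would need something like an Elasticsearch ingest pipeline, or keep a minimal Logstash, for that part.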

C.

It seems like the script processor is the right tool to execute the script. But I'm not sure how to convert this `filter {...}` block into the Metricbeat script processor format. Do you know how to do it?

The Elasticsearch output conditionals really help me, though!