I want to pseudonymize a specific field with a beat

What is the recommended approach to handling logs so that they are as GDPR-compliant as possible?
I need to pseudonymize user data. The only resource I found dates back to March 2018, in this blog post: https://www.elastic.co/de/blog/gdpr-personal-data-pseudonymization-part-1

Is this still best practice? Can I use the fingerprint method within a Beat instead of Logstash? I'm not using Logstash right now and am trying to keep my stack as slim as possible.

The Ruby script in the Logstash config file creates a new index 'identities'. Is there any similar processor that can achieve the same? Can I recreate this behavior with specific config settings in my filebeat.yml/metricbeat.yml?

    input {
      tcp {
        port => 5000
        codec => json_lines
      }
    }

    filter {

      ruby {
        code => "event.set('identities',[])"
      }

      # pseudonymise ip field

      #fingerprint ip
      fingerprint {
        method => "SHA256"
        source => ["ip"]
        key => "${FINGERPRINT_KEY}"
      }

      #create sub document under identities field
      mutate { add_field => { '[identities][0][key]' => "%{fingerprint}" '[identities][0][value]' => "%{ip}" } }
      #overwrite ip field with fingerprint
      mutate { replace => { "ip" => "%{fingerprint}" } }

      # pseudonymise username field

      #fingerprint username
      fingerprint {
        method => "SHA256"
        source => ["username"]
        key => "${FINGERPRINT_KEY}"
      }

      #create sub document under identities field
      mutate { add_field => { '[identities][1][key]' => "%{fingerprint}" '[identities][1][value]' => "%{username}" } }
      #overwrite username field with fingerprint
      mutate { replace => { "username" => "%{fingerprint}" } }

      #extract sub documents and yield a new document for each one into the LS pipeline. See https://www.elastic.co/guide/en/logstash/current/plugins-filters-ruby.html#_inline_ruby_code
      ruby {
        code => "event.get('identities').each { |p| e = LogStash::Event.new(p); e.tag('identities'); new_event_block.call(e); }"
      }

      #remove fields on original doc
      mutate {
        remove_field => ["fingerprint", "identities"]
        add_field => { "source" => "fingerprint_pipeline" }
      }
    }

    output {

      if "identities" in [tags] {
        #route identities to a new index
        elasticsearch {
          index => "identities"
          #use the key as the id to minimise number of docs and to allow easy lookup
          document_id => "%{[key]}"
          hosts => ["elasticsearch:9200"]
          #create action to avoid unnecessary deletions of existing identities
          action => "create"
          user => "elastic"
          password => "${ELASTIC_PASSWORD}"
          #don't log messages for identity docs which already exist
          failure_type_logging_whitelist => ["version_conflict_engine_exception"]
        }

      } else {
        #route events to a different index
        elasticsearch {
          index => "events"
          hosts => ["elasticsearch:9200"]
          user => "elastic"
          password => "${ELASTIC_PASSWORD}"
        }
      }

    }

Really looking forward to your answer and thanks a lot,
Nils

Hi!

If you just want to replace what Logstash does, I think you can leverage the script processor to handle usernames manually. You can also define conditional indices: https://www.elastic.co/guide/en/beats/metricbeat/current/elasticsearch-output.html#indices-option-es
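An untested sketch of what that could look like in metricbeat.yml (processor names and options may vary by Beats version; newer Beats also ship a fingerprint processor, though as far as I know it has no key option, so the hash is unkeyed, unlike the keyed SHA256 fingerprint in your Logstash config):

    processors:
      # hash the username field (unkeyed SHA256, unlike the Logstash fingerprint filter)
      - fingerprint:
          fields: ["username"]
          target_field: "username_hash"
          method: "sha256"
      # example of manual handling with the script processor:
      # overwrite the original field with its hash
      - script:
          lang: javascript
          source: >
            function process(event) {
              var hash = event.Get("username_hash");
              if (hash != null) {
                event.Put("username", hash);
                event.Delete("username_hash");
              }
            }

    output.elasticsearch:
      hosts: ["elasticsearch:9200"]
      index: "events"
      # route tagged events to a separate index
      indices:
        - index: "identities"
          when.contains:
            tags: "identities"

One important difference: as far as I know a Beats processor can only modify or drop the event it sees, not emit additional ones, so the step where the Ruby filter yields a separate identities document per event has no direct equivalent in a Beat. You would need something like an Elasticsearch ingest pipeline, or keep a minimal Logstash, for that part.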

C.

It seems like the script processor is the right tool to execute the script. But I'm not sure how to convert this `filter {...}` block into the Metricbeat script processor format. Do you know how to do it?

The Elasticsearch output conditionals really help me, though!