I want to pseudonymize a specific field with a beat

What is the recommended approach for handling logs in as GDPR-compliant a way as possible? I need to pseudonymize user data. The only resource I found dates back to March 2018, in this blog post: https://www.elastic.co/de/blog/gdpr-personal-data-pseudonymization-part-1

Is this still best practice? Can I use the fingerprint method within a Beat instead of Logstash? I'm not using Logstash right now and am trying to keep my stack as slim as possible.
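To make it concrete, this is roughly what I was hoping to put into my filebeat.yml/metricbeat.yml (untested sketch; I'm assuming Beats ships a `fingerprint` processor comparable to the Logstash filter, and `ip` is just an example field):

    processors:
      - fingerprint:
          fields: ["ip"]
          target_field: "fingerprint"
          method: "sha256"

One thing I couldn't find is an equivalent of the Logstash filter's `key` option, i.e. a keyed hash (HMAC) instead of a plain digest.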

The Ruby script in the Logstash config below creates a new index 'identities'. Is there a similar processor that can achieve the same? Can I recreate this with specific config settings in my filebeat.yml/metricbeat.yml?

    input {
        tcp {
            port => 5000
            codec => json_lines
        }
    }

    filter {

        ruby {
            code => "event.set('identities',[])"
        }

        # pseudonymise ip field

        # fingerprint ip
        fingerprint {
            method => "SHA256"
            source => ["ip"]
            key => "${FINGERPRINT_KEY}"
        }

        # create sub-document under identities field
        mutate { add_field => { '[identities][0][key]' => "%{fingerprint}"  '[identities][0][value]' => "%{ip}" } }
        # overwrite ip field with fingerprint
        mutate { replace => { "ip" => "%{fingerprint}" } }

        # pseudonymise username field

        # fingerprint username
        fingerprint {
            method => "SHA256"
            source => ["username"]
            key => "${FINGERPRINT_KEY}"
        }

        # create sub-document under identities field
        mutate { add_field => { '[identities][1][key]' => "%{fingerprint}"  '[identities][1][value]' => "%{username}" } }
        # overwrite username field with fingerprint
        mutate { replace => { "username" => "%{fingerprint}" } }

        # extract sub-documents and yield a new document for each one into the LS pipeline. See https://www.elastic.co/guide/en/logstash/current/plugins-filters-ruby.html#_inline_ruby_code
        ruby {
            code => "event.get('identities').each { |p| e=LogStash::Event.new(p); e.tag('identities'); new_event_block.call(e); } "
        }

        # remove fields on original doc
        mutate { remove_field => ["fingerprint","identities"] add_field => { "source" => "fingerprint_pipeline" } }
    }

    output {

        if "identities" in [tags] {
            # route identities to a new index
            elasticsearch {
                index => "identities"
                # use the key as the id to minimise the number of docs and to allow easy lookup
                document_id => "%{[key]}"
                hosts => ["elasticsearch:9200"]
                # create action to avoid unnecessary deletions of existing identities
                action => "create"
                user => "elastic"
                password => "${ELASTIC_PASSWORD}"
                # don't log messages for identity docs which already exist
                failure_type_logging_whitelist => ["version_conflict_engine_exception"]
            }

        } else {
            # route events to a different index
            elasticsearch {
                index => "events"
                hosts => ["elasticsearch:9200"]
                user => "elastic"
                password => "${ELASTIC_PASSWORD}"
            }

        }

    }

Really looking forward to your answer and thanks a lot,
Nils

Hi!

If you just want to replace what Logstash does, I think you can leverage the script processor to handle usernames manually. You can also define conditional indices: https://www.elastic.co/guide/en/beats/metricbeat/current/elasticsearch-output.html#indices-option-es
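For the conditional indices, a sketch along these lines (adapt hosts, index names, and the condition to your setup; note that if you override `index`, you also need to set `setup.template.name` and `setup.template.pattern`):

    output.elasticsearch:
      hosts: ["elasticsearch:9200"]
      index: "events"
      indices:
        - index: "identities"
          when.contains:
            tags: "identities"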

C.


It seems like the script processor is the right tool to execute the script, but I'm not sure how to convert this `filter {...}` block into the Metricbeat script processor format. Do you know how to do it?
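I can sketch the skeleton, but the hashing part is what I'm missing. As far as I can tell, the embedded JavaScript engine has no hashing built in, so the actual fingerprinting would still have to come from a dedicated processor, and a script processor can only modify the current event, not emit separate 'identities' documents (untested sketch; `username` is just an example field):

    processors:
      - script:
          lang: javascript
          source: |
            function process(event) {
              var user = event.Get("username");
              if (user != null) {
                // no hashing library available here as far as I can tell,
                // so this is just a placeholder replacement
                event.Put("username", "<fingerprint-goes-here>");
              }
            }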

The elasticsearch output conditionals really help me!

This topic was automatically closed 28 days after the last reply. New replies are no longer allowed.