Hi,
As part of a data protection project, I need to hash some values of a given document, such as IP addresses and usernames. To do so, I use the following Logstash configuration:
input {
  stdin {
    add_field => {"type" => "msgtrk"}
    add_field => {"logstash_source" => "XXXXXX"}
  }
}
filter {
  if ([type] == "msgtrk") {
    csv {
      columns => ["timestamp","clientip","clienthostname","serverip","serverhostname","sourcecontext","connectorid","sourcemsg","eventid","internalmessageid","messageid","recipientaddress","recipientstatus","totalbytes","recipientcount","relatedrecipientaddress","reference","messagesubject","senderaddress","returnpath","messageinfo","directionality","tenantid","originalclientip","originalserverip","customdata"]
      separator => ","
      skip_empty_columns => "true"
    }
    mutate {
      gsub => [
        "serverip", "%.*$","",
        "clientip", "%.*$","",
        "originalserverip", "%.*$",""
      ]
    }
    clone {
      clones => ["msgtrk_hash"]
    }
  }
  if ([type] == "msgtrk_hash") {
    fingerprint {
      key => "testkey"
      method => "SHA256"
      source => ["serverip"]
      target => "serverip"
    }
    fingerprint {
      key => "testkey"
      method => "SHA256"
      source => ["clientip"]
      target => "clientip"
    }
    fingerprint {
      key => "testkey"
      method => "SHA256"
      source => ["originalserverip"]
      target => "originalserverip"
    }
  }
}
output {
  if ([type] == "msgtrk") {
    if "_grokparsefailure" not in [tags] {
      elasticsearch {
        hosts => ["XXXXX:9200"]
        index => "logstash-access-msgtrk-%{+YYYY.MM.dd}"
      }
    } else {
      elasticsearch {
        hosts => ["XXXXX:9200"]
        index => "logstash-nomatch-msgtrk-%{+YYYY.MM.dd}"
      }
    }
  }
  if ([type] == "msgtrk_hash") {
    webhdfs {
      host => "XXXXXX" # (required)
      port => 50070 # (optional, default: 50070)
      standby_host => "XXXXXXX"
      standby_port => 50070
      path => "/user/pepe/logstash-%{+HH}.log" # (required)
      user => "pepe" # (required)
      kerberos_keytab => "/root/pepe.keytab"
      use_kerberos_auth => "true"
    }
  }
  stdout { codec => rubydebug }
}
The problem is that both the 'msgtrk' and 'msgtrk_hash' events look as expected on stdout, but the record that ends up in HDFS contains the plain field values rather than the hashed ones. This is unexpected, given that the stdout output is correct.
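To tell hashed values from plain ones when inspecting the HDFS file, the expected fingerprint can be reproduced outside Logstash. The following is only a rough sketch: it assumes that the fingerprint filter, with key set and method SHA256, produces a hex-encoded HMAC-SHA256 of the field value, which I have not verified for this plugin version. The helper names (strip_zone, expected_fingerprint, looks_hashed) and the sample address are made up for illustration.

```python
import hashlib
import hmac
import re

KEY = b"testkey"  # same key as in the fingerprint filters of the config


def strip_zone(ip: str) -> str:
    """Mirror the mutate/gsub step: drop everything from the first '%' onward."""
    return re.sub(r"%.*$", "", ip)


def expected_fingerprint(value: str, key: bytes = KEY) -> str:
    """Assumed filter behaviour: hex-encoded HMAC-SHA256 of the field value."""
    return hmac.new(key, value.encode("utf-8"), hashlib.sha256).hexdigest()


def looks_hashed(value: str) -> bool:
    """True if the value has the shape of a SHA-256 hex digest (64 hex chars)."""
    return re.fullmatch(r"[0-9a-f]{64}", value) is not None


if __name__ == "__main__":
    ip = strip_zone("fe80::1%eth0")  # hypothetical sample value
    print(ip, "->", expected_fingerprint(ip))
    print("plain value looks hashed:", looks_hashed(ip))
```

If the assumption holds, the digest printed for a given IP should match the serverip/clientip shown by the rubydebug output for the 'msgtrk_hash' event, and it can then be searched for in the file written to HDFS.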
These are the versions I am using:
./vendor/jruby/bin/gem list | egrep -e '(webhdfs|fingerprint|clone)'
logstash-filter-clone (3.0.4)
logstash-filter-fingerprint (3.1.1)
logstash-output-webhdfs (3.0.4)
webhdfs (0.8.0)
/usr/share/logstash/bin/logstash -V
logstash 5.6.4
Has anyone had a similar experience?
Thank you,
Ruben