Webhdfs and clone plugin

Hi,
As part of a data protection project I need to, given a certain document, hash some of its values (IP, username, etc.). To do so I use the following Logstash configuration:

input {
  stdin {
    add_field => {"type" => "msgtrk"}
    add_field => {"logstash_source" => "XXXXXX"}
  }
}

filter {
  if ([type] == "msgtrk") {
    csv {
      columns => ["timestamp","clientip","clienthostname","serverip","serverhostname","sourcecontext","connectorid","sourcemsg","eventid","internalmessageid","messageid","recipientaddress","recipientstatus","totalbytes","recipientcount","relatedrecipientaddress","reference","messagesubject","senderaddress","returnpath","messageinfo","directionality","tenantid","originalclientip","originalserverip","customdata"]
      separator => ","
      skip_empty_columns => "true"
    }
    mutate {
      gsub => [
        "serverip", "%.*$", "",
        "clientip", "%.*$", "",
        "originalserverip", "%.*$", ""
      ]
    }
    clone {
      clones => ["msgtrk_hash"]
    }
  }
  if ([type] == "msgtrk_hash") {
    fingerprint {
      key => "testkey"
      method => "SHA256"
      source => ["serverip"]
      target => "serverip"
    }
    fingerprint {
      key => "testkey"
      method => "SHA256"
      source => ["clientip"]
      target => "clientip"
    }
    fingerprint {
      key => "testkey"
      method => "SHA256"
      source => ["originalserverip"]
      target => "originalserverip"
    }
  }
}

output {
  if ([type] == "msgtrk") {
    if "_grokparsefailure" not in [tags] {
      elasticsearch {
        hosts => ["XXXXX:9200"]
        index => "logstash-access-msgtrk-%{+YYYY.MM.dd}"
      }
    } else {
      elasticsearch {
        hosts => ["XXXXX:9200"]
        index => "logstash-nomatch-msgtrk-%{+YYYY.MM.dd}"
      }
    }
  }
  if ([type] == "msgtrk_hash") {
    webhdfs {
      host => "XXXXXX"                          # (required)
      port => 50070                             # (optional, default: 50070)
      standby_host => "XXXXXXX"
      standby_port => 50070
      path => "/user/pepe/logstash-%{+HH}.log"  # (required)
      user => "pepe"                            # (required)
      kerberos_keytab => "/root/pepe.keytab"
      use_kerberos_auth => "true"
    }
  }
  stdout { codec => rubydebug }
}
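For context on the gsub entries: they strip everything from the first `%` onward in the IP fields (possibly an IPv6 zone index such as `fe80::1%eth0`, but that's my guess about the data). A quick Python sketch of the same substitution, with made-up sample addresses:

```python
import re

# Same pattern as the gsub entries in the mutate filter:
# drop everything from the first '%' to the end of the string.
pattern = re.compile(r"%.*$")

samples = ["fe80::1%eth0", "10.0.0.5", "2001:db8::7%3"]  # made-up values
cleaned = [pattern.sub("", s) for s in samples]
print(cleaned)  # ['fe80::1', '10.0.0.5', '2001:db8::7']
```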

The problem: on stdout both the 'msgtrk' and 'msgtrk_hash' events look as expected, but the record that lands in HDFS contains the plain fields, not the hashed values. This is unexpected, given that the stdout output is correct.
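One way to check which event actually reached HDFS: with a `key` set, the fingerprint filter computes an HMAC over the source field (HMAC-SHA256 here, hex-encoded, as far as I understand the plugin), so the expected hash can be computed offline and grepped for in the HDFS file. A Python sketch with a made-up IP value:

```python
import hashlib
import hmac

# Approximates what logstash-filter-fingerprint should produce for
# method => "SHA256" when a key is set: HMAC-SHA256 over the field
# value, hex-encoded.
key = b"testkey"       # same key as in the config above
value = b"192.0.2.10"  # made-up serverip for illustration
expected = hmac.new(key, value, hashlib.sha256).hexdigest()
print(expected)  # 64 hex characters; grep the HDFS file for this
```

If the HDFS record contains `192.0.2.10` rather than the 64-character digest, the plain clone is the one being written.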

These are the versions I am using:

./vendor/jruby/bin/gem list | egrep -e '(webhdfs|fingerprint|clone)'
logstash-filter-clone (3.0.4)
logstash-filter-fingerprint (3.1.1)
logstash-output-webhdfs (3.0.4)
webhdfs (0.8.0)
/usr/share/logstash/bin/logstash -V
logstash 5.6.4

Has anyone had a similar experience?
Thank you,
Ruben
