How to remove duplicate documents and fields with null values

Hello friends, my CSV file contains duplicate rows with empty values. I want to drop the null fields from these records and fill the gaps with the values from the other duplicate records, so that each person ends up as a single record. I managed to delete the fields with null values, but I could not merge the duplicates. Please help me!

My csv file =>
name,surname,age,email,phone
Busra,Duygu,99,,05555555555
Busra,Duygu,,busraduygu@gmail.com,
Busra,Duygu,99,,
Busra,Duygu,,,

In other words, the same person appears more than once in my CSV file, and some of the records have null values. The output I want to get:
Busra,Duygu,99,busraduygu@gmail.com,05555555555
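
To make the merge rule concrete, here is a minimal sketch in plain Ruby, outside Logstash. It assumes that name plus surname identifies a person and that the first non-empty value per column wins. The helper name `merge_duplicates` is hypothetical, and the naive comma split assumes no quoted commas in the file.

```ruby
# Sample rows copied from the CSV in this post.
CSV_TEXT = <<~DATA
  name,surname,age,email,phone
  Busra,Duygu,99,,05555555555
  Busra,Duygu,,busraduygu@gmail.com,
  Busra,Duygu,99,,
  Busra,Duygu,,,
DATA

# Hypothetical helper: group rows by (name, surname) and keep, for each
# column, the first non-empty value found in the group.
def merge_duplicates(csv_text)
  lines = csv_text.lines.map(&:chomp)
  headers = lines.first.split(',')
  # split with limit -1 keeps trailing empty columns
  rows = lines.drop(1).map { |line| headers.zip(line.split(',', -1)).to_h }
  rows.group_by { |row| row.values_at('name', 'surname') }.map do |_key, group|
    headers.to_h do |field|
      [field, group.map { |row| row[field] }.find { |v| !v.to_s.empty? }]
    end
  end
end

merged = merge_duplicates(CSV_TEXT)
puts merged.map { |row| row.values.join(',') }
# prints: Busra,Duygu,99,busraduygu@gmail.com,05555555555
```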

To achieve this, I first loaded the CSV file into the null_problem index. Then I created an index called null_problem_fingerprint to merge the duplicate documents using the fingerprint method, but I was unsuccessful.

null_problem index=>

input{
    file { 
      path => ".../null_problem.csv"
      start_position => "beginning"
      sincedb_path => "NUL" 
    }
}
filter{
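    csv{
        # Column names are set explicitly, so header autodetection is not needed;
        # skip_header drops the row whose values match these names.
        separator => ","
        skip_header => "true"
        columns => ["name","surname","age","email","phone"]
    }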
    mutate { 
        remove_field =>["path", "host", "message", "@version", "@timestamp", "trade_date"]
    }
    ruby {
        # Collect every field path in the event, then remove fields whose value is nil or empty.
        code => "
            def walk_hash(parent, path, hash)
                path << parent if parent
                hash.each do |key, value|
                    walk_hash(key, path, value) if value.is_a?(Hash)
                    @paths << (path + [key]).map {|p| '[' + p + ']' }.join('')
                end
                path.pop
            end
            @paths = []
            walk_hash(nil, [], event.to_hash)
            @paths.each do |path|
                value = event.get(path)
                event.remove(path) if value.nil? || (value.respond_to?(:empty?) && value.empty?)
            end
            "
    }
}
output{
    elasticsearch { 
        hosts => "http://localhost:9200"
        index => "null_problem"
        document_type => "_doc"
    }
    stdout {}
}

null_problem_fingerprint index =>

input {
  elasticsearch {
    hosts => "localhost"
    index => "null_problem"
    query => '{ "sort": [ "_doc" ] }'
  }
}
filter{
    fingerprint {
        method => "SHA1"
        source => ["name","surname","age","email","phone"]
        target => "[@metadata][generated_id]"
        concatenate_sources => "true"
    }
    mutate {
        remove_field =>["path", "host", "message", "@version", "@timestamp", "trade_date"]
    }
}
output {
    stdout { codec => dots }
    elasticsearch {
        index => "null_problem_fingerprint"
        document_id => "%{[@metadata][generated_id]}"
        doc_as_upsert => "true"
        action => "update"
    }
}

I deleted the fields with null values using the Ruby code block, but after applying the fingerprint I still could not get the desired output. Please help me!
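
A likely cause, offered as an untested guess: the fingerprint is computed over all five fields, so records that differ in age, email, or phone get different document IDs and never land on the same document. One thing to try is `source => ["name","surname"]` in the fingerprint filter, so that every record for the same person shares one ID and each partial update can fill in whatever fields it carries. The sketch below, in plain Ruby outside Logstash, illustrates the idea. The `doc_id` helper and its `|`-joined string are assumptions for illustration, not the exact string the fingerprint plugin builds, and `Hash#merge` only stands in for a partial update.

```ruby
require 'digest/sha1'

# The four records as they would look after the empty fields were removed.
DOCS = [
  { 'name' => 'Busra', 'surname' => 'Duygu', 'age' => '99', 'phone' => '05555555555' },
  { 'name' => 'Busra', 'surname' => 'Duygu', 'email' => 'busraduygu@gmail.com' },
  { 'name' => 'Busra', 'surname' => 'Duygu', 'age' => '99' },
  { 'name' => 'Busra', 'surname' => 'Duygu' }
].freeze

ALL_FIELDS = %w[name surname age email phone].freeze
KEY_FIELDS = %w[name surname].freeze

# Hypothetical stand-in for the fingerprint step: SHA1 over the chosen fields.
def doc_id(doc, fields)
  Digest::SHA1.hexdigest(fields.map { |f| "#{f}=#{doc[f]}" }.join('|'))
end

ids_all = DOCS.map { |d| doc_id(d, ALL_FIELDS) }.uniq
ids_key = DOCS.map { |d| doc_id(d, KEY_FIELDS) }.uniq

puts "distinct IDs over all fields: #{ids_all.size}"    # prints 4
puts "distinct IDs over name+surname: #{ids_key.size}"  # prints 1

# With one shared ID, successive partial updates accumulate fields;
# Hash#merge is used here only to mimic that behaviour.
combined = DOCS.reduce({}) { |acc, d| acc.merge(d) }
puts combined.values_at(*ALL_FIELDS).join(',')
# prints: Busra,Duygu,99,busraduygu@gmail.com,05555555555
```

Note that in this sketch a later non-empty value would overwrite an earlier one for the same field; the sample data has no such conflict, so the order of the records does not matter here.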