Hello friends, my CSV file contains duplicate rows in which some fields are empty. I want to drop the empty fields from these records and merge the rows, so that values from the other records fill in the gaps. I managed to remove the fields with null values, but I could not merge the records. Please help me!
My csv file =>
name,surname,age,email,phone
Busra,Duygu,99,,05555555555
Busra,Duygu,,busraduygu@gmail.com,
Busra,Duygu,99,,
Busra,Duygu,,,
In other words, the same person appears more than once in my CSV file, and some of those records have empty values. The output I want to get:
Busra,Duygu,99,busraduygu@gmail.com,05555555555
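To make the intended merge concrete, here is a minimal sketch in plain Ruby, outside Logstash. It assumes that name and surname together identify a person and that the first non-empty value seen for each column should win. The names `COLUMNS`, `ROWS`, `merge_duplicates` and `to_csv_line` are made up for this illustration, and the naive comma split does not handle quoted fields.

```ruby
COLUMNS = %w[name surname age email phone].freeze

# Sample rows copied from the CSV above (header line left out)
ROWS = [
  "Busra,Duygu,99,,05555555555",
  "Busra,Duygu,,busraduygu@gmail.com,",
  "Busra,Duygu,99,,",
  "Busra,Duygu,,,"
].freeze

# Hypothetical helper: merge rows that share the same key columns,
# keeping the first non-empty value seen for each column.
def merge_duplicates(lines, key_columns = %w[name surname])
  merged = {}
  lines.each do |line|
    # A limit of -1 keeps trailing empty fields instead of dropping them
    row = COLUMNS.zip(line.split(",", -1)).to_h
    key = row.values_at(*key_columns)
    target = (merged[key] ||= {})
    row.each do |column, value|
      next if value.nil? || value.empty?
      target[column] ||= value
    end
  end
  merged.values
end

# Hypothetical helper: print a merged record in the original column order
def to_csv_line(record)
  COLUMNS.map { |column| record[column] }.join(",")
end

merge_duplicates(ROWS).each { |record| puts to_csv_line(record) }
# prints: Busra,Duygu,99,busraduygu@gmail.com,05555555555
```

The four sample rows collapse into a single record because they all share the same name and surname.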
To achieve this, I first loaded the CSV file into the null_problem index. Then I created an index called null_problem_fingerprint to merge the duplicate documents with the fingerprint filter, but I was unsuccessful.
null_problem index=>
input {
  file {
    path => ".../null_problem.csv"
    start_position => "beginning"
    # "NUL" is the Windows null device, so the read position is never persisted
    sincedb_path => "NUL"
  }
}
filter {
  csv {
    autodetect_column_names => "true"
    separator => ","
    skip_header => "true"
    columns => ["name","surname","age","email","phone"]
  }
  mutate {
    remove_field => ["path", "host", "message", "@version", "@timestamp", "trade_date"]
  }
  ruby {
    code => "
      # Collect the path of every field, including nested ones
      def walk_hash(parent, path, hash)
        path << parent if parent
        hash.each do |key, value|
          walk_hash(key, path, value) if value.is_a?(Hash)
          @paths << (path + [key]).map {|p| '[' + p + ']' }.join('')
        end
        path.pop
      end
      @paths = []
      walk_hash(nil, [], event.to_hash)
      # Remove every field whose value is nil or empty
      @paths.each do |path|
        value = event.get(path)
        event.remove(path) if value.nil? || (value.respond_to?(:empty?) && value.empty?)
      end
    "
  }
}
output {
  elasticsearch {
    hosts => "http://localhost:9200"
    index => "null_problem"
    document_type => "_doc"
  }
  stdout {}
}
null_problem_fingerprint index =>
input {
  elasticsearch {
    hosts => "localhost"
    index => "null_problem"
    query => '{ "sort": [ "_doc" ] }'
  }
}
filter {
  # Hash all five fields into a single value that is used as the document ID
  fingerprint {
    method => "SHA1"
    source => ["name","surname","age","email","phone"]
    target => "[@metadata][generated_id]"
    concatenate_sources => "true"
  }
  mutate {
    remove_field => ["path", "host", "message", "@version", "@timestamp", "trade_date"]
  }
}
output {
  stdout { codec => dots }
  elasticsearch {
    index => "null_problem_fingerprint"
    document_id => "%{[@metadata][generated_id]}"
    doc_as_upsert => "true"
    action => "update"
  }
}
I removed the fields with null values using the Ruby code block, but after applying the fingerprint I still could not get the desired output. Please help me!
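One possible explanation, offered as an assumption rather than a confirmed diagnosis: the fingerprint is computed over all five fields, so rows that differ in age, email or phone get different document IDs and never land on the same document. The plain Ruby sketch below illustrates the idea with the four sample records after their empty fields have been removed. `doc_id` is a hypothetical stand-in for the fingerprint filter and does not reproduce the exact string format that Logstash hashes.

```ruby
require "digest/sha1"

# The sample records as they look once empty fields have been removed
DOCS = [
  { "name" => "Busra", "surname" => "Duygu", "age" => "99", "phone" => "05555555555" },
  { "name" => "Busra", "surname" => "Duygu", "email" => "busraduygu@gmail.com" },
  { "name" => "Busra", "surname" => "Duygu", "age" => "99" },
  { "name" => "Busra", "surname" => "Duygu" }
].freeze

# Hypothetical stand-in for the fingerprint filter: hash the source
# fields that are present on the document. Illustration only, not
# the exact format Logstash uses.
def doc_id(doc, sources)
  present = sources.select { |field| doc.key?(field) }
  Digest::SHA1.hexdigest(present.map { |field| "#{field}=#{doc[field]}" }.join("|"))
end

all_fields = %w[name surname age email phone]
identity   = %w[name surname]

# Hashing all five fields: every record gets its own ID
puts DOCS.map { |doc| doc_id(doc, all_fields) }.uniq.size  # prints 4
# Hashing only the identifying fields: all records share one ID
puts DOCS.map { |doc| doc_id(doc, identity) }.uniq.size    # prints 1
```

If that reading is right, limiting source to the fields that identify a person (here assumed to be name and surname) would send all four records to the same document ID. With action => "update" and doc_as_upsert, each partial document would then be merged into the existing one, which may be enough to get the combined record.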