Logstash - how to overwrite documents instead of creating new ones

Hi Guys,

I need help. I have a DB2 Logstash pipeline that runs a query at intervals. The problem is that every time the query runs, it creates new documents with the same data, which fills up my index with duplicates. How can I overwrite the existing documents by document_id every time my query runs?

You need to generate your own document_id, and Elasticsearch will use that document_id to update the existing document instead of creating a new one.

output {
    elasticsearch {
        hosts => 'your es host'
        action => 'update'
        document_id => 'your generated document id'
        index => 'your index name'
    }
}

For your situation, filling document_id with an id field from DB2 will overwrite the document on every run.

output {
    elasticsearch {
        hosts => 'your es host'
        action => 'update'
        document_id => "%{[id]}"
        index => 'your index name'
    }
}
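
If that id comes from your DB2 query, a jdbc input along these lines can supply it; the connection string, driver path, table, and column names below are placeholders, not details from your pipeline:

input {
    jdbc {
        # placeholder DB2 connection details, replace with your own
        jdbc_connection_string => "jdbc:db2://your-db2-host:50000/yourdb"
        jdbc_user => "your user"
        jdbc_password => "your password"
        jdbc_driver_library => "/path/to/db2jcc4.jar"
        jdbc_driver_class => "com.ibm.db2.jcc.DB2Driver"
        # run the query at your interval
        schedule => "*/5 * * * *"
        # select the column you will use as the document_id (here: id)
        statement => "SELECT id, some_column FROM your_table"
    }
}

With that in place, %{[id]} in the output resolves to the DB2 row id, so re-running the query overwrites the same documents instead of duplicating them.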

Thanks, but below is the error I get with the proposed settings.

Here are the settings I'm using:

filter {
    fingerprint {
        method => "UUID"
        target => "[@metadata][uuid]"
    }

    mutate {
        add_field => { "uuid" => "%{[@metadata][uuid]}" }
    }
}

output {
    elasticsearch {
        hosts => ["localhost:9200"]
        action => "update"
        document_id => "%{uuid}"
        index => "storageg-monitoring-%{+YYYY.MM.dd}"
        doc_as_upsert => false
    }
}

=================================================
Error:
[2019-07-20T23:52:08,097][WARN ][logstash.outputs.elasticsearch] Could not index event to Elasticsearch. {:status=>404, :action=>["update", {:_id=>"2c18be01-d143-4511-9c54-208c4f57b897", :_index=>"storageg-monitoring-2019.07.20", :_type=>"_doc", :routing=>nil, :retry_on_conflict=>1}, #LogStash::Event:0x5f69d037], :response=>{"update"=>{"_index"=>"storageg-monitoring-2019.07.20", "_type"=>"_doc", "_id"=>"2c18be01-d143-4511-9c54-208c4f57b897", "status"=>404, "error"=>{"type"=>"document_missing_exception", "reason"=>"[_doc][2c18be01-d143-4511-9c54-208c4f57b897]: document missing", "index_uuid"=>"PHkRN-fTQa2MRoOzn5P5-g", "shard"=>"0", "index"=>"storageg-monitoring-2019.07.20"}}}}

I believe that is telling you that the document you are trying to update does not exist, so you are actually trying to create a new document. If that is what you want to do, set doc_as_upsert to true.
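
A minimal sketch of that change against the settings you posted (the doc_as_upsert line is the only difference):

output {
    elasticsearch {
        hosts => ["localhost:9200"]
        action => "update"
        document_id => "%{uuid}"
        index => "storageg-monitoring-%{+YYYY.MM.dd}"
        # create the document if it does not exist yet, update it otherwise
        doc_as_upsert => true
    }
}

Bear in mind that a UUID fingerprint produces a new random id for every event, so each query run will still index fresh documents unless the id is derived from the data itself.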

Use this filter to generate a unique id for your event based on the fields that define the uniqueness of the document. Because the fingerprint is deterministic, the same source row always produces the same id, so repeated query runs overwrite the same document instead of creating duplicates.

fingerprint {
    method => "MD5"
    key => "xxxxxxxxx"
    concatenate_sources => true
    concatenate_all_fields => false
    source => ["field1", "field2"]
    target => "[@metadata][document_id]"
}


output {
    elasticsearch {
        hosts       => ["http://localhost:9200"]
        user        => "xxx"
        password    => "xxxx"
        index       => "index-name%{+YYYY-MM-dd}"
        document_id => "%{[@metadata][document_id]}"
    }
}

Thanks so much guys!
It works well with your proposed settings 🙂
