Logstash duplication


I have created a Logstash pipeline using the http_poller input plugin to collect data from an API endpoint.

To manage duplicate documents, I used the fingerprint plugin in the filter section of the pipeline. This seems to work without a hitch, as each document appears only once in the index.

However, each time the Logstash pipeline runs, the existing documents are re-indexed, so their timestamps are updated to the date of the latest import.

Is it possible to ingest only the documents that are not already in the index?
Below are the relevant pipeline sections:


filter {
    mutate {
        rename => { "id" => "cve_id" }
    }
    fingerprint {
        source => "cve_id"
        method => "SHA256"
        target => "[@metadata][fingerprint]"
    }
}

output {
    elasticsearch {
        index => "opencve"
        hosts => ***
        user => ***
        password => ***
        ssl_certificate_verification => false
        document_id => "%{[@metadata][fingerprint]}"
    }
}

Thanks for your help

Hello @ramiwashere,

If you want to add only new documents to the index, you can set the action => "create" parameter on the elasticsearch output. The indexing request will fail for any document whose ID already exists in the index.

output {
  elasticsearch {
    action => "create"
    # ... keep your other settings (hosts, index, document_id, etc.)
  }
}
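For completeness, here is a sketch of how the full output block might look in your case (the hosts, user, and password values below are placeholders):

```
output {
  elasticsearch {
    hosts       => ["https://localhost:9200"]  # placeholder
    user        => "elastic"                   # placeholder
    password    => "changeme"                  # placeholder
    index       => "opencve"
    action      => "create"
    document_id => "%{[@metadata][fingerprint]}"
    ssl_certificate_verification => false
  }
}
```

With action => "create", Elasticsearch rejects any document whose _id already exists (a 409 conflict). Logstash logs these failures but leaves the existing documents untouched, so only genuinely new documents are ingested.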

Hope it helps!

Hello Priscilla,

Thank you!
I just updated my pipeline and will let you know if it works!



This topic was automatically closed 28 days after the last reply. New replies are no longer allowed.