Update existing document

I am facing the following issue with Elasticsearch and Logstash.
Every day Logstash creates an index named with the current date, for example: mydata_2022.05.12
I have an index pattern named: mydata_*
I get documents from an API and index them using Logstash, so if a document is indexed on 2022.05.11
and updated on 2022.05.12, it will be duplicated.
Is there any way to update or delete the first one automatically?
I'm pretty new to Logstash, so any help is much appreciated :).
Here is my Logstash conf.

input {
    file {
        path => "/etc/logstash/conf.d/documents/*.json"
        mode => "read"
        start_position => "beginning"
        sincedb_path => "NUL"
        codec => multiline {
            negate => true
            what => "previous"
            pattern => '^\{'
            max_lines => 10000000
        }
        type => "json"
        file_completed_action => "log_and_delete"
        file_completed_log_path => "/etc/logstash/conf.d/documents/files.log"
    }
}

filter {
    json {
        source => message
    }

}

output {
    stdout {
        codec => rubydebug {
            metadata => false
        }
    }
   
    
    elasticsearch {
        hosts => ["http://localhost:9200"]
        index => "mydata_%{+yyyy.MM.dd}"
        document_id => "%{[data][uuid]}"
    }
    
}
  1. When sincedb_path points to a null device, Logstash does not keep the list of files it has already read. "NUL" is the Windows null device; on Linux use:
    sincedb_path => "/dev/null"
    You can also try to use the json_lines codec for multiline JSON files.
  2. Duplication. Your source must have a unique value so that an update can be recognized.
    The ES logic is simple: if the document_id does not exist yet, the document is inserted; if the document_id already exists, the document is updated. Usually a temporary field under @metadata is used for the document ID.
    elasticsearch {
        hosts => ["http://localhost:9200"]
        index => "mydata_%{+yyyy.MM.dd}"
        document_id => "%{[@metadata][uuid]}"
    }
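
Note that [@metadata][uuid] is not filled in automatically; the value has to be copied there in the filter stage. A minimal sketch, assuming the unique value lives in [data][uuid] as in the configuration from the question:

```
filter {
    json {
        source => "message"
    }
    # Copy the unique value into @metadata so it can serve as the document ID
    # without being indexed as a regular field.
    # Assumption: every event has a [data][uuid] field.
    mutate {
        copy => { "[data][uuid]" => "[@metadata][uuid]" }
    }
}
```

Fields under @metadata are not sent to the outputs as part of the event, and they stay hidden in the rubydebug output while metadata => false is set. If the source field is missing, the reference is left unresolved and the literal text %{[@metadata][uuid]} becomes the document ID, so it is worth checking a few events first.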

For example, with document_id => "%{[@metadata][uuid]}" and events that arrive in this order:
uuid field1 field2 field3
1 a b c -> insert
3 e d f -> insert
1 x y z -> update
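
One caveat: this insert/update behaviour applies within a single index, because a document ID is only unique per index. With index => "mydata_%{+yyyy.MM.dd}", a document written on 2022.05.11 and written again on 2022.05.12 lands in two different indices, so both copies match mydata_*. A sketch of one possible way around this, assuming a single non-dated index is acceptable (the name mydata_all is hypothetical):

```
output {
    elasticsearch {
        hosts => ["http://localhost:9200"]
        # Hypothetical fixed index name: every version of a document goes to
        # the same index, so the same document_id overwrites the older copy.
        index => "mydata_all"
        document_id => "%{[data][uuid]}"
    }
}
```

If daily indices are still needed, the date in the index name would have to come from a value that stays the same across versions of a document (for example a creation date written into @timestamp), not from the time the event was processed.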
If you don't have a unique value per line, you can try the fingerprint filter plugin to generate one:

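fingerprint {
  source => ["user_id", "siblings", "birthday"]
  # Hash all listed fields together; otherwise only the last field is used.
  concatenate_sources => true
}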
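
To use the fingerprint as the document ID, it can be written into @metadata and referenced from the output. A minimal sketch, assuming the three fields from the example above together identify a record and do not change when the record is updated:

```
filter {
    fingerprint {
        source => ["user_id", "siblings", "birthday"]
        # Combine all source fields into one hash.
        concatenate_sources => true
        method => "SHA256"
        # Keep the hash out of the indexed document.
        target => "[@metadata][fingerprint]"
    }
}

output {
    elasticsearch {
        hosts => ["http://localhost:9200"]
        index => "mydata_%{+yyyy.MM.dd}"
        document_id => "%{[@metadata][fingerprint]}"
    }
}
```

If any of the source fields changes between two versions of a record, the hash changes too and the new version is indexed as a separate document, so the source list should contain only stable fields.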

Got it, thank you @Rios
