Duplication in Logstash pipeline (Elasticsearch input and SQL database output)

Hi,
I am using an Elasticsearch index as the input in my Logstash config, and the output is the jdbc output plugin, which writes the logs to the columns of a SQL database table. The problem is that I get duplicate rows in the SQL database. I tried the uuid filter plugin, but nothing changed. What is the reason, and what solution do you suggest?
Here is my config:

    input {
      elasticsearch {
        hosts => "ip:9200"
        index => "indexname"
        user => "user"
        password => "elastic"
        query => '{ "query": { "query_string": { "query": "*" } } }'
        schedule => "*/5 * * * *"   # Cron schedule: run the query every 5 minutes
        size => 1500                # Maximum number of hits returned per scroll page
        scroll => "5m"              # Keep the scroll (search) context alive for 5 minutes
        docinfo => true             # Add document metadata (such as _id) to each event
      }
    }
    filter {
      uuid {
        target    => "document_id"
        overwrite => true
      }
    }
    output {
      if "API_REQUEST" in [message] {
        jdbc {
          driver_jar_path => '/usr/share/logstash/vendor/jar/jdbc/mssql-jdbc-12.2.0.jre8.jar'
          connection_string => "jdbc:sqlserver://ip:1433;databaseName=izdb;user=user;password=pass;ssl=false;trustServerCertificate=true"
          enable_event_as_json_keyword => true
          statement => [
            "INSERT INTO Transaction (document_id, logLevel, timestamp) VALUES (?,?,?)",

A uuid filter is meant to generate a unique id for each event, even if the same event is processed multiple times. It is explicitly not meant to de-duplicate events.

If you want to overwrite the database row each time a duplicate event is processed, then use a fingerprint filter to generate the id. (That assumes that INSERT will overwrite a row with the same primary key, and that very much depends on what your DB allows.)
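
As a rough sketch of the fingerprint approach, the filter below would replace the uuid filter and derive `document_id` from a hash of the event content, so the same event gets the same id on every run. It assumes that the `message` field alone is enough to identify an event; that choice of source field and the SHA256 method are illustrative, not taken from the original config.

```
filter {
  fingerprint {
    # Assumption: "message" uniquely identifies an event; adjust the source fields as needed
    source => ["message"]
    # Write the hash to the field that the jdbc output inserts as document_id
    target => "document_id"
    method => "SHA256"
  }
}
```

A stable id removes duplicates only if the database enforces it. On SQL Server a plain INSERT does not overwrite an existing row, so `document_id` would need a PRIMARY KEY or UNIQUE constraint for the repeated insert to be rejected instead of stored again.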

Hi Badger,
That's exactly right. I just added docinfo_target => "[@metadata][doc]" to the elasticsearch input and used [@metadata][doc][_id] instead of document_id in the output, and the problem was solved.
Thank you.
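
For readers who land on this thread, the fix described above might look roughly like the sketch below. Because the query matches every document and is re-run every 5 minutes, each run reads the whole index again, so the id has to be stable across runs; the Elasticsearch `_id` is. The event field names `logLevel` and `timestamp` are assumptions inferred from the column names, since the original statement parameters were not shown, and the connection settings are left out on purpose.

```
input {
  elasticsearch {
    hosts => "ip:9200"
    index => "indexname"
    query => '{ "query": { "query_string": { "query": "*" } } }'
    schedule => "*/5 * * * *"
    docinfo => true
    # Store document metadata (including _id) under [@metadata][doc]
    docinfo_target => "[@metadata][doc]"
  }
}

output {
  if "API_REQUEST" in [message] {
    jdbc {
      # driver_jar_path and connection_string as in the original config
      # First parameter: the stable Elasticsearch _id, used in place of the uuid
      # Second and third parameters: assumed event field names
      statement => [
        "INSERT INTO Transaction (document_id, logLevel, timestamp) VALUES (?,?,?)",
        "[@metadata][doc][_id]",
        "logLevel",
        "timestamp"
      ]
    }
  }
}
```

With this setup the uuid filter is no longer needed. The table presumably has a primary key or unique constraint on `document_id`, so re-inserting an already stored document is rejected by the database rather than written as a new row.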
