How to remove duplicate values in Elasticsearch

I formatted my code using the </> icon and am sending it again.

input {
  jdbc {
    jdbc_driver_library => "D:\mysql-connector-java-5.1.44\mysql-connector-java-5.1.44\mysql-connector-java-5.1.44-bin.jar"
    jdbc_driver_class => "com.mysql.jdbc.Driver"
    jdbc_connection_string => "jdbc:mysql://localhost:3306/sample"
    jdbc_user => "root"
    jdbc_password => "root"
    jdbc_fetch_size => 10000
    schedule => "* * * * *"
    statement => "SELECT * from sample"
    #codec => "json"
  }
}

filter {
  fingerprint {
    source => "RRH_MR_NUM"
    target => "[@metadata][fingerprint]"
    method => "MURMUR3"
  }
}

output {
  elasticsearch {
    hosts => ["localhost:9200"]
    index => "clinical"
    # document_id => "%{[@metadata][fingerprint]}"
  }
  stdout { codec => rubydebug }
}

I tried the configuration above to eliminate the duplicate records in Elasticsearch.

Is there any mistake in my code?

Please let me know.

One strategy you could use is to query for documents where "mrdno" is "11657", then save ONE of their IDs.

Then delete the documents where "mrdno" is "11657" AND _id is NOT the ID you saved.

That should delete all the duplicates.
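The keep-one-delete-the-rest strategy above can be sketched in Python. This is a minimal sketch, assuming an unsecured cluster at http://localhost:9200 and the index name "clinical" from the Logstash output; `search_body`, `delete_others_body`, `post`, and `dedupe` are hypothetical helper names. It uses the standard `_search` and `_delete_by_query` endpoints, with a `term` query to match the value and an `ids` query inside `must_not` to spare the kept document. If "mrdno" is mapped as `text`, a term query on `mrdno.keyword` may be needed instead.

```python
import json
import urllib.request

# Assumptions: local, unsecured cluster; index name taken from the Logstash output.
ES_URL = "http://localhost:9200"
INDEX = "clinical"


def search_body(field, value):
    """Query that finds documents whose `field` equals `value` (one hit is enough)."""
    return {"size": 1, "query": {"term": {field: value}}}


def delete_others_body(field, value, keep_id):
    """Delete-by-query body: same match as the search, but exclude the kept _id."""
    return {
        "query": {
            "bool": {
                "filter": [{"term": {field: value}}],
                "must_not": [{"ids": {"values": [keep_id]}}],
            }
        }
    }


def post(path, body):
    """POST a JSON body to Elasticsearch and return the decoded JSON reply."""
    req = urllib.request.Request(
        ES_URL + path,
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)


def dedupe(field, value):
    """Keep the first matching document and delete every other one with the same value."""
    hits = post(f"/{INDEX}/_search", search_body(field, value))["hits"]["hits"]
    if not hits:
        return None  # nothing matched, so there is nothing to delete
    keep_id = hits[0]["_id"]
    post(f"/{INDEX}/_delete_by_query", delete_others_body(field, value, keep_id))
    return keep_id
```

Calling `dedupe("mrdno", "11657")` would keep the first hit and delete the rest. It handles one value at a time; finding every duplicated value would need a separate step, for example a `terms` aggregation with `min_doc_count: 2`.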


About the format: please try to indent your code like you did for the fingerprint part.

Why is this commented out?

# document_id => "%{[@metadata][fingerprint]}"
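In a Logstash config file, `#` starts a comment, so with that line commented out the elasticsearch output sends no `document_id` and Elasticsearch generates a new `_id` for every event. Because `schedule => "* * * * *"` re-runs `SELECT * from sample` every minute, each run indexes the same rows again. The Python simulation below illustrates the difference; it is a sketch, not Logstash itself: a dict stands in for the index, the sample rows are hypothetical, and SHA-1 from `hashlib` stands in for MURMUR3, which the Python standard library does not provide.

```python
import hashlib
import uuid


def fingerprint(value):
    """Deterministic ID from a field value (SHA-1 stands in for MURMUR3 here)."""
    return hashlib.sha1(str(value).encode("utf-8")).hexdigest()


def load(rows, index, use_fingerprint):
    """Simulate one scheduled run: index every row into a dict keyed by _id."""
    for row in rows:
        if use_fingerprint:
            # Like document_id => "%{[@metadata][fingerprint]}"
            doc_id = fingerprint(row["RRH_MR_NUM"])
        else:
            # Like no document_id at all: a fresh random ID every time
            doc_id = uuid.uuid4().hex
        index[doc_id] = row  # same _id overwrites, a new _id adds a document


# Hypothetical rows returned by SELECT * from sample
rows = [{"RRH_MR_NUM": "11657", "name": "A"}, {"RRH_MR_NUM": "11658", "name": "B"}]

auto_index, fp_index = {}, {}
for _ in range(3):  # three scheduled runs of the same query
    load(rows, auto_index, use_fingerprint=False)
    load(rows, fp_index, use_fingerprint=True)

print(len(auto_index))  # 6: every run adds the same rows again
print(len(fp_index))    # 2: one document per RRH_MR_NUM
```

Since the fingerprint is computed from RRH_MR_NUM alone, rows that share that value collapse into a single document, and the last one indexed wins.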

I did not comment out the line below.

Why are you saying I commented it out?

document_id => "%{[@metadata][fingerprint]}"

Are you saying # means a comment?

Seriously... Please...


🤣


I have written a blog post about removing duplicate documents from Elasticsearch, which can be found at https://alexmarquardt.com/2018/07/23/deduplicating-documents-in-elasticsearch/
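A common way to find duplicates is to hash the fields that define "the same document" and group document IDs by that hash. The snippet below is a minimal in-memory sketch of that idea, not code taken from the linked post; `find_duplicates` is a hypothetical helper, it assumes the `(_id, _source)` pairs have already been read out of the index (for example with the scroll API), and the sample documents are made up.

```python
import hashlib
import json
from collections import defaultdict


def find_duplicates(docs, keys):
    """Return {hash: [ids]} for every group of docs whose `keys` fields are identical.

    `docs` is an iterable of (doc_id, source_dict) pairs, assumed to be fetched
    from Elasticsearch beforehand.
    """
    groups = defaultdict(list)
    for doc_id, source in docs:
        # JSON-encode the selected values so ("a|b", "c") and ("a", "b|c") differ.
        combined = json.dumps([source.get(k) for k in keys], default=str)
        digest = hashlib.sha1(combined.encode("utf-8")).hexdigest()
        groups[digest].append(doc_id)
    # Only hashes shared by more than one document indicate duplicates.
    return {h: ids for h, ids in groups.items() if len(ids) > 1}


# Hypothetical sample data: docs "1" and "2" share mrdno and name.
sample = [
    ("1", {"mrdno": "11657", "name": "A"}),
    ("2", {"mrdno": "11657", "name": "A"}),
    ("3", {"mrdno": "11658", "name": "B"}),
]
duplicates = find_duplicates(sample, ["mrdno", "name"])
print(list(duplicates.values()))  # [['1', '2']]
```

For each group, every ID except one could then be deleted, for example with a bulk request.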
