How to remove duplicate values in Elasticsearch


(narasiman) #21

I formatted my code using the </> icon and am sending it again.

input {
jdbc {
jdbc_driver_library => "D:\mysql-connector-java-5.1.44\mysql-connector-java-5.1.44\mysql-connector-java-5.1.44-bin.jar"
jdbc_driver_class => "com.mysql.jdbc.Driver"
jdbc_connection_string => "jdbc:mysql://localhost:3306/sample"
jdbc_user => "root"
jdbc_password => "root"
jdbc_fetch_size => 10000
    schedule => "* * * * *"
    statement => "SELECT * from sample"
#codec => "json"
  }
}

filter {
  fingerprint {
    source => "RRH_MR_NUM"
    target => "[@metadata][fingerprint]"
    method => "MURMUR3"
  }
}
output {
  elasticsearch {
    hosts => ["localhost:9200"]
    index => "clinical"
    # document_id => "%{[@metadata][fingerprint]}"
  }
  stdout { codec => rubydebug }
}

I tried the configuration above to eliminate the duplicate records in Elasticsearch.

Is there any mistake in it?

Please let me know.


(Arthur Silva Sens) #22

One strategy you could use is to query for the documents where "mrdno" is "11657", then save ONE of the ids.

Then delete where "mrdno" is "11657" AND _id is NOT the id you saved.

That should delete all the duplicates.
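
A rough sketch of what the delete request for that second step could look like, written in Python only to build the JSON body. The helper name `build_duplicate_delete_body` and the placeholder id `SAVED_ID` are made up for illustration, and it assumes "mrdno" is indexed in a way that an exact `term` match works (for example as a keyword or numeric field).

```python
import json


def build_duplicate_delete_body(field, value, keep_id):
    """Build a delete-by-query body matching every document whose `field`
    equals `value`, except the one whose _id is `keep_id`.

    Hypothetical helper, not part of any Elasticsearch client library.
    """
    return {
        "query": {
            "bool": {
                # All documents that share the duplicated value...
                "filter": [{"term": {field: value}}],
                # ...minus the single document we decided to keep.
                "must_not": [{"ids": {"values": [keep_id]}}],
            }
        }
    }


# "SAVED_ID" stands in for the _id returned by the first query.
body = build_duplicate_delete_body("mrdno", "11657", "SAVED_ID")
print(json.dumps(body, indent=2))
```

A body like this would be sent to the `_delete_by_query` endpoint of the index (here presumably `clinical`). It only cleans up one value at a time, so it is more of a one-off repair than a way to stop new duplicates from arriving.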


(David Pilato) #23

About the format: please try to indent your code like you did for the fingerprint part.

Why is this commented out?

# document_id => "%{[@metadata][fingerprint]}"
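
To show why that line matters, here is a small Python simulation, not real Logstash or Elasticsearch code. It assumes the index behaves like a dictionary keyed by _id: a document without an explicit id gets a fresh generated one, while a document with an existing id overwrites the old copy. SHA-1 is used as a stand-in for the MURMUR3 fingerprint, and the function names and sample rows are invented for the example.

```python
import hashlib


def fingerprint(value):
    # Stand-in for the Logstash fingerprint filter (SHA-1 here, not MURMUR3).
    return hashlib.sha1(str(value).encode("utf-8")).hexdigest()


def index_rows(rows, use_fingerprint_id):
    """Simulate indexing rows into a dict that plays the role of the index."""
    index = {}
    auto_id = 0
    for row in rows:
        if use_fingerprint_id:
            # Same source value -> same id -> the earlier copy is overwritten.
            doc_id = fingerprint(row["RRH_MR_NUM"])
        else:
            # No document_id set: every event gets a brand-new id.
            auto_id += 1
            doc_id = f"auto-{auto_id}"
        index[doc_id] = row
    return index


# The schedule re-runs "SELECT * from sample", so the same rows arrive again.
rows = [{"RRH_MR_NUM": 11657}, {"RRH_MR_NUM": 11658}]
two_runs = rows + rows

print(len(index_rows(two_runs, use_fingerprint_id=False)))  # 4
print(len(index_rows(two_runs, use_fingerprint_id=True)))   # 2
```

With the `document_id` line commented out, the config behaves like the first call: each scheduled run adds another copy of every row.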

(narasiman) #24

I did not comment out the line below.

Why are you saying that I commented it out?

document_id => "%{[@metadata][fingerprint]}"

Are you saying that # means a comment?


(David Pilato) #25

Seriously... Please...


(Adebiyi Abdurrahman) #26

:rofl:


(system) #27

This topic was automatically closed 28 days after the last reply. New replies are no longer allowed.


(Alex Marquardt) #28

I have written a blog post about removing duplicate documents from Elasticsearch, which can be found at https://alexmarquardt.com/2018/07/23/deduplicating-documents-in-elasticsearch/