Greetings: I have more than 1M records in ES. Now I want to dedup the data on the basis of a match percentage. For example: "Give me a list of all records whose titles match 90% or more."
Let's take another example: "I need to retrieve all records whose locations match 80% or more."
I have tried to dedup records by title, but I need to retrieve them by match percentage.
For example, I have the title "Greenland swimming Pools Georgia" ...
A 100% match is obvious. But if another record has the title "swimming pools georgia", I would consider it roughly 70% similar to the previous one. I am looking for an algorithm that can help me fetch such similar records along with a similarity score.
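To make the kind of score I mean concrete, here is a minimal sketch in Python, outside of ES. It assumes that "percentage matched" means the share of distinct lowercase words that two titles have in common (Jaccard similarity); that definition, and the helper names `tokens` and `jaccard_percent`, are my own assumptions for illustration, not something ES provides.

```python
import re


def tokens(title):
    # Lowercase the title and split it into a set of distinct words.
    return set(re.findall(r"\w+", title.lower()))


def jaccard_percent(a, b):
    # Shared words divided by all distinct words across both titles, as a percentage.
    ta, tb = tokens(a), tokens(b)
    if not ta and not tb:
        return 100.0  # two empty titles are treated as identical
    return 100.0 * len(ta & tb) / len(ta | tb)


score = jaccard_percent("Greenland swimming Pools Georgia", "swimming pools georgia")
print(score)  # 75.0 (3 shared words out of 4 distinct words)
```

Under this definition the two titles above come out at 75%, which is close to the rough 70% I had in mind.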
Is there any way to find similar records like this in ES?
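The closest thing I am aware of is the `minimum_should_match` parameter of the `match` query, which accepts a percentage. Below is a rough sketch of the request body built as a Python dict. The field name `title` and the helper `title_match_query` are assumptions for illustration. Note that this only requires a share of the query terms to appear in a document; it is not a symmetric similarity score between two records, so it may not be exactly what I need.

```python
import json


def title_match_query(title, min_percent):
    # Build a match query that requires at least min_percent of the
    # analyzed query terms to be present in the (assumed) "title" field.
    return {
        "query": {
            "match": {
                "title": {
                    "query": title,
                    "minimum_should_match": f"{min_percent}%",
                }
            }
        }
    }


body = title_match_query("Greenland swimming Pools Georgia", 75)
print(json.dumps(body, indent=2))
```

With four query terms and 75%, a document would need to contain at least three of them, so "swimming pools georgia" would qualify.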