I've done something similar using a SQL query but is not efficent, I thought about doing it using Logstash but I'm requiered to show the duplicate result before deciding which document to save into a clean index.
Not "State" per say, but rather the "Case" mapping for example where I could have a document with 1234 and another one with CA-1234, the search should return both 1234 and CA-1234 as similar documents. Hope this makes sense.
UPDATE:
To make it clear
User enters date rage and selects state, clicks button to search for similar documents in CA
ES finds that CA-1234 and 1234 might be similar and returns both
I understand that my query has to have the mappings I want to run a search for, but in the current context I don't want the user to have to input the actual value we are looking for but rather have it submit a time frame to look for similar documents in that range, by comparing field that are suppost to be unique but that might contain variations thus creating a duplicate record with slight variations.
Apache, Apache Lucene, Apache Hadoop, Hadoop, HDFS and the yellow elephant
logo are trademarks of the
Apache Software Foundation
in the United States and/or other countries.