Return Duplicates


(cayala@courthousenews.com) #1

Hi, I searched the web and the forums for this specific use case but could not find relevant information.

Goal:

From my interface, I want to pass a date range and state code to my elastic instance to find all documents that contain similar terms.

Example:
{ID: 1, Case: 1234, State: CA}
{ID: 7, Case: CA-1234, State: CA}

I've done something similar using a SQL query, but it is not efficient. I thought about doing it with Logstash, but I'm required to show the duplicate results before deciding which document to save into a clean index.

Any advice will be much appreciated.

Regards


(Tal Levy) #2

What do you mean by "similar"? Is this not the same as running a query for documents that match a specific date range and have a specific State value?
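
For reference, the kind of query described here (a date range plus an exact State value) might look roughly like the sketch below. It only builds the request body as a Python dict. The date field name `filed_date` is hypothetical, since the thread never names it, and the sketch assumes `State` is indexed as a keyword field so that a `term` filter matches it exactly.

```python
def build_range_state_query(start, end, state,
                            date_field="filed_date",  # hypothetical field name
                            state_field="State"):     # assumed keyword field
    """Build a search body that filters on a date range and one state."""
    return {
        "query": {
            "bool": {
                # Filter clauses do not affect scoring; they only narrow the set.
                "filter": [
                    {"range": {date_field: {"gte": start, "lte": end}}},
                    {"term": {state_field: state}},
                ]
            }
        }
    }


body = build_range_state_query("2017-01-01", "2017-01-31", "CA")
```

A body like this returns every document in the range for that state. It does not by itself say which of those documents resemble each other.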


(cayala@courthousenews.com) #3

Not "State" per se, but rather the "Case" field. For example, I could have one document with 1234 and another with CA-1234; the search should return both 1234 and CA-1234 as similar documents. I hope this makes sense.

UPDATE:

To make it clear

  • User enters a date range, selects a state, and clicks a button to search for similar documents in CA
  • ES finds that CA-1234 and 1234 might be similar and returns both

I understand that my query has to name the fields I want to search on. In this context, though, I don't want the user to have to enter the actual value we are looking for. Instead, the user submits a time frame, and the search looks for similar documents in that range by comparing fields that are supposed to be unique but may contain variations, which creates duplicate records that differ only slightly.
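
One way to picture the comparison described above is to reduce each Case value to a normalized key and group the documents that share it. The sketch below does this client-side in Python on documents already fetched for the date range and state. The normalization rule (drop a leading state code, then drop punctuation) is only an assumption based on the CA-1234 / 1234 example, and the helper names are hypothetical.

```python
import re
from collections import defaultdict


def normalize_case(case, state):
    """Reduce a case number to a comparable key (heuristic, assumed rule)."""
    text = str(case).upper().strip()
    prefix = state.upper()
    # Drop a leading state code such as "CA" in "CA-1234".
    if text.startswith(prefix):
        text = text[len(prefix):]
    # Drop separators and any other non-alphanumeric characters.
    return re.sub(r"[^A-Z0-9]", "", text)


def find_duplicate_groups(docs):
    """Return lists of documents whose normalized Case values collide."""
    groups = defaultdict(list)
    for doc in docs:
        key = (doc["State"], normalize_case(doc["Case"], doc["State"]))
        groups[key].append(doc)
    # Only keys shared by two or more documents are candidate duplicates.
    return [group for group in groups.values() if len(group) > 1]


docs = [
    {"ID": 1, "Case": 1234, "State": "CA"},
    {"ID": 7, "Case": "CA-1234", "State": "CA"},
    {"ID": 9, "Case": "5678", "State": "CA"},
]
duplicates = find_duplicate_groups(docs)
```

Here `duplicates` holds a single group containing the documents with ID 1 and ID 7, which can be shown to the user before deciding which one goes into the clean index. The same normalized key could instead be stored in its own field at index time and grouped inside Elasticsearch with a terms aggregation using `min_doc_count: 2`, though that would require reindexing the existing data.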

