We have a new, large department that was formed by merging three departments. A few employees
worked for two or three of those departments before the merger. That means the
attributes of one person might be listed under different departments'
One additional problem is that one person can have different first names or
The attributes of a person include
first name, last name, email, home phone, cell phone, SSN, address, etc.
Because some of the above values could be empty, there is no unique primary
key. Hence, we need an intelligent solution for the classification, one that puts
weights on the different matching rules.
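
To make the idea of weighted matching rules concrete, here is a minimal sketch in Python of what I have in mind. Everything in it is an assumption for illustration only: the field names, the weight values, the choice of which fields are compared fuzzily, and the use of difflib for name similarity. Empty values are skipped so that they count neither for nor against a match.

```python
from difflib import SequenceMatcher

# Hypothetical weights: strong identifiers count more than names.
WEIGHTS = {
    "ssn": 5.0,
    "email": 4.0,
    "cell_phone": 3.0,
    "home_phone": 2.0,
    "last_name": 2.0,
    "first_name": 1.0,
    "address": 1.0,
}

# Assumption: these fields vary in spelling, so compare them fuzzily.
FUZZY_FIELDS = {"first_name", "last_name", "address"}


def normalize(value):
    """Lowercase and strip a value; None and blanks become ''."""
    if value is None:
        return ""
    return str(value).strip().lower()


def field_similarity(field, a, b):
    """Return a similarity in [0, 1] for one attribute."""
    if field in FUZZY_FIELDS:
        return SequenceMatcher(None, a, b).ratio()
    return 1.0 if a == b else 0.0


def match_score(rec_a, rec_b, weights=WEIGHTS):
    """Weighted average similarity over fields present in both records."""
    total = 0.0
    used = 0.0
    for field, weight in weights.items():
        a = normalize(rec_a.get(field))
        b = normalize(rec_b.get(field))
        if not a or not b:
            continue  # an empty value is no evidence either way
        total += weight * field_similarity(field, a, b)
        used += weight
    return total / used if used else 0.0
```

Two records would then be treated as the same person when match_score exceeds some threshold that still has to be tuned on real data. Comparing every pair this way is not feasible at our data size, so the candidate pairs would have to be narrowed down first.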
Any tips for handling such deduplication tasks? Are there any open-source tools
available?
The database contains about 100 million records.