Entity/Identity resolution


I've been working on search engines, data cleaning and related topics over
the last few days. The challenge I'm facing right now is explaining that a
search engine on its own cannot be used for identity resolution. Two Lucene
wiki pages on scoring made this easier to argue:
http://wiki.apache.org/lucene-java/ScoresAsPercentages and
http://wiki.apache.org/lucene-java/LuceneFAQ#Can_I_filter_by_score.3F
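
To illustrate the point those pages make, here is a minimal sketch in Python.
The score values are made up for the example, and normalize() is a
hypothetical helper, not a Lucene or ES call. It shows that dividing by the
top score always turns the best hit into "100%", whether or not it is really
the same entity:

```python
def normalize(scores):
    """Rescale raw relevance scores so the best hit becomes 1.0."""
    top = max(scores)
    return [s / top for s in scores]


# Made-up raw scores for two different queries (assumed values).
strong_query = [8.2, 1.1]  # top hit is a genuine match
weak_query = [0.4, 0.1]    # top hit only shares one common token

print(normalize(strong_query)[0])  # 1.0
print(normalize(weak_query)[0])    # 1.0 as well, although the match is poor
```

Since raw scores depend on the query and on index statistics, a fixed cutoff
such as "score > 0.8" does not carry the same meaning from one query to the
next.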

I've also been playing with the Duke project for batch data deduplication.
It has proved very powerful and covers the requirements for batch use.

Now I'm wondering whether there is an opportunity to merge the two projects
at some point to get a fast, live identity resolution service.

I'd say:

  1. Duke delegates data analysis & indexing to ES (as they both rely on
    Lucene indexes)
  2. Duke turns into an ES plugin that returns the records matching a query,
    with a Bayesian probability as output.
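
To make "Bayesian probability as an output" concrete, here is a rough sketch
in Python of a naive Bayes combination of per-field match probabilities. It is
only an illustration of the idea: combine() and match_probability() are
hypothetical names, the input probabilities are invented, and this is not
taken from Duke's code or from any ES API.

```python
def combine(p1, p2):
    """Naive Bayes combination of two independent match probabilities.

    Inputs are assumed to lie strictly between 0 and 1; combining exactly
    0.0 with exactly 1.0 would divide by zero.
    """
    return (p1 * p2) / (p1 * p2 + (1.0 - p1) * (1.0 - p2))


def match_probability(field_probs):
    """Fold per-field probabilities into one overall match probability."""
    prob = 0.5  # neutral prior: no evidence either way
    for p in field_probs:
        prob = combine(prob, p)
    return prob


# Assumed per-field probabilities, e.g. for name and address comparisons.
print(match_probability([0.9, 0.8]))  # agreeing fields push the result up
print(match_probability([0.9, 0.1]))  # conflicting fields cancel out
```

Unlike a raw relevance score, the result stays between 0 and 1 and can be
compared against a fixed threshold across different queries.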

What do you guys think about it?

Yann Barraud
