Hi,
I've been working on search engines, data cleaning and related topics these
last few days. The challenge I'm facing right now is explaining that a search
engine (i.e. ElasticSearch, http://www.elasticsearch.org, in this case) on its
own cannot be used for identity resolution. The Lucene wiki pages made that
explanation easier:
http://wiki.apache.org/lucene-java/ScoresAsPercentages
http://wiki.apache.org/lucene-java/LuceneFAQ#Can_I_filter_by_score.3F
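To illustrate the point those pages make, here is a minimal sketch in Python. The raw scores are made-up numbers, not real Lucene or ElasticSearch output, and as_percentages is a hypothetical helper. It shows that scaling scores against the top hit always gives the best hit 100%, whether or not it is a real match, so the result cannot be read as a match probability.

```python
def as_percentages(scores):
    """Scale raw relevance scores so that the best hit becomes 1.0."""
    top = max(scores)
    return [s / top for s in scores]

# Made-up raw scores for two different queries (assumption, for illustration).
good_query = [8.2, 1.1, 0.9]  # first hit is assumed to be a true match
poor_query = [0.4, 0.3, 0.3]  # no hit is assumed to be a real match

good_pct = as_percentages(good_query)
poor_pct = as_percentages(poor_query)

# Both top hits end up at 1.0, although only one of them is a true match.
print(good_pct[0], poor_pct[0])
```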
I've also been playing with the Duke project (http://code.google.com/p/duke)
for batch data deduplication. It has proven very powerful and covers the
requirements for batch needs.
Now I'm wondering whether there is an opportunity to merge the two projects at
some point, to get a fast, live identity resolution service.
I'd say:
- Duke delegates data analysis & indexing to ElasticSearch
  (http://www.elasticsearch.org), as they both rely on Lucene indexes
- Duke (http://code.google.com/p/duke) turns into an ES plugin that returns
  the records matching a query, with a Bayesian probability as output.
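As a rough idea of what "Bayesian probability as output" could mean, here is a minimal Python sketch of the usual naive Bayes combination of per-field match probabilities. The field probabilities are invented for illustration, the function names are hypothetical, and I'm not claiming this is exactly what Duke does internally.

```python
from functools import reduce

def combine_bayes(p1, p2):
    """Combine two independent match probabilities with Bayes' rule.

    Assumes both values lie strictly between 0 and 1; combining
    exactly 0.0 with exactly 1.0 would divide by zero.
    """
    num = p1 * p2
    return num / (num + (1.0 - p1) * (1.0 - p2))

def match_probability(field_probs):
    """Fold per-field probabilities together, starting from a neutral 0.5."""
    return reduce(combine_bayes, field_probs, 0.5)

# Hypothetical per-field probabilities for one candidate record,
# e.g. name, address and phone comparisons (made-up values).
candidate = [0.9, 0.8, 0.4]
probability = match_probability(candidate)
print(probability)
```

A candidate returned by the search engine would then be accepted or rejected by comparing this probability with a threshold, instead of relying on the raw relevance score.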
What do you guys think about it?
Regards,
Yann Barraud