I would like to know how to use Elasticsearch to implement probabilistic record linkage across multiple data sets.
Do I need to pre-process the data from the multiple sources first and then push it into ES, or should I load the data directly into ES?
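For illustration, "pre-processing first" could mean mapping each source onto one shared schema and normalizing the values before indexing, so that the same person looks the same in both sources. Below is a minimal sketch under that assumption. The field names (full_name, town, name, city) and the helpers normalize_text and to_common_schema are hypothetical, not part of ES or any plugin.

```python
import re
import unicodedata


def normalize_text(value):
    """Lowercase, strip accents and punctuation, and collapse whitespace."""
    if value is None:
        return ""
    # Decompose accented characters (e.g. "é" -> "e" + combining accent),
    # then drop the combining marks.
    decomposed = unicodedata.normalize("NFKD", value)
    without_marks = "".join(ch for ch in decomposed if not unicodedata.combining(ch))
    # Replace anything that is not a lowercase letter, digit or space with a space.
    cleaned = re.sub(r"[^a-z0-9 ]", " ", without_marks.lower())
    return " ".join(cleaned.split())


def to_common_schema(record, field_map):
    """Rename source-specific fields to shared names and normalize their values.

    field_map maps a (hypothetical) common field name to this source's field name.
    """
    return {common: normalize_text(record.get(source)) for common, source in field_map.items()}


# Usage: a record from a hypothetical source A with its own field names.
source_a = {"full_name": "José  García", "town": "São Paulo"}
print(to_common_schema(source_a, {"name": "full_name", "city": "town"}))
# prints {'name': 'jose garcia', 'city': 'sao paulo'}
```

Doing this before indexing keeps the comparison logic simple later on; some of it (lowercasing, ASCII folding) could instead be pushed into ES analyzers, which is a design choice rather than a requirement.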
I took the source code and tried to build it after upgrading only the ES version. With mvn clean install -DskipTests=true it compiles and generates the jar, but with plain mvn clean install the build fails with a NoClassDef exception thrown from a test class.
Thank you, I already did that and it works now. The question now is: how should I approach the problem in ES? I have two datasets and I need to perform record linkage across them. Do I need to index both datasets? I am confused about how this entity-resolution plugin works with multiple datasets.
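One common way to frame linkage across two datasets, independent of how the entity-resolution plugin works internally, is: index one dataset, use each record of the other dataset to retrieve a small set of candidates (blocking), then score each candidate pair with Fellegi-Sunter match weights. The sketch below emulates the blocking step in memory instead of with an ES query. The m/u probabilities, field names, blocking key and threshold are made-up assumptions for illustration, not values from ES or the plugin.

```python
import math
from collections import defaultdict

# Hypothetical per-field probabilities (assumed, not estimated from real data):
# m = P(field agrees | records are a true match)
# u = P(field agrees | records are not a match)
FIELD_PARAMS = {
    "name": (0.95, 0.01),
    "city": (0.90, 0.10),
    "dob": (0.98, 0.005),
}


def match_weight(rec_a, rec_b, params=FIELD_PARAMS):
    """Sum of Fellegi-Sunter log2 weights; missing or differing values count as disagreement."""
    total = 0.0
    for field, (m, u) in params.items():
        if rec_a.get(field) and rec_a.get(field) == rec_b.get(field):
            total += math.log2(m / u)
        else:
            total += math.log2((1 - m) / (1 - u))
    return total


def link(dataset_a, dataset_b, block_key, threshold):
    """Compare only pairs that share a blocking key and keep those scoring >= threshold."""
    # Stand-in for "index dataset B and query it with each record of A".
    blocks = defaultdict(list)
    for rec in dataset_b:
        blocks[block_key(rec)].append(rec)
    links = []
    for rec_a in dataset_a:
        for rec_b in blocks.get(block_key(rec_a), []):
            weight = match_weight(rec_a, rec_b)
            if weight >= threshold:
                links.append((rec_a["id"], rec_b["id"], round(weight, 2)))
    return links


a = [
    {"id": "a1", "name": "jose garcia", "city": "sao paulo", "dob": "1980-05-01"},
    {"id": "a2", "name": "ann lee", "city": "boston", "dob": "1975-02-11"},
]
b = [
    {"id": "b7", "name": "jose garcia", "city": "rio", "dob": "1980-05-01"},
    {"id": "b9", "name": "anne lee", "city": "boston", "dob": "1990-01-01"},
]
print(link(a, b, lambda r: r["name"][:1], threshold=5.0))
# prints [('a1', 'b7', 11.01)]
```

Under this framing, only the dataset being searched strictly needs to be indexed; the other one can be streamed through as queries, although indexing both can also be convenient.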