Custom similarity scoring

matanster · June 8, 2018, 2:18pm

Hi,

I'm interested in implementing a custom version of Levenshtein distance that accumulates edit distance at a finer grain than the base algorithm: the cost of each edit would depend on the specific characters being deleted, inserted, and so on, and would be a continuous value rather than a fixed increment of one. I already have this code working outside of Elasticsearch, and the base algorithm is readily amenable to such refinements.
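
To make the idea concrete, here is a minimal sketch of a character-weighted edit distance. It is only an illustration of the general technique, not the actual code mentioned above: the function name, the cost callbacks, and the vowel-based example weights are all hypothetical.

```python
from typing import Callable


def weighted_levenshtein(
    source: str,
    target: str,
    insert_cost: Callable[[str], float],
    delete_cost: Callable[[str], float],
    substitute_cost: Callable[[str, str], float],
) -> float:
    """Edit distance where each operation's cost depends on the characters involved."""
    # prev[j] = cheapest cost of turning the processed prefix of source into target[:j].
    # Initially the prefix is empty, so target[:j] is built purely from insertions.
    prev = [0.0]
    for ch in target:
        prev.append(prev[-1] + insert_cost(ch))

    for s_ch in source:
        # Turning a non-empty source prefix into the empty string: deletions only.
        curr = [prev[0] + delete_cost(s_ch)]
        for j, t_ch in enumerate(target, start=1):
            match_or_sub = 0.0 if s_ch == t_ch else substitute_cost(s_ch, t_ch)
            curr.append(min(
                prev[j] + delete_cost(s_ch),      # delete s_ch
                curr[j - 1] + insert_cost(t_ch),  # insert t_ch
                prev[j - 1] + match_or_sub,       # keep or substitute
            ))
        prev = curr
    return prev[-1]


# Hypothetical weights: edits involving vowels are cheaper than other edits.
VOWELS = set("aeiou")


def indel_cost(ch: str) -> float:
    return 0.5 if ch in VOWELS else 1.0


def sub_cost(a: str, b: str) -> float:
    return 0.4 if a in VOWELS and b in VOWELS else 1.0


if __name__ == "__main__":
    print(weighted_levenshtein("cat", "cut", indel_cost, indel_cost, sub_cost))
```

With unit costs for every operation this reduces to the ordinary Levenshtein distance, so the weighting is purely a matter of which cost functions are passed in.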

By the way, this question is partly a continuation of this one, regarding coding your own similarity measures.

First, I would like to understand why a Java plugin would be the preferred way of implementing and integrating this, as opposed to the script_score function, which has also been mentioned in the same context. Why is one recommended over the other? Are there other options? Under what terms would a pull request for a new (and parameterizable) distance calculation option be likely to be accepted?

If a Java plugin is indeed the way to go, I would like to learn whether it would hook in here in the source, or whether it would be called from somewhere very different. How to hook in as a new similarity scorer is not obvious from the single Java plugin example included in the source.

Last but certainly not least (!), the older post I refer to implied that writing a Java plugin is under-documented or that its documentation is outdated, so I wonder whether you could kindly point me to good, current documentation for it.

I would be very thankful for any pointers that would help me approach this efficiently.

Thanks in advance!

Matan

system · July 6, 2018, 2:18pm

This topic was automatically closed 28 days after the last reply. New replies are no longer allowed.
