Custom similarity scoring


(Matanster) #1

Hi,

I'm interested in implementing a custom version of Levenshtein distance that accumulates edit distance in a finer-grained way than the base algorithm: the cost of each edit depends on the specific characters being deleted, inserted, or substituted, and is a continuous value rather than a fixed quantum of one. I already have this code working outside of Elasticsearch, and the base algorithm is readily amenable to such refinements.
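
To make the idea concrete, here is a minimal sketch of such a weighted variant. It is not my actual code: the class name `WeightedLevenshtein` and the three cost-function parameters are hypothetical, chosen only for illustration, and it uses nothing beyond the Java standard library (no Elasticsearch APIs).

```java
import java.util.function.ToDoubleBiFunction;
import java.util.function.ToDoubleFunction;

// Hypothetical illustration: Levenshtein distance with per-character,
// continuous-valued edit costs instead of a fixed cost of 1 per edit.
public final class WeightedLevenshtein {
    private WeightedLevenshtein() {}

    public static double distance(String a, String b,
                                  ToDoubleFunction<Character> insertCost,
                                  ToDoubleFunction<Character> deleteCost,
                                  ToDoubleBiFunction<Character, Character> substituteCost) {
        // Two rolling rows of the dynamic-programming table.
        double[] prev = new double[b.length() + 1];
        double[] curr = new double[b.length() + 1];

        // Row 0: build each prefix of b from the empty string by insertions.
        for (int j = 1; j <= b.length(); j++) {
            prev[j] = prev[j - 1] + insertCost.applyAsDouble(b.charAt(j - 1));
        }

        for (int i = 1; i <= a.length(); i++) {
            char ca = a.charAt(i - 1);
            // Column 0: reduce the prefix of a to the empty string by deletions.
            curr[0] = prev[0] + deleteCost.applyAsDouble(ca);
            for (int j = 1; j <= b.length(); j++) {
                char cb = b.charAt(j - 1);
                double sub = prev[j - 1]
                        + (ca == cb ? 0.0 : substituteCost.applyAsDouble(ca, cb));
                double del = prev[j] + deleteCost.applyAsDouble(ca);
                double ins = curr[j - 1] + insertCost.applyAsDouble(cb);
                curr[j] = Math.min(sub, Math.min(del, ins));
            }
            // Swap rows so that prev always holds the most recently filled row.
            double[] tmp = prev;
            prev = curr;
            curr = tmp;
        }
        return prev[b.length()];
    }
}
```

With all three cost functions returning 1.0 this reduces to the base algorithm; passing, say, a substitution cost of 0.5 for vowel-to-vowel swaps gives the continuous-valued behaviour I'm after.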

By the way, this question is in part a continuation of this one, regarding coding your own similarity measures.

First, I would like to understand why a Java plugin would be the preferred way of implementation/integration, as opposed to the script_score function, which has also been mentioned in the same context. Why is one recommended over the other? Are there other, additional options? Under what terms would a pull request for a new (and parameterizable) distance calculation option be likely to be accepted?

If a Java plugin is indeed the way to go, I would like to learn whether it would hook in here in the source, or be called from somewhere very different. How to hook in as a new similarity scorer is not obvious from the single Java plugin example included in the source.

Last but certainly not least (!), the older post I refer to implies that writing a Java plugin is under-documented or that the documentation is outdated, so I wonder whether you could kindly point me to a good, current piece of documentation for it.

I would be very thankful for any guidance that helps me approach this efficiently.

Thanks in advance!

Matan

