Custom similarity scoring

Hi,

I'm interested in implementing a custom version of Levenshtein distance that accumulates edit distance in a finer-grained way than the base algorithm: the cost of each edit would depend on the specific characters being deleted, inserted, and so on, and would be a continuous value rather than a fixed increment of one. I already have this code working outside of Elasticsearch, and the base algorithm is readily amenable to such refinements.
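
To illustrate the idea (this is not the actual implementation referred to above), a minimal Java sketch of a character-weighted Levenshtein distance could look like the following. The `WeightedLevenshtein` class, the `CostModel` interface, and its method names are hypothetical and exist only for this example; nothing here is an Elasticsearch or Lucene API.

```java
// Minimal sketch of a Levenshtein distance with per-character, continuous costs.
// All names here (WeightedLevenshtein, CostModel, UNIT_COSTS) are hypothetical.
final class WeightedLevenshtein {

    /** Hypothetical cost model: each edit returns a non-negative, continuous cost. */
    public interface CostModel {
        double insert(char c);
        double delete(char c);
        double substitute(char from, char to);
    }

    /** Unit costs reproduce the classic Levenshtein distance. */
    public static final CostModel UNIT_COSTS = new CostModel() {
        public double insert(char c) { return 1.0; }
        public double delete(char c) { return 1.0; }
        public double substitute(char from, char to) { return 1.0; }
    };

    private WeightedLevenshtein() {}

    /** Cheapest total cost of editing source into target under the given cost model. */
    public static double distance(CharSequence source, CharSequence target, CostModel costs) {
        int n = source.length();
        int m = target.length();
        // Two rolling rows of the dynamic-programming table.
        double[] prev = new double[m + 1];
        double[] curr = new double[m + 1];

        // Row 0: build each prefix of target from the empty string by insertions.
        prev[0] = 0.0;
        for (int j = 1; j <= m; j++) {
            prev[j] = prev[j - 1] + costs.insert(target.charAt(j - 1));
        }

        for (int i = 1; i <= n; i++) {
            char s = source.charAt(i - 1);
            // Column 0: reduce a prefix of source to the empty string by deletions.
            curr[0] = prev[0] + costs.delete(s);
            for (int j = 1; j <= m; j++) {
                char t = target.charAt(j - 1);
                double sub = prev[j - 1] + (s == t ? 0.0 : costs.substitute(s, t));
                double del = prev[j] + costs.delete(s);
                double ins = curr[j - 1] + costs.insert(t);
                curr[j] = Math.min(sub, Math.min(del, ins));
            }
            // Swap rows so the row just computed becomes the previous row.
            double[] tmp = prev;
            prev = curr;
            curr = tmp;
        }
        return prev[m];
    }
}
```

With `UNIT_COSTS` this reduces to the standard algorithm; supplying a different `CostModel` (for example, one that makes inserting or deleting a vowel cheaper than a consonant) yields the continuous, character-dependent distance described above.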

By the way, this question is partly a continuation of this one, regarding coding your own similarity measures.

First, I would like to understand why a Java plugin would be the preferred way to implement and integrate this, as opposed to the script_score function, which was also mentioned in the same context. Why is one recommended over the other? Are there other options? Under what terms would a pull request for a new (and parameterizable) distance calculation option be likely to be accepted?

If a Java plugin is indeed the way to go, I would like to learn whether it would hook in here in the source, or be called from somewhere very different. How to hook in as a new similarity scorer is not obvious from the single Java plugin example included in the source.

Last but certainly not least (!), the older post I refer to implied that writing a Java plugin is under-documented or that its documentation is outdated, so I wonder whether you could kindly point me to good, current documentation for it.

I would be very thankful for any help in approaching this efficiently.

Thanks in advance!

Matan
