Custom function for Text Similarity Search

Loreto_Parisi · December 2, 2019, 10:39am

I'm using a dense_vector field and cosineSimilarity to compute document similarity, following the good tutorial here.

script_query = {
    "script_score": {
        "query": {"match_all": {}},
        "script": {
            "source": "cosineSimilarity(params.query_vector, doc['text_vector']) + 1.0",
            "params": {"query_vector": query_vector}
        }
    }
}

As of release 7.3, Elasticsearch natively provides cosineSimilarity for scripts. When working on STS (Semantic Textual Similarity), there are more metrics to choose from, among them:

Euclidean distance
Manhattan distance
Cosine distance (equivalent to the Euclidean distance of the normalized vectors)
Hamming distance
Dot (Inner) Product distance
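
To make the list above concrete, here is a minimal pure-Python sketch of these metrics. The function names are made up for illustration and are not Elasticsearch functions. The last lines check the sense in which cosine distance is "equivalent" to Euclidean distance on normalized vectors: for unit vectors, the squared Euclidean distance equals 2 * cosine distance, so both should produce the same ranking.

```python
import math


def dot(a, b):
    """Dot (inner) product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))


def euclidean(a, b):
    """Euclidean (L2) distance."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def manhattan(a, b):
    """Manhattan (L1) distance."""
    return sum(abs(x - y) for x, y in zip(a, b))


def cosine_distance(a, b):
    """1 - cosine similarity; assumes neither vector is all zeros."""
    return 1.0 - dot(a, b) / (math.sqrt(dot(a, a)) * math.sqrt(dot(b, b)))


def hamming(a, b):
    """Number of positions where the two vectors differ."""
    return sum(x != y for x, y in zip(a, b))


def normalize(v):
    """Scale a non-zero vector to unit length."""
    norm = math.sqrt(dot(v, v))
    return [x / norm for x in v]


# For unit vectors: euclidean(a, b)^2 == 2 * cosine_distance(a, b)
a = normalize([1.0, 2.0, 3.0])
b = normalize([3.0, 1.0, 0.5])
assert math.isclose(euclidean(a, b) ** 2, 2.0 * cosine_distance(a, b))
```
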

So, how can I implement a custom Elasticsearch similarity function for the search query script, say Euclidean distance or dot product?

Thank you.

NOTE

My reference project was BertSearch, where the text embeddings were calculated with Google's BERT.
We should keep in mind that, to turn a given vector distance into a similarity, a common transformation is: similarity = 1 / (1 + distance).
Regarding BERT, a good similarity scoring approach is described in bert_score.

mayya · December 2, 2019, 8:49pm

Hello!
As of 7.3, the following vector functions are available: cosineSimilarity and dotProduct.

In 7.4, two more functions were added: l1norm (Manhattan distance) and l2norm (Euclidean distance).

We are still investigating the need for bit vectors and hamming distance.

how to implement a custom Elasticsearch similarity function for the search query script, let's say Euclidean or dot product?

They are already implemented: dotProduct from 7.3 and l2norm (Euclidean distance) from 7.4.
There is no straightforward way to implement custom distance functions, as this would require developing a plugin. If you think a function is widely used and not yet implemented, please open an issue in the Elasticsearch GitHub repository and we will discuss it.
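
For the built-in dot product case, a minimal sketch of how the query from the original post might be adapted. It reuses the text_vector field name from that post; the helper name dot_product_query is made up for illustration. The + 1.0 offset is an assumption that only keeps scores non-negative when both the stored and the query vectors are unit-normalized; as far as I know, script_score rejects negative scores. The doc['...'] argument form matches the syntax used in this thread (7.3 to 7.5); later releases take the field name as a plain string instead.

```python
def dot_product_query(query_vector, field="text_vector"):
    """Build a script_score query body that ranks by dotProduct.

    Assumes unit-normalized vectors, so the dot product lies in [-1, 1]
    and the + 1.0 offset keeps the final score non-negative.
    """
    return {
        "script_score": {
            "query": {"match_all": {}},
            "script": {
                # Same shape as the cosineSimilarity script, with the function swapped.
                "source": f"dotProduct(params.query_vector, doc['{field}']) + 1.0",
                "params": {"query_vector": query_vector},
            },
        }
    }
```
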

We should keep in mind that for a given vectorial distance to get the similarity a common transformation is: similarity = 1 / (1 + distance).

In a script_score query, you can apply any transformation to the calculated distance, including the one you need.
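
As a rough sketch of that transformation, the script source can wrap l1norm or l2norm (available from 7.4) in 1 / (1 + distance), so that smaller distances yield higher scores. The helper names below are hypothetical, and the text_vector field name is taken from the original post. The doc['...'] argument form follows the 7.3 to 7.5 syntax used in this thread.

```python
def similarity_from_distance(distance):
    """The transformation quoted above: similarity = 1 / (1 + distance)."""
    return 1.0 / (1.0 + distance)


def distance_query(function_name, query_vector, field="text_vector"):
    """Build a script_score query body that scores by 1 / (1 + distance).

    function_name must be one of the built-in distance functions
    named in this thread: "l1norm" (Manhattan) or "l2norm" (Euclidean).
    """
    if function_name not in ("l1norm", "l2norm"):
        raise ValueError(f"unsupported distance function: {function_name}")
    # Distances are >= 0, so the resulting score stays in (0, 1].
    source = f"1 / (1 + {function_name}(params.query_vector, doc['{field}']))"
    return {
        "script_score": {
            "query": {"match_all": {}},
            "script": {
                "source": source,
                "params": {"query_vector": query_vector},
            },
        }
    }
```

A call such as distance_query("l2norm", query_vector) would then replace the script_query dictionary from the original post.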

system · December 30, 2019, 8:49pm

This topic was automatically closed 28 days after the last reply. New replies are no longer allowed.
