Is there a way to combine BM25 lexical search score with dense vector score to interpolate them together?

Scott_M · November 17, 2022, 9:54pm

Hello, this is related to the closed topic here:

Sorry to open a new topic, but that one is closed. I built a query based on the answer there. It runs fine, but it returns only the BM25 matches; none of the vector-based results make the cut. If BM25 matches nothing, the query returns nothing at all.

In the query below, 'cleanq' is the search text (e.g. 'Pepsi Cola'), 'question' is the index field holding the product title text (e.g. 'Coca Cola'), and 'question_embedding' is the dense vector, created with a HuggingFace model, which works perfectly when used in a stand-alone vector query. Adding or removing the + 1.0 makes no difference to this problem. The index ('indexn') works fine stand-alone and also in the query below, but the query only ever gives the BM25 results. If the BM25 match is too weak, the query returns nothing, even when the same 'cleanq' input (e.g. just 'cola') DOES return a vector match in the stand-alone query.

Just to clarify: I don't want only the SCORES added together. I need all the results (from BM25 and from the dense vector search) to be combined into a joint 'top n' list based on some kind of score normalization. Is this even possible? Or should I be using some kind of bool query with AND over both types of search in a single meta-query?

Thank you for any assistance!

    hybrid_lex_sem = self.es.search(
        index=indexn,
        body={
            "query": {
                "script_score": {
                    # Lexical (BM25) match on the product title field
                    "query": {
                        "match": {
                            "question": cleanq
                        }
                    },
                    # BM25 _score plus cosine similarity to the query vector
                    "script": {
                        "source": "_score + cosineSimilarity(params.query_vector, 'question_vector') + 1.0",
                        "params": {
                            "query_vector": question_embedding
                        }
                    }
                }
            }
        }
    )
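For anyone landing here with the same symptom: `script_score` only rescores the documents that its inner `query` matches, so documents that fail the `match` never reach the cosine similarity script. One possible way to get a joint 'top n' is to run the lexical and the vector search separately and merge the two hit lists on the client. Below is a minimal sketch of that idea, not an official Elasticsearch feature. The helper names (`min_max`, `fuse_hits`, `bm25_query`, `vector_query`) and the weight `alpha` are hypothetical; the field names are taken from the query above; min-max normalization is just one assumed choice of normalization.

```python
from typing import Dict, List, Tuple

Hit = Tuple[str, float]  # (document id, raw score)


def min_max(hits: List[Hit]) -> Dict[str, float]:
    """Rescale the raw scores of one result list to the range [0, 1]."""
    if not hits:
        return {}
    scores = [score for _, score in hits]
    lo, hi = min(scores), max(scores)
    if hi == lo:
        # All scores are equal: treat every hit as fully relevant.
        return {doc_id: 1.0 for doc_id, _ in hits}
    return {doc_id: (score - lo) / (hi - lo) for doc_id, score in hits}


def fuse_hits(bm25_hits: List[Hit], vector_hits: List[Hit],
              alpha: float = 0.5, top_n: int = 10) -> List[Hit]:
    """Interpolate normalized scores: alpha * BM25 + (1 - alpha) * vector.

    A document missing from one list contributes 0.0 for that list,
    so hits found by only one of the two searches still compete.
    """
    bm25 = min_max(bm25_hits)
    vec = min_max(vector_hits)
    combined = {
        doc_id: alpha * bm25.get(doc_id, 0.0) + (1.0 - alpha) * vec.get(doc_id, 0.0)
        for doc_id in set(bm25) | set(vec)
    }
    # Highest combined score first; ties broken by document id.
    return sorted(combined.items(), key=lambda kv: (-kv[1], kv[0]))[:top_n]


def bm25_query(text: str, field: str = "question") -> dict:
    """Request body for the lexical search alone."""
    return {"query": {"match": {field: text}}}


def vector_query(query_vector: List[float], field: str = "question_vector") -> dict:
    """Request body for the vector search alone.

    match_all as the inner query lets every document reach the script.
    """
    return {
        "query": {
            "script_score": {
                "query": {"match_all": {}},
                "script": {
                    "source": f"cosineSimilarity(params.query_vector, '{field}') + 1.0",
                    "params": {"query_vector": query_vector},
                },
            }
        }
    }
```

Each body would be sent in its own search call, and the `(_id, _score)` pairs from `response["hits"]["hits"]` passed to `fuse_hits`. Setting `alpha` closer to 1.0 favors the BM25 ranking, closer to 0.0 the vector ranking; the right value depends on the data and would need tuning.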

system · December 15, 2022, 9:55pm

This topic was automatically closed 28 days after the last reply. New replies are no longer allowed.

