Hello, this is related to the closed topic here:
Sorry to open a new topic, but that one is closed. I built a query based on the answer there, and it runs fine, but it only ever returns the BM25 matches; none of the vector-based results make the cut at all. If BM25 finds nothing, the query returns nothing.

In the code below, 'cleanq' is the search text (e.g. 'Pepsi Cola'), 'question' is the field holding the text product titles in the index (e.g. 'Coca Cola'), and question_embedding is the dense query vector, created with a HuggingFace model that works perfectly in a stand-alone vector query. Adding or removing the + 1.0 makes no difference to this problem. The index itself works fine, both stand-alone and in the query below, but the query below only ever gives the BM25 results. If the BM25 match is too weak, it returns nothing at all, even when the same input text (just 'cola', say) DOES return a vector match in a stand-alone vector search.
Just to clarify: I don't want only the SCORES added together; I need all the results (from BM25 and from the dense vector search) combined into a joint 'top n' based on some kind of score normalization. Is that even possible? Or should I be using some kind of bool with AND over both types of searches in a single meta-query?
Thank you for any assistance!
hybrid_lex_sem = self.es.search(
    index=indexn,
    body={
        "query": {
            "script_score": {
                # BM25 match on the title field; in practice only these hits ever come back
                "query": {
                    "match": {
                        "question": cleanq
                    }
                },
                # add cosine similarity against the dense vector field to the BM25 _score
                "script": {
                    "source": "_score + cosineSimilarity(params.query_vector, 'question_vector') + 1.0",
                    "params": {
                        "query_vector": question_embedding
                    }
                }
            }
        }
    }
)
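For what it's worth, the fallback I was considering (if there's no clean single-query way) is to run the BM25 search and the vector search separately and merge them client-side with min-max normalization into one joint top-n list. Below is a rough, untested sketch of what I mean. The helper names (min_max, merge_top_n), the alpha weight and the size n are just things I made up for illustration; indexn, cleanq, the 'question' and 'question_vector' fields and question_embedding are the same as above.

from elasticsearch import Elasticsearch  # client setup omitted; I just pass in self.es


def min_max(scores):
    # squash a list of scores into the 0..1 range
    if not scores:
        return []
    lo, hi = min(scores), max(scores)
    if hi == lo:
        return [1.0 for _ in scores]
    return [(s - lo) / (hi - lo) for s in scores]


def merge_top_n(es, indexn, cleanq, question_embedding, n=10, alpha=0.5):
    # 1) plain BM25 search on the title field
    bm25_hits = es.search(index=indexn, body={
        "size": n,
        "query": {"match": {"question": cleanq}},
    })["hits"]["hits"]

    # 2) stand-alone vector search (the one that works fine on its own for me)
    vector_hits = es.search(index=indexn, body={
        "size": n,
        "query": {
            "script_score": {
                "query": {"match_all": {}},
                "script": {
                    "source": "cosineSimilarity(params.query_vector, 'question_vector') + 1.0",
                    "params": {"query_vector": question_embedding},
                },
            }
        },
    })["hits"]["hits"]

    # 3) min-max normalize each list, then blend the two scores per document _id
    combined = {}
    for hits, weight in ((bm25_hits, alpha), (vector_hits, 1.0 - alpha)):
        for hit, score in zip(hits, min_max([h["_score"] for h in hits])):
            entry = combined.setdefault(hit["_id"], {"hit": hit, "score": 0.0})
            entry["score"] += weight * score

    # 4) joint top-n across both result sets
    ranked = sorted(combined.values(), key=lambda e: e["score"], reverse=True)
    return ranked[:n]

Is something like this the way to go, or is there a proper way to get Elasticsearch itself to do the normalization in a single query? (The bool idea I mentioned would presumably also combine the hits, but I couldn't see how to normalize the two very different score ranges inside one query, which is why I'm asking.)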