I have a question about Elasticsearch. I have embedding vectors (dense vectors) and their corresponding string tokens, produced by an algorithm that uses K-Means to map the vectors from a high-dimensional vector space into a smaller subspace (in text format), so that the full-text search engine Elasticsearch can query them quickly (similarity search).
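As an illustration only, a mapping of this kind could look roughly like the sketch below. The subvector split, the centroid values, the `encode_vector` helper, and the `posI_cJ` token format are all assumptions made for this example; the centroids would normally come from a K-Means fit rather than being hard-coded.

```python
import math

def encode_vector(vector, centroids_per_subspace):
    """Map a dense vector to string tokens, one per subvector (hypothetical scheme)."""
    n_sub = len(centroids_per_subspace)
    sub_len = len(vector) // n_sub
    tokens = []
    for i, centroids in enumerate(centroids_per_subspace):
        sub = vector[i * sub_len:(i + 1) * sub_len]
        # Index of the centroid closest to this subvector.
        nearest = min(range(len(centroids)), key=lambda j: math.dist(sub, centroids[j]))
        tokens.append(f"pos{i}_c{nearest}")
    return tokens

# Assumed centroids: 2 subspaces of dimension 2, with 2 centroids each.
centroids = [
    [[1.0, 2.0], [5.0, 5.0]],
    [[0.0, 0.0], [3.0, 4.0]],
]
tokens = encode_vector([1.12, 2.24, 3.34, 4.45], centroids)
print(tokens)
```

Tokens like these can be indexed as ordinary text, which is what lets a full-text engine match vectors that fall into the same clusters.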
I then take the results of the Elasticsearch query phase and rescore (rerank) them by Euclidean distance.
But the rescoring phase does not seem to work: after rescoring, the results lose the similarity they had in the query phase.
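The intended rescoring behavior can be sketched outside Elasticsearch as follows. This is a minimal pure-Python illustration, not the Painless script itself; the `rerank_by_euclidean` helper and the sample candidates are hypothetical.

```python
import math

def rerank_by_euclidean(query_vector, candidates, top_k=None):
    """Sort (doc_id, vector) candidates by ascending Euclidean distance to the query."""
    scored = [(doc_id, math.dist(query_vector, vec)) for doc_id, vec in candidates]
    scored.sort(key=lambda pair: pair[1])  # smaller distance = more similar
    return scored[:top_k]  # top_k=None keeps every candidate

# Hypothetical candidates, standing in for the top-r hits of the query phase.
candidates = [
    ("a", [1.0, 2.0, 3.0, 4.0]),
    ("b", [9.0, 9.0, 9.0, 9.0]),
    ("c", [1.5, 2.5, 3.5, 4.5]),
]
ranked = rerank_by_euclidean([1.0, 2.0, 3.0, 4.0], candidates)
print([doc_id for doc_id, _ in ranked])
```

Here the closest vector comes first. By default Elasticsearch ranks hits by descending score, so a raw distance used directly as the score would produce the opposite order.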
Here is my request body (JSON) for the query and rescore with Elasticsearch:
request_body_1 = {
    "size": s,
    "query": {
        "function_score": {
            "functions": string_tokens_body,
            "score_mode": "sum",
            "boost_mode": "replace"
        }
    },
    "rescore": {
        "window_size": r,  # Take the top-r results from the query phase for rescoring with Euclidean distance.
        "query": {
            "rescore_query": {
                "function_score": {
                    "script_score": {
                        "script": {
                            "lang": "painless",
                            "source": """
                                def sum = 0.0;
                                for (def index = 0; index < params['_source']['embedding_vector'].length; index++) {
                                    sum += Math.pow(params.query_vector[index] - doc['embedding_vector'][index], 2);
                                }
                                return(Math.sqrt(sum));
                            """,
                            "params": {
                                "query_vector": query_vector.tolist()  # A NumPy array does not work here.
                            }
                        }
                    },
                    "boost_mode": "replace"
                }
            },
            "query_weight": 0,  # Remove the scores from the query phase.
            "rescore_query_weight": 1  # Score only according to the *rescoring phase*.
        }
    }
}
Here is an example of a document that I index into Elasticsearch:
{
    "index": "my_project",
    "type": "_doc",
    "id": 1,
    "source": {
        "embedding_vector": [1.12, 2.24, 3.34, 4.45],
        "other_field": "other_datatypes"
    }
}
How can I solve this problem?
Thanks in advance for any replies.