Hi Everyone,
I am trying to build a reverse image search / image similarity feature using Elasticsearch. I have successfully indexed the feature vectors in Elasticsearch as an array, which looks something like this:
"feature_vector" : [157, 144, 26, 107, 97, 62, 114, 248 ........ ]
The size of this array is 256.
Now I am trying to compute the Euclidean distance between a stored vector and a query vector in a script.
Here's the formula I am trying to implement:
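Written out, assuming the standard definition of Euclidean distance, with p the stored feature_vector and q the query_feature passed in params (the symbols p, q and n are just my labels):

```latex
d(p, q) = \sqrt{\sum_{i=1}^{n} (p_i - q_i)^2}, \qquad n = 256
```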
Here's the script:
GET images_features/_search
{
  "sort": [
    {
      "_score": {
        "order": "asc"
      }
    }
  ],
  "query": {
    "function_score": {
      "script_score": {
        "script": {
          "lang": "painless",
          "source": "double distance = 0; double diff = 0; if(doc['feature_vector'].size() != params.query_feature.size()){distance = 0} else{for(int j = 0; j < doc['feature_vector'].size(); j++){diff = Math.abs(doc['feature_vector'][j]) - Math.abs(params.query_feature[j]); distance = distance + (diff*diff)}} return Math.sqrt(distance)",
          "params": {
            "query_feature": [
              170,
              134,
              191,
              75,
              139,
              .
              .
              .
              180,
              232,
              150,
              182,
              208,
              239,
              109,
              232,
              106
            ]
          }
        }
      }
    }
  }
}
However, the score calculated by the script is wrong, and the results returned do not look similar to the query image. Running this script gives a score of 2281.6973, while the Java and Python programs return a score of 64.093681.
I have verified the expected distance between the input feature and a feature stored in Elasticsearch in both Python (using the scipy library) and Java (by writing the same script as a Java program), and the two results match.
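For reference, the check outside Elasticsearch amounts to roughly the following. This is a minimal dependency-free sketch rather than my exact program: the function name euclidean_distance is just illustrative, and the two short vectors are only the first four values of the arrays shown above, not the full 256-element vectors.

```python
import math


def euclidean_distance(stored, query):
    """Return the Euclidean distance between two equal-length vectors."""
    if len(stored) != len(query):
        raise ValueError("vectors must have the same length")
    # Sum of squared element-wise differences, compared position by position.
    return math.sqrt(sum((s - q) ** 2 for s, q in zip(stored, query)))


# Illustrative 4-element slices only; the real vectors have 256 elements.
stored = [157, 144, 26, 107]
query = [170, 134, 191, 75]
print(euclidean_distance(stored, query))
```

The comparison is strictly position by position, so the result depends on both arrays being in their original order.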
Is there an issue in the script that I am missing?