Hello, I'm trying to implement vector similarity search for face recognition using Elasticsearch in Python.
This is the mapping I use to create the index:
```python
mapping = {
    "mappings": {
        "properties": {
            "embedding_aaaa": {
                "type": "dense_vector",
                "dims": 128
            },
            "title_name": {"type": "keyword"}
        }
    }
}
```
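Because the mapping fixes `dims` at 128, every vector written to `embedding_aaaa` needs exactly 128 numeric components. A minimal pre-indexing check, assuming the face model returns a flat sequence of floats (a list or a NumPy array); `validate_embedding` is a hypothetical helper, not part of the Elasticsearch client:

```python
import math

# Same mapping as above, so the check reads `dims` from a single place.
mapping = {
    "mappings": {
        "properties": {
            "embedding_aaaa": {"type": "dense_vector", "dims": 128},
            "title_name": {"type": "keyword"},
        }
    }
}


def validate_embedding(vector, field="embedding_aaaa", index_mapping=mapping):
    """Return the vector as a list of plain floats, or raise if it cannot fit the field."""
    dims = index_mapping["mappings"]["properties"][field]["dims"]
    values = [float(x) for x in vector]  # also turns NumPy scalars into plain floats
    if len(values) != dims:
        raise ValueError(f"{field} expects {dims} dims, got {len(values)}")
    # Local sanity check against NaN/inf coming out of the model.
    if not all(math.isfinite(v) for v in values):
        raise ValueError(f"{field} contains NaN or infinite values")
    return values
```

Running the check before every write surfaces a wrong model output size locally instead of as an indexing error.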
Once the index is created, I insert this document:
```python
import datetime
import os

video_details = {
    "filename": unique_filename + ".mp4",
    "video_title": video_title,
    "tags": tags.split(",") if tags else [],
    "created_at": str(datetime.datetime.now()),
    "created_by": createdBy,
    "uploded_file_size_in_mb": round(os.path.getsize(file_path) / (1024 * 1024), 2)
}
```
Once that is complete, I run a face recognition model on each frame I extract from the video to generate an embedding, pair each embedding with the frame's timestamp, and finally write the collected list into the same document using the update method:
```python
embedding_per_frame = []
# The next two statements run once per extracted frame (frame loop omitted here)
ds = {
    "timestamp": timestamp,
    "embedding": embedding
}
embedding_per_frame.append(ds.copy())

# Write the collected list into the existing document with a Painless script
update_body = {
    "script": {
        "source": "ctx._source." + "title_vector_aaaa" + " = params." + "title_vector_aaaa",
        "lang": "painless",
        "params": {
            "title_vector_aaa": embedding_per_frame,
        }
    }
}
resp = es.update(index=index_name, id=document_id, body=update_body)
```
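Comparing this update with the mapping, two things stand out: the script writes to `title_vector_aaaa` (and reads `params.title_vector_aaaa`, although the params key is spelled `title_vector_aaa`), so `embedding_aaaa` is never set; and the value is a list of `{timestamp, embedding}` objects, whereas a `dense_vector` field is single-valued (one vector per document). One possible layout, sketched under the assumption that one document per frame is acceptable, puts each frame's vector directly into `embedding_aaaa`. `frame_documents`, the `video_id` field and the `_id` scheme are hypothetical names chosen for illustration:

```python
def frame_documents(video_id, embedding_per_frame, field="embedding_aaaa"):
    """Yield (doc_id, source) pairs, one per extracted frame.

    Each source holds a single vector in `field`, which is what a
    dense_vector field expects, plus the video id and timestamp so a
    hit can be traced back to its video and frame.
    """
    for i, frame in enumerate(embedding_per_frame):
        doc_id = f"{video_id}-{i}"  # hypothetical id scheme: video id + frame index
        source = {
            "video_id": video_id,  # hypothetical field for grouping hits per video
            "timestamp": frame["timestamp"],
            field: [float(x) for x in frame["embedding"]],
        }
        yield doc_id, source
```

Each pair could then be indexed with `es.index(index=index_name, id=doc_id, document=source)` on an 8.x client (`body=source` on 7.x). Adding `video_id` to the mapping as a `keyword` is an assumption that would make grouping search hits by video straightforward.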
Now comes the search part, where I use L2 distance to find similar vectors:
```python
for embed in embeddings:
    query = {
        "size": 5,
        "query": {
            "script_score": {
                "query": {
                    "match_all": {}
                },
                "script": {
                    # 1 / (1 + Euclidean distance): identical vectors score 1.0
                    "source": "1 / (1 + l2norm(params.queryVector, 'embedding_aaaa'))",
                    "params": {
                        "queryVector": list(embed)
                    }
                }
            }
        }
    }
    res = es.search(index="videos", body=query)
```
This is where the error happens.
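For reference, `l2norm` returns the Euclidean distance between the query vector and the stored vector, so this script maps a distance of 0 to a score of 1 and pushes larger distances toward 0. The same arithmetic in plain Python, useful for checking a score by hand against known embeddings (it reproduces the formula only, not Elasticsearch's scoring machinery; `l2_score` is a hypothetical helper):

```python
import math


def l2_score(query, stored):
    """Mirror of the script: 1 / (1 + Euclidean distance between the vectors)."""
    if len(query) != len(stored):
        raise ValueError("vectors must have the same number of dimensions")
    distance = math.sqrt(sum((q - s) ** 2 for q, s in zip(query, stored)))
    return 1.0 / (1.0 + distance)
```

For example, `l2_score([0.0, 0.0], [3.0, 4.0])` has distance 5 and therefore scores 1/6 (about 0.1667).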
If I replace the source statement with this:

```python
"source": "doc['embedding_aaaa'].size() == 0 ? 0 : 1 / (1 + l2norm(params.queryVector, 'embedding_aaaa'))",  # Euclidean distance
```

the error goes away, but every score is 0.0, which I assume comes from the first half of the source statement (the `size() == 0` branch).
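The `script_score` documentation notes that a vector function throws an error when a document has no value for the vector field, and suggests checking `doc['field'].size() == 0` for that case. The guarded script therefore behaves like the local model below: a document with no value in `embedding_aaaa` scores 0 before `l2norm` is evaluated, so all hits scoring 0.0 is consistent with none of the indexed documents having a value in that field. A sketch in which each document is a plain dict standing in for its indexed fields (an assumption; Elasticsearch evaluates `doc[...]` against indexed values, not `_source`); `guarded_score` and `count_missing` are hypothetical helpers:

```python
import math


def guarded_score(query, source, field="embedding_aaaa"):
    """Local model of: doc[field].size() == 0 ? 0 : 1 / (1 + l2norm(query, field))."""
    stored = source.get(field)
    if stored is None or len(stored) == 0:  # no value -> the ternary returns 0
        return 0.0
    distance = math.sqrt(sum((q - s) ** 2 for q, s in zip(query, stored)))
    return 1.0 / (1.0 + distance)


def count_missing(sources, field="embedding_aaaa"):
    """How many documents would take the 0 branch of the guard."""
    return sum(1 for s in sources if s.get(field) is None or len(s[field]) == 0)
```

A document shaped like the one produced by the update above, with its vectors under `title_vector_aaaa`, always lands in the 0 branch.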
Where did I go wrong? Any help is very much appreciated.