Elasticsearch cosine similarity query question

Hi there, I'm currently working on a face recognition project using Elasticsearch. I can't seem to get a working query for matching two datasets with each other. My most recent attempt is shown below the mapping. I was hoping to get some help here.

I have an index called face_recognition (mapping below). What I want to accomplish with the query is the following:

  • Filter by term: dataset = input/test.
  • For each filtered document, find the top match from another dataset (the same index filtered on dataset = input/celebs) using the cosineSimilarity function on the face_embedding values of both documents.
  • Show the top matching document from input/celebs for each document from input/test in the response.
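In plain Python terms, the matching I'm after looks roughly like this (a sketch with toy 3-dimensional vectors standing in for the real 128-dimensional embeddings):

```python
import math

# Toy stand-ins for indexed documents; real embeddings have 128 dims.
docs = [
    {"dataset": "input/test",   "id": "test-0",  "face_embedding": [1.0, 0.0, 0.0]},
    {"dataset": "input/celebs", "id": "celeb-a", "face_embedding": [0.9, 0.1, 0.0]},
    {"dataset": "input/celebs", "id": "celeb-b", "face_embedding": [0.0, 1.0, 0.0]},
]

def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

# For every input/test doc, find the best input/celebs doc by cosine similarity.
tests = [d for d in docs if d["dataset"] == "input/test"]
celebs = [d for d in docs if d["dataset"] == "input/celebs"]
matches = {
    t["id"]: max(celebs, key=lambda c: cosine(t["face_embedding"], c["face_embedding"]))["id"]
    for t in tests
}
print(matches)  # -> {'test-0': 'celeb-a'}
```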

Please let me know if you need additional information or a more detailed explanation.

Mapping

{
	"face_recognition": {
		"mappings": {
			"dynamic": "false",
			"properties": {
				"bottom_right": {
					"type": "point"
				},
				"dataset": {
					"type": "keyword"
				},
				"face_embedding": {
					"type": "dense_vector",
					"dims": 128,
					"index": true,
					"similarity": "l2_norm"
				},
				"file_name": {
					"type": "keyword"
				},
				"height": {
					"type": "integer"
				},
				"top_left": {
					"type": "point"
				},
				"width": {
					"type": "integer"
				}
			}
		}
	}
}

Non-working query

{
	"query": {
		"bool": {
			"must": [
				{
					"match": {
						"dataset": "input/test"
					}
				}
			],
			"should": [
				{
					"script_score": {
						"query": {
							"match": {
								"dataset": "input/celebs"
							}
						},
						"script": {
							"source": "cosineSimilarity(params.query_vector, doc['face_embedding']) + 1.0",
							"params": {
								"query_vector": {
									"dataset": "input/test",
									"face_embedding": {
										"type": "dense_vector",
										"dims": 128,
										"index": true,
										"similarity": "l2_norm"
									}
								}
							}
						}
					}
				}
			]
		}
	},
	"sort": [
		{
			"_score": {
				"order": "desc"
			}
		}
	]
}

ES instance info

{
	"name": "8ebe152fb75a",
	"cluster_name": "docker-cluster",
	"cluster_uuid": "ufSwSRHlQt2HHUtZccdhfw",
	"version": {
		"number": "8.5.3",
		"build_flavor": "default",
		"build_type": "docker",
		"build_hash": "4ed5ee9afac63de92ec98f404ccbed7d3ba9584e",
		"build_date": "2022-12-05T18:22:22.226119656Z",
		"build_snapshot": false,
		"lucene_version": "9.4.2",
		"minimum_wire_compatibility_version": "7.17.0",
		"minimum_index_compatibility_version": "7.0.0"
	},
	"tagline": "You Know, for Search"
}

Hey @ward ,

Your query doesn't really make sense to me, so I need some clarification.

Does the field dataset ever match both input/test AND input/celebs? Because that is what your query requires in order to score any documents according to cosine similarity.

It's saying: get me individual documents where dataset is input/test; then, if dataset is input/celebs as well, score the documents according to the cosineSimilarity of the query_vector and the stored vector.
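To illustrate with a toy example (plain Python, not Elasticsearch, standing in for how the bool query evaluates a single document):

```python
# A bool query scores each document individually, so both clauses are checked
# against the SAME document. With a single-valued keyword field the should
# clause can never also match, and the script_score part never contributes.
doc = {"dataset": "input/test"}

must_matches = doc["dataset"] == "input/test"      # True: document is returned
should_matches = doc["dataset"] == "input/celebs"  # False: script never runs

print(must_matches, should_matches)  # -> True False
```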

Hey @BenTrent, thanks for your response.

No, the dataset field is a keyword field containing a single value, e.g. input/test OR input/celebs OR input/whatever. Not multiple values, just one string per document.

That's interesting, since I actually get some results from the query (see below).

The way it should work: get all the documents where dataset equals input/test, and for each of them find the document with the most similar face_embedding among the documents where dataset equals input/celebs, using the cosineSimilarity function (or something else, perhaps?).

These matching documents should, if possible, be added to the response for every hit (the input/test documents).

Thanks!

{
	"took": 6,
	"timed_out": false,
	"_shards": {
		"total": 1,
		"successful": 1,
		"skipped": 0,
		"failed": 0
	},
	"hits": {
		"total": {
			"value": 12,
			"relation": "eq"
		},
		"max_score": 11.344393,
		"hits": [
			{
				"_index": "face_recognition",
				"_id": "input/test/teamfoto.jpg-152",
				"_score": 11.344393,
				"_source": {
					"bottom_right": [
						942.0,
						524.0
					],
					"dataset": "input/test",
					"face_embedding": [
						-0.04792561009526253,
						...
						0.07211816310882568
					],
					"file_name": "teamfoto.jpg",
					"height": "982",
					"top_left": [
						852.0,
						435.0
					],
					"width": "1473"
				}
			},
			{
				"_index": "face_recognition",
				"_id": "input/test/teamfoto.jpg-153",
				"_score": 11.344393,
				"_source": {
					"bottom_right": [
						1324.0,
						528.0
					],
					"dataset": "input/test",
					"face_embedding": [
						0.0036550709046423435,
						...
						0.07274394482374191
					],
					"file_name": "teamfoto.jpg",
					"height": "982",
					"top_left": [
						1249.0,
						453.0
					],
					"width": "1473"
				}
			},
			{
				"_index": "face_recognition",
				"_id": "input/test/teamfoto.jpg-154",
				"_score": 11.344393,
				"_source": {
					"bottom_right": [
						1091.0,
						524.0
					],
					"dataset": "input/test",
					"face_embedding": [
						-0.0691288411617279,
						...
						0.00859785545617342
					],
					"file_name": "teamfoto.jpg",
					"height": "982",
					"top_left": [
						1001.0,
						435.0
					],
					"width": "1473"
				}
			},
			{
				"_index": "face_recognition",
				"_id": "input/test/teamfoto.jpg-155",
				"_score": 11.344393,
				"_source": {
					"bottom_right": [
						205.0,
						474.0
					],
					"dataset": "input/test",
					"face_embedding": [
						-0.05700485780835152,
						...
						0.05050960183143616
					],
					"file_name": "teamfoto.jpg",
					"height": "982",
					"top_left": [
						116.0,
						385.0
					],
					"width": "1473"
				}
			},
			{
				"_index": "face_recognition",
				"_id": "input/test/teamfoto.jpg-156",
				"_score": 11.344393,
				"_source": {
					"bottom_right": [
						683.0,
						514.0
					],
					"dataset": "input/test",
					"face_embedding": [
						-0.14340144395828247,
						...
						0.09600429981946945
					],
					"file_name": "teamfoto.jpg",
					"height": "982",
					"top_left": [
						593.0,
						425.0
					],
					"width": "1473"
				}
			},
			{
				"_index": "face_recognition",
				"_id": "input/test/teamfoto.jpg-157",
				"_score": 11.344393,
				"_source": {
					"bottom_right": [
						474.0,
						375.0
					],
					"dataset": "input/test",
					"face_embedding": [
						0.012161782942712307,
						...
						-0.001084950752556324
					],
					"file_name": "teamfoto.jpg",
					"height": "982",
					"top_left": [
						384.0,
						285.0
					],
					"width": "1473"
				}
			},
			{
				"_index": "face_recognition",
				"_id": "input/test/teamfoto.jpg-0",
				"_score": 11.344393,
				"_source": {
					"dataset": "input/test",
					"file_name": "teamfoto.jpg",
					"width": 1473,
					"height": 982,
					"top_left": [
						852,
						435
					],
					"bottom_right": [
						942,
						524
					],
					"face_embedding": [
						-0.04792561009526253,
						...
						0.07211816310882568
					]
				}
			},
			{
				"_index": "face_recognition",
				"_id": "input/test/teamfoto.jpg-1",
				"_score": 11.344393,
				"_source": {
					"dataset": "input/test",
					"file_name": "teamfoto.jpg",
					"width": 1473,
					"height": 982,
					"top_left": [
						1249,
						453
					],
					"bottom_right": [
						1324,
						528
					],
					"face_embedding": [
						0.0036550709046423435,
						...
						0.07274394482374191
					]
				}
			},
			{
				"_index": "face_recognition",
				"_id": "input/test/teamfoto.jpg-2",
				"_score": 11.344393,
				"_source": {
					"dataset": "input/test",
					"file_name": "teamfoto.jpg",
					"width": 1473,
					"height": 982,
					"top_left": [
						1001,
						435
					],
					"bottom_right": [
						1091,
						524
					],
					"face_embedding": [
						-0.0691288411617279,
						...
						0.00859785545617342
					]
				}
			},
			{
				"_index": "face_recognition",
				"_id": "input/test/teamfoto.jpg-3",
				"_score": 11.344393,
				"_source": {
					"dataset": "input/test",
					"file_name": "teamfoto.jpg",
					"width": 1473,
					"height": 982,
					"top_left": [
						116,
						385
					],
					"bottom_right": [
						205,
						474
					],
					"face_embedding": [
						-0.05700485780835152,
						...
						0.05050960183143616
					]
				}
			}
		]
	}
}

@ward

Every document is scored individually. What you want seems like a two-phase process: one query to get the embeddings, then further queries to score against those embeddings.

Additionally, the query as written never executes the cosine similarity. It only returns documents that match your must clause for input/test.

Cosine similarity never runs because, for it to run, the field dataset would need to be both input/test and input/celebs at the same time.

Have you read the docs on cosine similarity? They should provide some nice context around how to actually run a script score. The format you are using is incorrect: params.query_vector has to be an array of floats, not a copy of the field mapping.

What you want to do, which is to query the dataset again given the results of input/test, requires more than one query.

  • One to get the vectors to score
  • Then an individual query for each of those vectors, searching against the input/celebs documents
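A sketch of what those two phases could look like as _search request bodies, written here as Python dicts. The size values and the 128-float placeholder vector are assumptions; in 8.x the script can reference the dense_vector field by name.

```python
# Phase 1: fetch the embeddings of every input/test document.
phase1 = {
    "query": {"term": {"dataset": "input/test"}},
    "_source": ["file_name", "face_embedding"],
    "size": 100,  # assumption: adjust to however many test faces you expect
}

# Phase 2: for each face_embedding returned by phase 1, run one query like
# this against input/celebs; size 1 keeps only the top match. Note that
# params.query_vector must be a plain list of 128 floats, not a mapping.
def celebs_query(vector):
    return {
        "size": 1,
        "query": {
            "script_score": {
                "query": {"term": {"dataset": "input/celebs"}},
                "script": {
                    "source": "cosineSimilarity(params.query_vector, 'face_embedding') + 1.0",
                    "params": {"query_vector": vector},
                },
            },
        },
    }

body = celebs_query([0.0] * 128)  # placeholder vector
```

The per-vector queries in phase 2 could also be batched into a single _msearch request instead of sent one by one.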