Averaging dense_vectors within ES

fpug · November 17, 2020, 8:27am

Hi!

I'm designing a clustering algorithm that uses ES as a backend to retrieve close matches, using the recently introduced dense_vector type and the cosineSimilarity function. So far so good, although built-in approximate nearest-neighbor (ANN) search would improve performance.

Now to the point: at some stage I need to calculate the centroids of the resulting clusters by averaging the dense vectors of the documents belonging to each cluster. This can of course be done outside ES, but that involves a non-negligible amount of I/O.
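
For concreteness, the client-side version of that averaging is roughly the sketch below. It assumes each document has already been fetched as a dict with a `cluster_id` and a vector stored under a field called `embedding` here (a made-up name for illustration), and that all vectors share the same dimension. It does not call ES at all; it only shows the per-cluster sum-and-count computation.

```python
from collections import defaultdict


def cluster_centroids(docs):
    """Return {cluster_id: mean vector} for an iterable of document dicts.

    Each doc is assumed to look like
    {"cluster_id": ..., "embedding": [floats]} (hypothetical field names).
    """
    sums = {}                    # cluster_id -> running element-wise sum
    counts = defaultdict(int)    # cluster_id -> number of vectors seen

    for doc in docs:
        cid = doc["cluster_id"]
        vec = doc["embedding"]
        if cid not in sums:
            sums[cid] = [0.0] * len(vec)
        acc = sums[cid]
        for i, x in enumerate(vec):
            acc[i] += x
        counts[cid] += 1

    # Divide each running sum by its count to get the centroid.
    return {cid: [s / counts[cid] for s in acc] for cid, acc in sums.items()}


# Tiny made-up sample: two documents in cluster "a", one in cluster "b".
sample_docs = [
    {"cluster_id": "a", "embedding": [1.0, 2.0]},
    {"cluster_id": "a", "embedding": [3.0, 4.0]},
    {"cluster_id": "b", "embedding": [0.5, 0.5]},
]
centroids = cluster_centroids(sample_docs)
```

Only a running sum and a count per cluster are kept, so the documents can be streamed rather than held in memory, but every vector still has to leave ES, which is the I/O cost mentioned above.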

So my question is: can I use any scripting functionality within ES to calculate the average of dense_vector fields efficiently, perhaps by grouping the documents by their cluster_id with an aggregation?

Thanks in advance, and have a great day

system · December 15, 2020, 8:27am

This topic was automatically closed 28 days after the last reply. New replies are no longer allowed.
