Hi!
I'm designing a clustering algorithm that uses ES as a backend to retrieve close matches, via the recently introduced dense_vector type and the cosineSimilarity function. So far so good, although built-in approximate nearest-neighbor (ANN) search would improve performance.
Now to the point: at some stage I need to calculate the centroid of each resulting cluster by averaging the dense vectors of the documents that belong to it. This can of course be done outside ES, but that means pulling every vector over to the client, which is a non-negligible amount of I/O.
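To make the goal concrete, here is a minimal client-side sketch of the computation I have in mind. It is plain Python with no ES calls; the function name `compute_centroids` and the `(cluster_id, vector)` input shape are just illustrative assumptions, not anything ES provides.

```python
def compute_centroids(docs):
    """Average vectors per cluster.

    docs: iterable of (cluster_id, vector) pairs (hypothetical shape,
    e.g. extracted from search hits on the client side).
    Returns a dict mapping cluster_id to its centroid (element-wise mean).
    """
    sums = {}    # cluster_id -> running element-wise sum
    counts = {}  # cluster_id -> number of vectors seen

    for cluster_id, vector in docs:
        if cluster_id not in sums:
            sums[cluster_id] = [float(x) for x in vector]
            counts[cluster_id] = 1
            continue
        if len(vector) != len(sums[cluster_id]):
            raise ValueError("all vectors in a cluster must have the same dimension")
        sums[cluster_id] = [a + b for a, b in zip(sums[cluster_id], vector)]
        counts[cluster_id] += 1

    # Divide each running sum by the cluster size to get the mean.
    return {
        cid: [s / counts[cid] for s in total]
        for cid, total in sums.items()
    }
```

Ideally the same per-cluster sum and count would be computed inside ES, so that only one vector per cluster travels over the wire.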
So my question is: can I use any scripting functionality within ES to efficiently calculate the average of dense_vector fields, perhaps by grouping the documents by their cluster_id with an aggregation?
Thanks in advance, and have a great day!