Compare each element to every other element and calculate match score


(Stephen Coffman) #1

I'm not sure which Elasticsearch feature to use for this: Analyzers? Aggregations? Something else?

I need to compare each indexed element with every other indexed element and calculate a score for how well they match each other. The calculation is based on the values of the terms in each indexed element. I can do the calculation part. I just don't know how to do the "looping" part.

I apologize if this has already been answered - I couldn't find what I was looking for.

Thank you.


(Christoph) #2

Hi,

Since this is not a standard search use case (nor something usually associated with aggregations), there is nothing available out of the box. You can, however, write the looping part yourself, execute it from a client, and use the Term Vectors API to access the term information.
Since your result will likely be something like a symmetric matrix, you only need to compute about n^2/2 scores (n being the number of elements in your index). That is still quite a lot of work if your index is large. I'd rather get the term vectors for each document once and do the whole calculation "offline" later.
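
As a rough illustration of the "offline" step, here is a minimal Python sketch. It assumes the per-document term frequencies have already been pulled out of the Term Vectors API responses into plain dictionaries; the document IDs, the sample terms, and the cosine-style `cosine_score` function are all hypothetical stand-ins for your own data and calculation. Because the score is symmetric, `itertools.combinations` visits each unordered pair only once.

```python
from itertools import combinations
from math import sqrt


def cosine_score(tf_a, tf_b):
    """Hypothetical match score: cosine similarity of two term-frequency dicts."""
    shared = set(tf_a) & set(tf_b)
    dot = sum(tf_a[t] * tf_b[t] for t in shared)
    norm_a = sqrt(sum(v * v for v in tf_a.values()))
    norm_b = sqrt(sum(v * v for v in tf_b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def pairwise_scores(term_freqs, score=cosine_score):
    """Score every unordered pair of documents exactly once: n*(n-1)/2 pairs."""
    return {
        (a, b): score(term_freqs[a], term_freqs[b])
        for a, b in combinations(sorted(term_freqs), 2)
    }


# Made-up sample data: document ID -> {term: term frequency}.
# In practice this would be filled from the term vectors fetched once per document.
term_freqs = {
    "doc1": {"red": 2, "apple": 1},
    "doc2": {"red": 1, "apple": 1, "pie": 1},
    "doc3": {"green": 1, "pear": 3},
}

scores = pairwise_scores(term_freqs)
for pair, value in sorted(scores.items()):
    print(pair, round(value, 3))
```

Swapping in your own calculation only means passing a different `score` function. For a large index the number of pairs still grows quadratically, so it may be worth writing the scores out in batches rather than holding them all in memory.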


(Stephen Coffman) #3

Thank you Christoph!


#4

I think that this is something that can be done via Scripted Metric Aggregation:
https://www.elastic.co/guide/en/elasticsearch/reference/current/search-aggregations-metrics-scripted-metric-aggregation.html

(In your example, you could let the aggregation return an array of scores along with some sort of index, so you can correlate each score to its element.)

However, the example there uses Groovy, whereas from ES 5.0 onward Painless should be used, so the documentation is outdated.
I tried it myself but could not work out the "_agg" variable, so I think we have to wait for an updated example...
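
For what it's worth, here is an untested sketch of what such a request body might look like with Painless, written as a Python dictionary. Everything specific in it is an assumption: the field names `item_id` and `weight`, the aggregation name `pair_scores`, and the toy score formula are made up, and the `state` / `states` variables are the names used in more recent versions of the reference docs, as far as I know (releases around 5.x exposed them as `params._agg` and `params._aggs` instead). The idea is that each shard collects its documents in the map phase and the reduce script does the pairwise loop over everything collected.

```python
# Hypothetical request body for a scripted metric aggregation.
# The Painless snippets are a sketch and have not been run against a cluster.
body = {
    "size": 0,  # only the aggregation result is wanted, not the hits
    "aggs": {
        "pair_scores": {
            "scripted_metric": {
                # Runs once per shard: create a list to collect documents in.
                "init_script": "state.docs = []",
                # Runs once per document: keep an ID and the value to compare.
                "map_script": (
                    "state.docs.add(['id': doc['item_id'].value, "
                    "'w': doc['weight'].value])"
                ),
                # Runs once per shard: hand the collected list to the reduce phase.
                "combine_script": "return state.docs",
                # Runs once on the coordinating node: compare every pair once.
                "reduce_script": (
                    "def all = []; "
                    "for (s in states) { all.addAll(s) } "
                    "def out = []; "
                    "for (int i = 0; i < all.size(); i++) { "
                    "  for (int j = i + 1; j < all.size(); j++) { "
                    "    double d = all[i].w - all[j].w; "
                    "    out.add(['a': all[i].id, 'b': all[j].id, "
                    "'score': 1.0 / (1.0 + d * d)]); "
                    "  } "
                    "} "
                    "return out"
                ),
            }
        }
    },
}
```

Note that this pulls every document into memory on the coordinating node and returns roughly n^2/2 entries, so it is probably only reasonable for small indices.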


(Stephen Coffman) #5

That looks very interesting. I will investigate that. Thanks!

