Similarity score in array

Ban_Mido · September 22, 2013, 5:56pm

Hi,

I have a field called tags, which is an array of elements, and I have applied
a flat analyzer to it.
Now I have an array of tags, and I want to see which feed matches that array
of tags the most. The similarity logic is:

The feed with the maximum number of matching tags should come first.
If two feeds have the same number of matched tags, then the feed
with the highest percentage of matched tags should come first.

In short, if my query is "terms" : { "tags" : [ "one" , "two" ] } ,
then the feed { "tags" : [ "one" , "two" , "three" , "four" ] } should have
a greater similarity score than the feed { "tags" : [ "one" , "two" , "three"
, "four" , "five" ] } , because for the former feed the percentage of matched
tags is 50% (2 of 4) and for the latter it is 40% (2 of 5).
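
The two rules above can be sketched outside Elasticsearch as a plain sort, which may help when checking what order a query should return. This is only an illustration of the requested ordering, not an Elasticsearch feature; the `rank_feeds` helper, the `id` field, and the third feed are made up for the example.

```python
def rank_feeds(query_tags, feeds):
    """Order feeds by matched-tag count, then by fraction of the feed's tags matched."""
    wanted = set(query_tags)

    def sort_key(feed):
        tags = set(feed["tags"])
        matched = len(wanted & tags)
        # Fraction of the feed's own tags that appear in the query.
        fraction = matched / len(tags) if tags else 0.0
        # Tuples compare element by element: count first, fraction as tie-breaker.
        return (matched, fraction)

    return sorted(feeds, key=sort_key, reverse=True)


# Hypothetical feeds; "id" exists only to make the result readable.
feeds = [
    {"id": "a", "tags": ["one", "two", "three", "four", "five"]},
    {"id": "b", "tags": ["one", "two", "three", "four"]},
    {"id": "c", "tags": ["one", "nine"]},
]
ranked = rank_feeds(["one", "two"], feeds)
print([feed["id"] for feed in ranked])  # ['b', 'a', 'c']
```

Feeds a and b both match two tags, so b wins on the tie-breaker (2 of 4 against 2 of 5). Feed c matches half of its tags as well, but only one tag in total, so it comes last.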

Thanks
Vineeth

--
You received this message because you are subscribed to the Google Groups "elasticsearch" group.
To unsubscribe from this group and stop receiving emails from it, send an email to elasticsearch+unsubscribe@googlegroups.com.
For more options, visit https://groups.google.com/groups/opt_out.

ppearcy · September 24, 2013, 2:56am

Hi,
Doing a stock sort on score will get you most of the way there. However,
it will not strictly adhere to the first rule since it is TF/IDF based.

Implementing a custom score would definitely work:
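
One possible shape for such a custom score, as a rough sketch and not necessarily what was linked here: add the matched count and the matched fraction into one number. The fraction is greater than 0 and at most 1 whenever something matches, so a document with more matches always outranks one with fewer, and the fraction only breaks ties. The request body below assumes the `script_score` query and Painless scripting from Elasticsearch releases later than this thread, and assumes `tags` is indexed as untokenized values with doc values; the script has not been run against a cluster.

```python
def combined_score(query_tags, doc_tags):
    """matched + matched/total: count dominates, fraction breaks ties."""
    wanted = set(query_tags)
    tags = set(doc_tags)
    if not tags:
        return 0.0
    matched = len(wanted & tags)
    return matched + matched / len(tags)


# The same formula as a Painless script (assumed syntax, untested).
PAINLESS_SOURCE = """
double matched = 0;
for (def tag : doc['tags']) {
  if (params.tags.contains(tag)) { matched += 1; }
}
int total = doc['tags'].size();
return total == 0 ? 0.0 : matched + matched / total;
"""


def build_query(query_tags):
    """Build a search request body; nothing is sent to a cluster here."""
    tags = list(query_tags)
    return {
        "query": {
            "script_score": {
                # Restrict scoring to documents with at least one matching tag.
                "query": {"terms": {"tags": tags}},
                "script": {"source": PAINLESS_SOURCE, "params": {"tags": tags}},
            }
        }
    }


print(combined_score(["one", "two"], ["one", "two", "three", "four"]))  # 2.5
```

The Python function mirrors the script so the arithmetic can be checked locally before trying it in a query.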

Getting more out there, but likely more optimal, you could perhaps define
your own similarity model. Something to play around with at least:

Best Regards,
Paul

