Similarity score in array


(Ban Mido) #1

Hi,

I have a field called tags, which is an array of elements, and I have applied
a flat analyzer to it.
Now I have an array of tags and I want to see which feed matches that array
of tags the most. The similarity logic is:

  1. The feed with the maximum number of matching tags should come first.
  2. If two feeds have the same number of matched tags, then the feed
    with the highest percentage of matched tags should come first.

In short, if my query is "terms" : { "tags" : [ "one" , "two" ] } ,
then the feed { "tags" : [ "one" , "two" , "three" , "four" ] } should have
a greater similarity score than the feed { "tags" : [ "one" , "two" , "three"
, "four" , "five" ] } because for the former feed the percentage of matched
tags is 50% (2 of 4) and for the latter it is 40% (2 of 5).
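
The two rules can be sketched outside Elasticsearch as a plain sort, which may help when checking what any scoring setup returns. This is only an illustration in Python: the function name rank_feeds and the "id" field are made up for the example, and "percentage" is assumed to mean matched tags divided by the feed's total tags.

```python
def rank_feeds(query_tags, feeds):
    """Order feeds by matched-tag count, then by share of the feed's tags matched."""
    wanted = set(query_tags)

    def sort_key(feed):
        tags = set(feed["tags"])
        matched = len(tags & wanted)                    # rule 1: number of matching tags
        ratio = matched / len(tags) if tags else 0.0    # rule 2: fraction of the feed's tags matched
        return (matched, ratio)

    # Tuples compare element by element, so the ratio only breaks ties on the count.
    return sorted(feeds, key=sort_key, reverse=True)


# Hypothetical sample data; the "id" field exists only for this example.
feeds = [
    {"id": "b", "tags": ["one", "two", "three", "four", "five"]},
    {"id": "a", "tags": ["one", "two", "three", "four"]},
    {"id": "c", "tags": ["one"]},
]
ranked = rank_feeds(["one", "two"], feeds)
print([f["id"] for f in ranked])  # ['a', 'b', 'c']
```

Feed "c" matches 100% of its own tags but still ranks last, because the match count is compared first.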

Thanks
Vineeth

--
You received this message because you are subscribed to the Google Groups "elasticsearch" group.
To unsubscribe from this group and stop receiving emails from it, send an email to elasticsearch+unsubscribe@googlegroups.com.
For more options, visit https://groups.google.com/groups/opt_out.


(ppearcy) #2

Hi,
Doing a stock sort on score will get you most of the way there. However,
it will not strictly adhere to the first rule since it is TF/IDF based.

Implementing a custom score would definitely work:
http://www.elasticsearch.org/guide/reference/query-dsl/custom-score-query/
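
A custom score has to produce a single number per document, so the two rules need to be folded into one value. Below is a rough sketch of one way to do that, written in plain Python rather than as an actual Elasticsearch script. The name combined_score and the 0.5 weight are assumptions for illustration, and how to get the matched count and the tag count inside a real script is not shown here.

```python
def combined_score(query_tags, feed_tags):
    """Fold both rules into one number: the integer part carries the match
    count, the fractional part carries the matched share of the feed's tags."""
    wanted = set(query_tags)
    tags = set(feed_tags)
    matched = len(tags & wanted)
    if matched == 0:
        return 0.0
    ratio = matched / len(tags)      # in (0, 1]
    # 0.5 * ratio is at most 0.5, so it can never outweigh one extra matching tag.
    return matched + 0.5 * ratio


query = ["one", "two"]
four = combined_score(query, ["one", "two", "three", "four"])          # 2 matches, 50%
five = combined_score(query, ["one", "two", "three", "four", "five"])  # 2 matches, 40%
single = combined_score(query, ["one"])                                # 1 match, 100%
print(four > five > single)  # True
```

Any weight below 1 on the ratio keeps the first rule strict; 0.5 is just a comfortable margin.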

Getting more out there, but likely more optimal, you could perhaps define
your own similarity model. Something to play around with at least:
http://www.elasticsearch.org/guide/reference/index-modules/similarity/

Best Regards,
Paul

On Sunday, September 22, 2013 1:56:35 PM UTC-4, Ban Mido wrote:

Hi,

I have a field called tags, which is an array of elements, and I have
applied a flat analyzer to it.
Now I have an array of tags and I want to see which feed matches that array
of tags the most. The similarity logic is:

  1. The feed with the maximum number of matching tags should come first.
  2. If two feeds have the same number of matched tags, then the feed
    with the highest percentage of matched tags should come first.

In short, if my query is "terms" : { "tags" : [ "one" , "two" ] } ,
then the feed { "tags" : [ "one" , "two" , "three" , "four" ] } should have
a greater similarity score than the feed { "tags" : [ "one" , "two" , "three"
, "four" , "five" ] } because for the former feed the percentage of matched
tags is 50% (2 of 4) and for the latter it is 40% (2 of 5).

Thanks
Vineeth


