I have a field called tags, which is an array of elements, and I have applied
a flat analyzer to it.
Now I have an array of tags, and I want to see which feed matches that array
of tags the most. The similarity logic is as follows:
The feed with the maximum number of matching tags should come first.
If two feeds have the same number of matched tags, then the feed
with the highest percentage of matched tags should come first.
In short, if my query is "terms" : { "tags" : [ "one" , "two" ] } ,
then the feed { "tags" : [ "one" , "two" , "three" , "four" ] } should have a
greater similarity score than the feed { "tags" : [ "one" , "two" , "three"
, "four" , "five" ] } , because for the former feed the percentage of matched
tags is 50% (2 of 4) and for the latter it is 40% (2 of 5).
Hi,
Doing a stock sort on score will get you most of the way there. However,
it will not strictly adhere to the first rule since it is TF/IDF based.
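
As a rough illustration of why a TF/IDF-based score can break the first rule,
here is a small Python sketch. It uses a simplified TF/IDF-style score (rare
tags weigh more than common ones), not Lucene's exact formula; the function
names, the tag names and the document frequencies are all made up for the
example.

```python
import math

def idf(doc_freq, num_docs):
    # Inverse document frequency: the rarer the tag, the larger the weight.
    return 1.0 + math.log(num_docs / (doc_freq + 1))

def simplified_tfidf_score(query_tags, feed_tags, doc_freqs, num_docs):
    # Simplified score: sum of squared idf over matched tags, scaled by the
    # fraction of query tags that matched. Field-length norms are ignored.
    matched = set(query_tags) & set(feed_tags)
    coord = len(matched) / len(query_tags)
    return coord * sum(idf(doc_freqs[t], num_docs) ** 2 for t in matched)

# Hypothetical index statistics: two very common tags and one rare tag.
num_docs = 1000
doc_freqs = {"common1": 900, "common2": 900, "rare": 1}
query = ["common1", "common2", "rare"]

score_two_matches = simplified_tfidf_score(
    query, ["common1", "common2"], doc_freqs, num_docs)
score_one_match = simplified_tfidf_score(
    query, ["rare"], doc_freqs, num_docs)

# The feed with a single rare match outscores the feed with two common
# matches, which is the opposite of what the first rule asks for.
print(score_one_match > score_two_matches)
```

With these made-up numbers the single rare tag wins, so a plain sort on score
only approximates the "most matching tags first" rule.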
Implementing a custom score would definitely work:
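
To make the idea concrete, here is a minimal Python sketch of the scoring
logic such a custom score could implement. It is plain Python, not an
Elasticsearch script, and custom_score is a hypothetical name. The assumption
is that both rules can be folded into one number: the whole part is the number
of matched tags, and the fractional part (at most 0.5) is the share of the
feed's tags that matched, so it can only break ties.

```python
def custom_score(query_tags, feed_tags):
    # Hypothetical combined score: matched count plus a tie-breaking fraction.
    feed = set(feed_tags)
    matched = len(set(query_tags) & feed)
    if matched == 0:
        return 0.0
    fraction = matched / len(feed)   # share of the feed's tags that matched
    return matched + 0.5 * fraction  # stays strictly below matched + 1

query = ["one", "two"]
feeds = {
    "a": ["one", "two", "three", "four"],
    "b": ["one", "two", "three", "four", "five"],
    "c": ["one"],
}

# Highest score first: more matches win, and ties go to the feed whose
# tags are matched in a higher proportion.
ranked = sorted(
    ((name, custom_score(query, tags)) for name, tags in feeds.items()),
    key=lambda pair: pair[1],
    reverse=True,
)
for name, score in ranked:
    print(name, score)
```

Here feeds "a" and "b" both match two tags, and "a" ranks first because a
larger share of its tags matched; "c" matches only one tag, so it comes last
even though all of its tags matched.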
Getting more out there, but likely more optimal, you could perhaps define
your own similarity model. Something to play around with at least:
Best Regards,
Paul
On Sunday, September 22, 2013 1:56:35 PM UTC-4, Ban Mido wrote: