Scoring rules: text-based search


(Hiro Gangwani) #1

Dear Team,
We are indexing document contents (e.g. Word and PDF files) in ES 0.90.
Our application primarily uses text-based search over the document
content. Our search criteria are as follows:

  1. All of keywords
  2. Any of keywords
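The two criteria can be modelled as simple set predicates over a document's words. The following is a minimal, hypothetical Java sketch (class and method names are illustrative, and the naive lowercase/non-word split is only a stand-in for a real ES analyzer). "All of keywords" behaves roughly like a `match` query with `operator` set to `and`; "Any of keywords" is like the default `or`.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Locale;
import java.util.Set;

/** Hypothetical illustration of the "all of" vs "any of" keyword criteria. */
public class KeywordCriteria {

    // Naive tokenizer: lowercase, then split on non-word characters.
    static Set<String> tokens(String text) {
        return new HashSet<>(Arrays.asList(text.toLowerCase(Locale.ROOT).split("\\W+")));
    }

    // "All of keywords": every keyword must occur in the document.
    static boolean matchesAll(String doc, List<String> keywords) {
        Set<String> t = tokens(doc);
        for (String k : keywords) {
            if (!t.contains(k.toLowerCase(Locale.ROOT))) return false;
        }
        return true;
    }

    // "Any of keywords": at least one keyword must occur in the document.
    static boolean matchesAny(String doc, List<String> keywords) {
        Set<String> t = tokens(doc);
        for (String k : keywords) {
            if (t.contains(k.toLowerCase(Locale.ROOT))) return true;
        }
        return false;
    }

    public static void main(String[] args) {
        List<String> query = Arrays.asList("elasticsearch", "scoring");
        String doc = "Notes on Elasticsearch relevance";
        System.out.println(matchesAll(doc, query)); // false
        System.out.println(matchesAny(doc, query)); // true
    }
}
```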

We need to give "All of keywords" matches higher priority than "Any of
keywords" matches. In other words, documents containing all of the keywords
should be shown first (high relevancy), followed by documents matching any
of the keywords, in each case ordered by the number of matching words. For
this purpose we applied a higher boost factor to the "All of keywords" field
than to the "Any of keywords" fields. Despite the higher boost, documents
with fewer matching words still get the highest score. On analysis, we found
that the size of the document also plays a role in the score. For example,
document 1 (137 KB) has 60 matching keywords and document 2 (56 KB) has 49
matching words, yet document 2 is shown first with a higher score than
document 1. I think the size of the document affects relevancy and score.
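The effect described above is consistent with the length normalization in Lucene's classic TF-IDF similarity (used by ES 0.90), where a term's frequency contributes tf = sqrt(freq) and the field length contributes lengthNorm = 1 / sqrt(numTerms). Below is a simplified, hypothetical Java sketch of just those two factors. Assumptions: the term counts are invented (the thread only gives file sizes), all matches are treated as occurrences of a single term, and idf, query norm, boosts and Lucene's lossy one-byte norm encoding are left out because they do not change the comparison within one query.

```java
/**
 * Simplified sketch of the two document-dependent factors in Lucene's
 * classic similarity: tf = sqrt(freq) and lengthNorm = 1 / sqrt(numTerms).
 */
public class LengthNormSketch {

    static double tf(int freq) {
        return Math.sqrt(freq);
    }

    static double lengthNorm(int numTerms) {
        return 1.0 / Math.sqrt(numTerms);
    }

    // Document-dependent part of the score for one matching term.
    static double score(int matches, int numTerms) {
        return tf(matches) * lengthNorm(numTerms);
    }

    public static void main(String[] args) {
        // Hypothetical term counts standing in for the 137 KB and 56 KB documents.
        double doc1 = score(60, 22_000);
        double doc2 = score(49, 9_000);
        System.out.printf("doc1=%.4f doc2=%.4f%n", doc1, doc2);
        // doc2 scores higher despite fewer matches, because its field is shorter.
    }
}
```

Because tf grows only with the square root of the match count while the norm shrinks with the square root of the length, a document about 2.4 times longer needs far more than 2.4 times the matches to win.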

Is there a mechanism to assign a higher score based on the number of words
matched from the search criteria, so that those documents are shown at the
top? We are using the Java API to query the ES indexes.
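To make the requested ordering concrete, here is a hypothetical Java model of it (not an ES API): documents matching all keywords come first, then documents are ordered by matching-word count, and document length is ignored entirely. The `Doc` type, the third document `doc3` and its numbers are invented for illustration.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;

/** Hypothetical model of the desired ranking; document length plays no part. */
public class TieredRanking {

    static final class Doc {
        final String id;
        final boolean matchesAll; // contains all of the keywords
        final int matchCount;     // number of matching words

        Doc(String id, boolean matchesAll, int matchCount) {
            this.id = id;
            this.matchesAll = matchesAll;
            this.matchCount = matchCount;
        }
    }

    // false sorts before true, so negating matchesAll puts "all of" docs first;
    // within each tier, more matching words come first.
    static final Comparator<Doc> ORDER =
        Comparator.comparing((Doc d) -> !d.matchesAll)
                  .thenComparing(Comparator.comparingInt((Doc d) -> d.matchCount).reversed());

    static List<String> rank(List<Doc> docs) {
        List<Doc> sorted = new ArrayList<>(docs);
        sorted.sort(ORDER);
        List<String> ids = new ArrayList<>();
        for (Doc d : sorted) ids.add(d.id);
        return ids;
    }

    public static void main(String[] args) {
        List<Doc> docs = Arrays.asList(
            new Doc("doc2", true, 49),
            new Doc("doc1", true, 60),
            new Doc("doc3", false, 80));
        System.out.println(rank(docs)); // [doc1, doc2, doc3]
    }
}
```

Inside ES itself, the closest lever to "ignore length" is disabling norms on the content field: ES 0.90 string field mappings accept `omit_norms: true`, which makes lengthNorm a constant so that, other factors being equal, more matches means a higher score.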

Thanks,

Hiro.

--
You received this message because you are subscribed to the Google Groups "elasticsearch" group.
To unsubscribe from this group and stop receiving emails from it, send an email to elasticsearch+unsubscribe@googlegroups.com.
To view this discussion on the web visit https://groups.google.com/d/msgid/elasticsearch/2b0c9b52-6452-4881-968d-30734c8b86b8%40googlegroups.com.
For more options, visit https://groups.google.com/groups/opt_out.


(Nikolay Chankov) #2

In Lucene, one of the scoring factors is the length of the field (in your
case, the document content): the shorter it is, the higher the score. But
getting a lower score with multiple matching keywords than with a single
one is strange. Are you sure the words are matching? Could you provide your
search query?
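The reason multiple matching keywords normally help is the coord factor in Lucene's classic similarity: for a BooleanQuery, the summed clause scores are multiplied by coord = (clauses matched) / (total clauses). A minimal Java sketch, assuming a query made only of optional (SHOULD) clauses and using invented per-keyword scores:

```java
/**
 * Sketch of Lucene's classic coord factor: the sum of matching clause
 * scores is multiplied by (matched clauses / total clauses).
 */
public class CoordSketch {

    static double coord(int overlap, int maxOverlap) {
        return (double) overlap / maxOverlap;
    }

    // A score of 0 means the keyword did not match (hypothetical inputs).
    static double booleanScore(double[] clauseScores) {
        double sum = 0;
        int overlap = 0;
        for (double s : clauseScores) {
            if (s > 0) {
                sum += s;
                overlap++;
            }
        }
        return sum * coord(overlap, clauseScores.length);
    }

    public static void main(String[] args) {
        // Three keywords; the per-keyword scores are made up.
        double matchesOne = booleanScore(new double[] {0.9, 0, 0});
        double matchesAll = booleanScore(new double[] {0.4, 0.4, 0.4});
        System.out.println(matchesAll > matchesOne); // true
    }
}
```

Each clause score still includes its own length norm, so a strongly length-penalized document can lose even with full coord, which is why seeing the actual query matters.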

(In reply to Hiro Gangwani's post of Friday, January 24, 2014, 11:20:03 AM UTC.)

